Data Toolbar

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Vlb50 at 04:21, 17 December 2010 (Broken links fixed). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.
Developer(s): DataTool Services
Operating system: Microsoft Windows
Type: Browser toolbar, Web scraping
License: Shareware
Website: www.datatoolbar.com

Data Toolbar is an Internet Explorer add-on for collecting catalog-style information from the web.

Algorithm

The program implements a variation of the generalized tree matching algorithm that takes nested lists into account.[1] Within a given web page, the program recursively traverses the branches of the DOM tree to detect nested lists of data records whose format matches the specified content. This approach has several known advantages over simple string matching.[2]
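
The recursive traversal described above can be illustrated with a small sketch. This is not Data Toolbar's actual implementation, which is not published here, and it is much simpler than the tree matching algorithm in the cited paper. It assumes well-formed HTML without unclosed void tags, and all names (`Node`, `TreeBuilder`, `signature`, `find_record_lists`, `text_of`) are hypothetical. The idea is to reduce each subtree to a structural signature, collapse runs of identical siblings so that records with inner lists of different lengths still match, and then group sibling nodes that share a signature.

```python
from html.parser import HTMLParser


class Node:
    """Minimal DOM-like node (hypothetical helper, not a browser DOM)."""
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = ""


class TreeBuilder(HTMLParser):
    """Builds a tree of Nodes. Assumes every start tag has a matching end tag."""
    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        if self.current.parent is not None:
            self.current = self.current.parent

    def handle_data(self, data):
        self.current.text += data.strip()


def signature(node):
    """Structural shape of a subtree: tag names only, text ignored.
    Runs of identical sibling shapes are collapsed to one entry, so an
    inner list with 1 item and one with 5 items get the same signature."""
    shapes = []
    for child in node.children:
        shape = signature(child)
        if not shapes or shapes[-1] != shape:
            shapes.append(shape)
    return (node.tag, tuple(shapes))


def find_record_lists(node, min_repeat=2):
    """Recursively yield groups of sibling nodes that share a signature.
    Siblings need not be adjacent, so interleaved ads are skipped."""
    groups = {}
    for child in node.children:
        groups.setdefault(signature(child), []).append(child)
    for members in groups.values():
        if len(members) >= min_repeat:
            yield members
    for child in node.children:
        yield from find_record_lists(child, min_repeat)


def text_of(node):
    """Concatenate the text of a node and its descendants."""
    parts = [node.text] + [text_of(c) for c in node.children]
    return " ".join(p for p in parts if p)


# Made-up catalog fragment: two records with inner lists, plus an ad.
html = """
<div>
  <div><h2>Widget</h2><ul><li>red</li><li>blue</li></ul></div>
  <p>Advertisement</p>
  <div><h2>Gadget</h2><ul><li>green</li></ul></div>
</div>
"""
builder = TreeBuilder()
builder.feed(html)
record_lists = list(find_record_lists(builder.root))
for records in record_lists:
    print([text_of(r) for r in records])
```

On this fragment the sketch reports the two catalog records as one list and the two colour items inside the first record as a nested list, while the advertisement paragraph is ignored because nothing else shares its shape. A real extractor would need approximate rather than exact matching, since records on live pages rarely have identical structure.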

Features

  • Collection of data and images directly from Internet Explorer
  • Collection of information from details pages linked to the catalog
  • Automatic processing of multi-page catalogs
  • Support for irregular multi-row catalogs interspersed with advertisements

Sources

  1. Alberto H. F. Laender, Berthier A. Ribeiro-Neto, Altigran S. da Silva, Juliana S. Teixeira. "A Brief Survey of Web Data Extraction Tools". ACM SIGMOD Record, Volume 31, Issue 2.
  2. Nitin Jindal, Bing Liu. "A Generalized Tree Matching Algorithm Considering Nested Lists for Web Data Extraction". Proceedings of the Tenth SIAM International Conference on Data Mining, 2010.