Web data integration

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Mjgp2 at 03:14, 12 February 2019 (Added examples of sources, challenges and applications.). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Web Data Integration (WDI) is the process of aggregating data from different sources into a homogeneous view. It includes data access, transformation, mapping, quality assurance and fusion of data. Web Data Integration is an extension and specialization of Data Integration that views the Web as a collection of heterogeneous databases.
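
As a minimal sketch of the mapping and transformation steps named above, the following Python fragment renames the fields of records from two sources to one shared schema and normalises their types. The source names (shop_a, shop_b), the field names and the sample records are hypothetical and chosen only for illustration; a real system would also need the access, quality assurance and fusion steps.

```python
# Hypothetical per-source mappings: source field name -> target field name.
FIELD_MAPPINGS = {
    "shop_a": {"title": "name", "price_usd": "price"},
    "shop_b": {"product": "name", "cost": "price"},
}


def to_target_schema(source, record):
    """Rename a source record's fields to the shared target schema."""
    mapping = FIELD_MAPPINGS[source]
    # Mapping: keep only fields that have a counterpart in the target schema.
    unified = {mapping[key]: value for key, value in record.items() if key in mapping}
    # Transformation: normalise types so records from all sources are comparable.
    unified["name"] = str(unified["name"]).strip().lower()
    unified["price"] = float(unified["price"])
    unified["source"] = source  # keep provenance for later quality checks
    return unified


# Made-up sample records, one per hypothetical source.
records = [
    to_target_schema("shop_a", {"title": " Widget ", "price_usd": "9.99"}),
    to_target_schema("shop_b", {"product": "widget", "cost": 10.49, "sku": "X1"}),
]
```

Keeping the source name on each unified record is a design choice made here so that later steps can weigh records by where they came from.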

Applied to the Web, data integration techniques are the foundation on which businesses take advantage of the ever-increasing number of publicly accessible Web data sources.[1]

Spending in this area amounted to about US$2.5 billion in 2017, and the market is expected to reach almost US$7 billion by 2020.[2]

Web Data Integration Sources

Web Data Integration treats the Web as a collection of views of databases that are accessible over Web protocols. Sources include, but are not limited to:

  • Open data catalogues
  • Government data catalogues
  • Web applications and sites
    • UI
    • API
  • The Semantic Web (SPARQL)
  • HTML Embedded Structured Data
  • HTML Data Tables
  • Spreadsheets
  • PDFs
  • Online encyclopaedias
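
To illustrate data access for one of the source types listed above, HTML data tables, here is a small sketch that uses only Python's standard html.parser module. The TableExtractor class and the sample markup are assumptions made for this example. The sketch does not handle nested tables, row or column spans, or malformed markup, all of which real pages often contain.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the cell text of every <tr> in a document as a list of rows."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside <tr>
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (for example whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Made-up sample markup standing in for a fetched web page.
SAMPLE_HTML = """
<table>
  <tr><th>Item</th><th>Price</th></tr>
  <tr><td>Widget</td><td>9.99</td></tr>
</table>
"""

parser = TableExtractor()
parser.feed(SAMPLE_HTML)
header, *body = parser.rows
# Treat the first row as column names and turn each later row into a record.
table = [dict(zip(header, row)) for row in body]
```

All extracted values are strings at this point; converting them to numbers or dates belongs to the transformation step.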

Technical challenges

Web Data Integration has technical challenges that differ from those of Data Integration, because of the data access and transformation that web data sources require.

Understanding the quality and veracity of data is even more important than in Data Integration, where the data is generally more implicitly trusted and of higher quality than data collected from an untrusted web source.
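
One simple way to take source trust into account when sources disagree is trust-weighted voting, sketched below. This is an illustrative technique rather than a method described in this article, and the source names, the trust scores and the claimed values are all made up for the example.

```python
from collections import defaultdict

# Hypothetical trust scores in [0, 1]; in practice these would have to be estimated.
SOURCE_TRUST = {"curated_db": 0.9, "forum_post": 0.3, "scraped_site": 0.4}


def fuse(claims):
    """Return the value with the highest summed trust among (source, value) claims."""
    support = defaultdict(float)
    for source, value in claims:
        # Sources without a known score get a low default trust (an assumption).
        support[value] += SOURCE_TRUST.get(source, 0.1)
    return max(support, key=support.get)


# Three sources disagree about a single attribute of one entity.
claims = [
    ("curated_db", "Berlin"),
    ("forum_post", "Munich"),
    ("scraped_site", "Munich"),
]
fused = fuse(claims)
```

With these scores the single curated source (0.9) outweighs the two less trusted sources combined (0.3 + 0.4), so a plain majority vote and a trust-weighted vote give different answers.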

Applications

Web Data Integration has applications in many fields, including bioinformatics,[3] search engines,[4] price comparison,[5] and forensic search.[6]

References

  1. ^ "IE 670 Web Data Integration". www.uni-mannheim.de. 2019-01-24. Retrieved 2019-02-11.
  2. ^ "Opimas: The Web Data Extraction Market". Opimas: We begin with an understanding. Retrieved 2019-02-12.
  3. ^ "Web Data Integration". Database Group Leipzig.
  4. ^ "Web-scale Data Integration - You Can Only Afford to Pay as You Go". www.datascienceassn.org. Retrieved 2019-02-12.
  5. ^ Siegel, Michael D.; Madnick, Stuart E.; Zhu, Hongwei (2008). "Enabling global price comparison through semantic integration of web data". Retrieved 2019-02-12.
  6. ^ "PwC buys Kusiri, London-based fraud detection start-up". www.consultancy.uk. Retrieved 2019-02-12.