Encyclopedia of Database Systems

2018 Edition
Editors: Ling Liu, M. Tamer Özsu


  • Oliver Frölich
Reference work entry
DOI: https://doi.org/10.1007/978-1-4614-8265-9_1166


ETL using Web data extraction techniques


ETL (extraction, transformation, and loading) is a well-established technology for extracting data from several sources, cleansing and normalizing it, and inserting it into a data warehouse (e.g., that of a business intelligence system). Web ETL denotes an ETL process in which the external data to be loaded into the data warehouse is extracted from semi-structured Web pages (e.g., in HTML or PDF format) using Web data extraction techniques.
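The three stages of a Web ETL process can be illustrated with a minimal sketch. It assumes a hypothetical product-price page in HTML. A hand-written `html.parser` wrapper stands in for the Web data extraction step, and an in-memory SQLite table stands in for the data warehouse. The page content, the function names `extract`/`transform`/`load`, and the table `product_price` are all illustrative inventions. Production Web ETL systems typically rely on dedicated wrapper tools rather than a parser like this one.

```python
import sqlite3
from html.parser import HTMLParser


class TableRowExtractor(HTMLParser):
    """Toy wrapper: collects the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the <tr> currently open, if any
        self._cell = None  # text fragments of the <td> currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows hold only <th> cells and stay empty
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:  # ignore text outside <td> cells
            self._cell.append(data)


def extract(html):
    """E: pull raw cell strings out of a semi-structured HTML page."""
    parser = TableRowExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


def transform(rows):
    """T: cleanse and normalize raw strings into typed records."""
    records = []
    for name, price in rows:
        name = " ".join(name.split())  # collapse stray whitespace
        amount = price.replace("EUR", "").replace(",", "").strip()
        records.append((name, float(amount)))  # "1,299.00 EUR" -> 1299.0
    return records


def load(records, conn):
    """L: insert normalized records into a (stand-in) warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS product_price (name TEXT, price_eur REAL)"
    )
    conn.executemany("INSERT INTO product_price VALUES (?, ?)", records)
    conn.commit()


# Hypothetical source page; not taken from any real Web site.
PAGE = """
<table>
  <tr><th>Product</th><th>Price</th></tr>
  <tr><td> Laptop  X1 </td><td>1,299.00 EUR</td></tr>
  <tr><td>Mouse</td><td>19.90 EUR</td></tr>
</table>
"""

conn = sqlite3.connect(":memory:")
load(transform(extract(PAGE)), conn)
print(conn.execute(
    "SELECT name, price_eur FROM product_price ORDER BY rowid"
).fetchall())
# prints [('Laptop X1', 1299.0), ('Mouse', 19.9)]
```

The extraction step is the part that makes this "Web" ETL. The source offers no schema, so the wrapper must infer records from presentation markup. The transform and load steps are the same as in conventional ETL.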

In particular, back-end interchange of structured data that merely uses the Web as a transport is not a Web ETL process, because no semi-structured data needs to be transformed using Web data extraction techniques. An example is two database systems exchanging data via Web-based electronic data interchange (EDI), where EDI denotes techniques and standards for transmitting structured data between applications, e.g., over the Web.

Key Points

Powerful and efficient tools...

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Lixto Software GmbH, Vienna, Austria

Section editors and affiliations

  • Georg Gottlob
    1. Computing Lab., Oxford Univ., Oxford, UK