Web Data Extraction System

Baumgartner, Robert; Gatterbauer, Wolfgang; Gottlob, Georg

doi:10.1007/978-1-4614-8265-9_1154

Synonyms

Web information extraction system; Web macros; Web scraper; Wrapper generator

Definition

A web data extraction system is a software system that automatically and repeatedly extracts data from web pages with changing content and delivers the extracted data to a database or some other application. The task of web data extraction performed by such a system is usually divided into five different functions: (i) Web interaction, which comprises mainly the navigation to usually pre-determined target web pages containing the desired information; (ii) Support for wrapper generation and execution, where a wrapper is a program that identifies the desired data on target pages, extracts the data and transforms it into a structured format; (iii) Scheduling, which allows repeated application of previously generated wrappers to their respective target pages; (iv) Data transformation, which includes filtering, transforming, refining, and integrating data extracted from one or more sources and...
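The wrapper role described in (ii) can be illustrated with a minimal sketch. It is not the authors' system: the page layout (an HTML table with one record per row), the field names `name` and `price`, and the helper names `TableWrapper` and `extract_records` are all assumptions made for illustration. The sketch uses only Python's standard `html.parser` module to identify the data cells on a page, extract their text, and transform each row into a structured record.

```python
from html.parser import HTMLParser


class TableWrapper(HTMLParser):
    """Hypothetical wrapper: collects the <td> cell text of every <tr> on a page."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text chunks of the cell being read, or None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only text inside a <td> counts as extracted data.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip rows without <td> cells, such as header rows
                self.rows.append(self._row)
            self._row = None


def extract_records(html, fields=("name", "price")):
    """Identify the data rows, extract them, and return structured records."""
    parser = TableWrapper()
    parser.feed(html)
    parser.close()
    # Keep only rows whose cell count matches the assumed schema.
    return [dict(zip(fields, row)) for row in parser.rows if len(row) == len(fields)]


# Assumed target page; a real system would first navigate to and fetch it.
SAMPLE_PAGE = """
<table>
  <tr><th>Item</th><th>Price</th></tr>
  <tr><td>Widget</td><td>9.50</td></tr>
  <tr><td>Gadget</td><td>12.00</td></tr>
</table>
"""

records = extract_records(SAMPLE_PAGE)
print(records)
```

In a full system of the kind defined above, a scheduler would call something like `extract_records` repeatedly on freshly fetched pages and pass the resulting records on to data transformation and delivery.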

Author information

Authors and Affiliations

Vienna University of Technology, Vienna, Austria
Robert Baumgartner
University of Washington, Seattle, WA, USA
Wolfgang Gatterbauer
Computing Laboratory, Oxford University, Oxford, UK
Georg Gottlob

Corresponding author

Correspondence to Robert Baumgartner.

Editor information

Editors and Affiliations

Georgia Institute of Technology College of Computing, Atlanta, GA, USA
Ling Liu
University of Waterloo School of Computer Science, Waterloo, ON, Canada
M. Tamer Özsu

Section Editor information

Computing Lab., Oxford Univ., Oxford, UK
Georg Gottlob

About this entry

Cite this entry

Baumgartner, R., Gatterbauer, W., Gottlob, G. (2018). Web Data Extraction System. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_1154

DOI: https://doi.org/10.1007/978-1-4614-8265-9_1154
Published: 07 December 2018
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8266-6
Online ISBN: 978-1-4614-8265-9
eBook Packages: Computer Science; Reference Module Computer Science and Engineering
