Synonyms
Web data extraction; Web information extraction; Web mining
Definition
Web harvesting describes the process of gathering and integrating data from various heterogeneous web sources. The necessary input is an appropriate knowledge representation of the domain of interest (e.g., an ontology), together with example instances of concepts or relationships (seed knowledge). The output is structured data (e.g., in the form of a relational database) gathered from the Web. The term harvesting implies that, while passing over a large body of available information, the process gathers only the information that lies in the domain of interest and is therefore relevant.
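The input/output contract in this definition can be illustrated with a small sketch. Everything named here is an assumption made for illustration: the `ONTOLOGY` dictionary stands in for a real knowledge representation, `SEEDS` and `CANDIDATES` are invented toy records, and `in_domain` and `harvest` are hypothetical helpers. As a simplification, the seed instances are loaded as known-good rows rather than used to learn extractors.

```python
import sqlite3

# Hypothetical domain model: one concept and the attributes it requires.
ONTOLOGY = {"Camera": ("brand", "model", "price_usd")}

# Hypothetical seed knowledge: one example instance of the concept.
SEEDS = [{"concept": "Camera", "brand": "Acme", "model": "X100", "price_usd": 499.0}]

# Invented candidate records, standing in for data met while passing over the Web.
CANDIDATES = [
    {"concept": "Camera", "brand": "Acme", "model": "X200", "price_usd": 649.0},
    {"concept": "Recipe", "title": "Apple pie"},   # outside the domain of interest
    {"concept": "Camera", "brand": "Acme"},        # in the domain but incomplete
]


def in_domain(record):
    """True if the record names a known concept and has all its attributes."""
    attrs = ONTOLOGY.get(record.get("concept"))
    return attrs is not None and all(a in record for a in attrs)


def harvest(records, conn):
    """Store only in-domain records in one relational table per concept."""
    for concept, attrs in ONTOLOGY.items():
        conn.execute(f"CREATE TABLE IF NOT EXISTS {concept} ({', '.join(attrs)})")
    kept = 0
    for record in records:
        if in_domain(record):
            attrs = ONTOLOGY[record["concept"]]
            placeholders = ", ".join("?" for _ in attrs)
            conn.execute(
                f"INSERT INTO {record['concept']} VALUES ({placeholders})",
                [record[a] for a in attrs],
            )
            kept += 1
    return kept


conn = sqlite3.connect(":memory:")
kept = harvest(SEEDS + CANDIDATES, conn)
rows = conn.execute(
    "SELECT brand, model, price_usd FROM Camera ORDER BY model"
).fetchall()
```

Only the seed and the complete camera record reach the table; the out-of-domain and incomplete records are passed over, which is the selective behavior the term harvesting refers to.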
Key Points
The process of web harvesting can be divided into three consecutive tasks:
- (i) Data or information retrieval, which involves finding relevant information on the Web and storing it locally. This task requires tools for searching and navigating the Web, i.e., crawlers and means for interacting with dynamic or...
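The retrieval task described above can be sketched as a small breadth-first crawler that follows links, keeps only pages judged relevant, and stores them locally. This is a minimal sketch under stated assumptions, not a method taken from the entry: `crawl`, `LinkParser`, the `fetch` hook, the keyword-based `is_relevant` test, and the in-memory `SITE` are all hypothetical. A real crawler would additionally need HTTP fetching, politeness rules, and support for dynamic pages.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkParser(HTMLParser):
    """Collects the href values of <a> tags found in one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, is_relevant, max_pages=50):
    """Breadth-first crawl that returns {url: html} for relevant pages only.

    fetch(url) -> str or None is a caller-supplied hook (an assumption of
    this sketch), so the same code can run on live HTTP or on test data.
    """
    store = {}                 # local storage of relevant pages
    seen = {start_url}         # URLs already queued, to avoid revisiting
    queue = deque([start_url])
    visited = 0
    while queue and visited < max_pages:
        url = queue.popleft()
        html = fetch(url)
        visited += 1
        if html is None:       # page could not be retrieved
            continue
        if is_relevant(html):
            store[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments before queueing.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return store


# Toy in-memory "web" (invented illustration data).
SITE = {
    "http://example.org/": '<a href="/catalog">Catalog</a> <a href="/about">About</a>',
    "http://example.org/catalog": '<h1>Digital camera prices</h1> <a href="/">Home</a>',
    "http://example.org/about": "<p>About us</p>",
}

pages = crawl(
    "http://example.org/",
    fetch=SITE.get,
    is_relevant=lambda html: "camera" in html.lower(),
)
```

Passing `fetch` as a parameter keeps the traversal logic separate from network access; the keyword test is only a placeholder for whatever relevance judgment the domain model supports.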
Copyright information
© 2018 Springer Science+Business Media, LLC, part of Springer Nature
About this entry
Cite this entry
Gatterbauer, W. (2018). Web Harvesting. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_1172
DOI: https://doi.org/10.1007/978-1-4614-8265-9_1172
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8266-6
Online ISBN: 978-1-4614-8265-9
eBook Packages: Computer Science; Reference Module Computer Science and Engineering