Skip to main content

Web Harvesting

  • Reference work entry
  • First Online:
Encyclopedia of Database Systems
  • 44 Accesses

Synonyms

Web data extraction; Web information extraction; Web mining

Definition

Web harvesting describes the process of gathering and integrating data from various heterogeneous web sources. Necessary input is an appropriate knowledge representation of the domain of interest (e.g., an ontology), together with example instances of concepts or relationships (seed knowledge). Output is a structured data (e.g., in the form of a relational database) that is gathered from the Web. The term harvesting implies that, while passing over a large body of available information, the process gathers only such information that lies in the domain of interest and is, as such, relevant.

Key Points

The process of web harvesting can be divided into three subsequent tasks:

  1. (i)

    Data or information retrieval, which involves finding relevant information on the Web and storing it locally. This task requires tools for searching and navigating the Web, i.e., crawlers and means for interacting with dynamic or...

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 4,499.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Hardcover Book
USD 6,499.99
Price excludes VAT (USA)
  • Durable hardcover edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Recommended Reading

  1. Ciravegna F, Chapman S, Dingli A, Wilks Y. Learning to harvest information for the Semantic Web. In: Proceedings of the 1st European Semantic Web Symposium; 2004. p. 312–26.

    Google Scholar 

  2. Crescenzi V, Mecca G. Automatic information extraction from large websites. J ACM. 2004;51(5):731–79.

    Article  MathSciNet  MATH  Google Scholar 

  3. Etzioni O, Cafarella MJ, Downey D, Kok S, Popescu AM, Shaked T, Soderland S, Weld DS, Yates A. Web-scale information extraction in KnowItAll: (preliminary results). In: Proceedings of the 12th International World Wide Web Conference; 2004. p. 100–10.

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Wolfgang Gatterbauer .

Editor information

Editors and Affiliations

Section Editor information

Rights and permissions

Reprints and permissions

Copyright information

© 2018 Springer Science+Business Media, LLC, part of Springer Nature

About this entry

Check for updates. Verify currency and authenticity via CrossMark

Cite this entry

Gatterbauer, W. (2018). Web Harvesting. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_1172

Download citation

Publish with us

Policies and ethics