Fully Automatic Web Data Extraction

  • Reference work entry
  • Encyclopedia of Database Systems

Recommended Reading

  1. Crescenzi V, Mecca G, Merialdo P. RoadRunner: towards automatic data extraction from large web sites. In: Proceedings of the 27th International Conference on Very Large Data Bases; 2001. p. 109–18.

  2. Debnath S, Mitra P, Giles CL. Automatic extraction of informative blocks from webpages. In: Proceedings of the 2005 ACM Symposium on Applied Computing; 2005. p. 1722–6.

  3. Glance N, Hurst M, Nigam K, Siegler M, Stockton R, Tomokiyo T. Deriving marketing intelligence from online discussion. In: Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2005. p. 419–28.

  4. Hofmann K, Weerkamp W. Web corpus cleaning using content and structure. In: Fairon C, Naerts H, Kilgarriff A, de Schryver G, editors. Building and exploring web corpora. vol. 4. UCL; 2007. p. 145–54.

  5. Kovacevic M, Diligenti M, Gori M, Milutinovic V. Recognition of common areas in a web page using a visualization approach. In: Proceedings of the 10th International Conference on Artificial Intelligence: Methodology, Systems, and Applications; 2002. p. 203–12.

  6. Kushmerick N, Weld D, Doorenbos R. Wrapper induction for information extraction. In: Proceedings of the 15th International Joint Conference on AI; 1997. p. 119–28.

  7. Lin SH, Ho JM. Discovering informative content blocks from web documents. In: Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2002. p. 588–93.

  8. Liu B, Grossman R, Zhai Y. Mining data records in web pages. In: Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2003. p. 601–6.

  9. Muslea I, Minton S, Knoblock C. Hierarchical wrapper induction for semistructured information sources. Auton Agent Multi-Agent Syst. 2001;4(1–2):93–114.

  10. Simon K, Lausen G. ViPER: augmenting automatic information extraction with visual perceptions. In: Proceedings of the 14th ACM International Conference on Information and Knowledge Management; 2005. p. 381–8.

  11. Ziegler CN, Skubacz M. Towards automated reputation and brand monitoring on the web. In: Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence; 2006. p. 1066–70.

  12. Ziegler CN, Skubacz M. Content extraction from news pages using particle swarm optimization on linguistic and structural features. In: Proceedings of the 2007 IEEE/WIC/ACM International Conference on Web Intelligence; 2007. p. 242–9.

  13. Zhao H, Meng W, Wu Z, Raghavan V, Yu C. Fully automatic wrapper generation for search engines. In: Proceedings of the 14th International World Wide Web Conference; 2005. p. 66–75.

Author information

Correspondence to Cai-Nicolas Ziegler.

Copyright information

© 2018 Springer Science+Business Media, LLC, part of Springer Nature

Cite this entry

Ziegler, CN. (2018). Fully Automatic Web Data Extraction. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_1159
