Wrapper Induction

Encyclopedia of Database Systems

Synonyms

Information extraction; Wrapper generation

Definition

Wrapper induction (or query induction) is a subfield of wrapper generation, which itself belongs to the broader field of information extraction (IE). In IE, wrappers transform unstructured input into structured output formats, and a wrapper generation system specifies the rules that govern this transformation. Wrapper induction is an approach to wrapper generation in which the transformation rules are learned from examples and counterexamples (inductive learning). The induced wrapper is subsequently applied to unseen input documents to extract further labeled relations of interest. To ease the annotation of examples, the learning framework is often embedded in a visual annotation environment in which the user selects and deselects elements visually.
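
The idea of learning extraction rules from examples can be illustrated with a minimal sketch. It assumes a very simple rule format that is not specified in this entry: a single pair of left and right delimiter strings surrounding the target value, learned from positive examples only (counterexamples are not used here). The function names `induce_wrapper` and `apply_wrapper` and the sample HTML snippets are hypothetical and chosen only for illustration.

```python
import os
import re


def common_prefix(strings):
    """Longest string that every element of `strings` starts with."""
    return os.path.commonprefix(list(strings))


def common_suffix(strings):
    """Longest string that every element of `strings` ends with."""
    reversed_prefix = os.path.commonprefix([s[::-1] for s in strings])
    return reversed_prefix[::-1]


def induce_wrapper(examples):
    """Learn a (left, right) delimiter pair from labeled examples.

    `examples` is a list of (document, target) pairs, where `target` is
    the substring the user marked in `document`. Assumption: the first
    occurrence of `target` in the document is the labeled one.
    """
    lefts, rights = [], []
    for document, target in examples:
        start = document.index(target)
        lefts.append(document[:start])
        rights.append(document[start + len(target):])
    left = common_suffix(lefts)    # context shared immediately before targets
    right = common_prefix(rights)  # context shared immediately after targets
    if not left or not right:
        raise ValueError("examples share no common delimiters")
    return left, right


def apply_wrapper(wrapper, document):
    """Extract every value enclosed by the learned delimiters."""
    left, right = wrapper
    pattern = re.escape(left) + r"(.*?)" + re.escape(right)
    return re.findall(pattern, document, flags=re.DOTALL)


# Two user-annotated examples (hypothetical data).
examples = [
    ("<li><b>Congo</b> <i>242</i></li>", "Congo"),
    ("<li><b>Spain</b> <i>34</i></li>", "Spain"),
]
wrapper = induce_wrapper(examples)

# An unseen document with the same layout.
page = "<li><b>Egypt</b> <i>20</i></li><li><b>Belize</b> <i>501</i></li>"
print(wrapper)                       # ('<li><b>', '</b> <i>')
print(apply_wrapper(wrapper, page))  # ['Egypt', 'Belize']
```

The sketch generalizes from two annotations to a rule that also matches unseen entries. A practical system would typically use a richer rule language and would use counterexamples to reject overly general delimiters.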

The term “wrapper induction” was introduced by Nicholas Kushmerick in his influential Ph.D. thesis in 1997 in the context of semi-structured Web...

Recommended Reading

  1. Adelberg B. NoDoSE: a tool for semi-automatically extracting structured and semistructured data from text documents. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 1998. p. 283–94.

  2. Baumgartner R, Flesca S, Gottlob G. Visual web information extraction with Lixto. In: Proceedings of the 27th International Conference on Very Large Data Bases; 2001. p. 119–28.

  3. Carme J, Ceresna M, Goebel M. Web wrapper specification using compound filter learning. In: Proceedings of the IADIS International Conference on WWW/Internet 2006; 2006.

  4. Chang CH, Kuo SC. OLERA: semisupervised web-data extraction with visual support. IEEE Intell Syst. 2004;19(6):56–64.

  5. Finn A, Kushmerick N. Active learning selection strategies for information extraction. In: Proceedings of the Workshop on Adaptive Text Extraction and Mining; 2003.

  6. Freitag D, Kushmerick N. Boosted wrapper induction. In: Proceedings of the 12th National Conference on AI; 2000. p. 577–83.

  7. Hsu CN, Dung MT. Generating finite-state transducers for semi-structured data extraction from the web. Inf Syst. 1998;23(8):521–38.

  8. Irmak U, Suel T. Interactive wrapper generation with minimal user effort. In: Proceedings of the 15th International World Wide Web Conference; 2006. p. 553–63.

  9. Knoblock CA, Lerman K, Minton S, Muslea I. Accurately and reliably extracting data from the web: a machine learning approach. Q Bull, IEEE TC Data Eng. 2000;23(4):33–41.

  10. Kushmerick N. Wrapper induction for information extraction. PhD thesis, University of Washington; 1997.

  11. Kushmerick N. Wrapper induction: efficiency and expressiveness. Artif Intell. 2000;118(1–2):15–68.

  12. Laender AHF, Ribeiro-Neto B, da Silva AS. DEByE – data extraction by example. Data Knowl Eng. 2002;40(2):121–54.

  13. Liu L, Pu C, Han W. XWRAP: an XML-enabled wrapper construction system for web information sources. In: Proceedings of the 16th International Conference on Data Engineering; 2000. p. 611–21.

  14. Muslea I, Minton S, Knoblock C. STALKER: learning extraction rules for semistructured, web-based information sources. 1998. URL http://citeseer.ist.psu.edu/muslea98stalker.html

  15. Muslea I, Minton S, Knoblock CA. Selective sampling with redundant views. In: Proceedings of the 12th National Conference on AI; 2000. p. 621–26.

  16. Sahuguet A, Azavant F. WysiWyg web wrapper factory (W4F). 2001. URL http://citeseer.ist.psu.edu/553711.html; http://www.ai.mit.edu/people/jimmylin/papers/Sahuguet99.ps

  17. Seymore K, McCallum A, Rosenfeld R. Learning hidden Markov model structure for information extraction. In: Proceedings of the AAAI 99 Workshop on Machine Learning for Information Extraction; 1999.

Author information

Corresponding author

Correspondence to Max Goebel.

Copyright information

© 2018 Springer Science+Business Media, LLC, part of Springer Nature

Cite this entry

Goebel, M., Ceresna, M. (2018). Wrapper Induction. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_1160