Synonyms
Information extraction; Wrapper generation
Definition
Wrapper induction (or query induction) is a subfield of wrapper generation, which itself belongs to the broader field of information extraction (IE). In IE, wrappers transform unstructured input into structured output formats, and a wrapper generation system specifies the transformation rules that govern such transformations. Wrapper induction is an approach to wrapper generation in which the transformation rules are learned from examples and counterexamples (inductive learning). The induced wrapper is subsequently applied to unseen input documents to extract further instances of the labeled relations of interest. To ease the annotation of examples by the user, the learning framework is often implemented within a visual annotation environment, where the user selects and deselects elements visually.
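To make the learn-then-apply cycle concrete, the following is a minimal Python sketch of one very simple wrapper class, modeled loosely on the left-right (LR) delimiter wrappers studied in Kushmerick's work. It is an illustration under stated assumptions, not the method of any system cited in this entry: each example page contains exactly one labeled value given as character offsets, counterexamples are omitted, and the names `learn_lr` and `apply_lr` are hypothetical.

```python
from os.path import commonprefix
from typing import List, Optional, Tuple

# Hypothetical example format: (page, start, end) marking one labeled value.
Example = Tuple[str, int, int]

def learn_lr(examples: List[Example]) -> Tuple[str, str]:
    """Induce (left, right) delimiters consistent with every labeled example."""
    # Longest common suffix of the left contexts (reverse, common prefix, reverse back).
    max_left = commonprefix([p[:s][::-1] for p, s, _ in examples])[::-1]
    # Longest common prefix of the right contexts.
    max_right = commonprefix([p[e:] for p, _, e in examples])

    left: Optional[str] = None
    right: Optional[str] = None
    # Shortest left delimiter whose first occurrence ends exactly at each label start.
    for k in range(1, len(max_left) + 1):
        cand = max_left[-k:]
        if all(p.find(cand) + k == s for p, s, _ in examples):
            left = cand
            break
    # Shortest right delimiter whose first occurrence after the start is the label end.
    for k in range(1, len(max_right) + 1):
        cand = max_right[:k]
        if all(p.find(cand, s) == e for p, s, e in examples):
            right = cand
            break
    if left is None or right is None:
        raise ValueError("no LR wrapper is consistent with the examples")
    return left, right

def apply_lr(page: str, left: str, right: str) -> Optional[str]:
    """Apply an induced wrapper: return the first value between left and right, or None."""
    i = page.find(left)
    if i < 0:
        return None
    start = i + len(left)
    end = page.find(right, start)
    return page[start:end] if end >= 0 else None

# Usage: two annotated pages, then extraction from an unseen page.
pages = [
    "<p>Country: <b>Congo</b> code <i>242</i></p>",
    "<p>Country: <b>Egypt</b> code <i>20</i></p>",
]
examples = [(p, p.index(v), p.index(v) + len(v)) for p, v in zip(pages, ["Congo", "Egypt"])]
left, right = learn_lr(examples)
print(left, right)  # b> <
print(apply_lr("<p>Country: <b>Spain</b> code <i>34</i></p>", left, right))  # Spain
```

Choosing the shortest consistent delimiters favors generalization to unseen pages; preferring longer candidates would make the wrapper more specific and less likely to misfire, at the cost of breaking more easily when the page layout changes.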
The term “wrapper induction” was first conceptualized by Nicholas Kushmerick in his influential Ph.D. thesis in 1997 in the context of semi-structured Web...
Recommended Reading
Adelberg B. NoDoSE: a tool for semi-automatically extracting structured and semistructured data from text documents. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 1998. p. 283–94.
Baumgartner R, Flesca S, Gottlob G. Visual web information extraction with Lixto. In: Proceedings of the 27th International Conference on Very Large Data Bases; 2001. p. 119–28.
Carme J, Ceresna M, Goebel M. Web wrapper specification using compound filter learning. In: Proceedings of the IADIS International Conference on WWW/Internet 2006; 2006.
Chang CH, Kuo SC. OLERA: semisupervised web-data extraction with visual support. IEEE Intell Syst. 2004;19(6):56–64.
Finn A, Kushmerick N. Active learning selection strategies for information extraction. In: Proceedings of the Workshop on Adaptative Text Extraction and Mining; 2003.
Freitag D, Kushmerick N. Boosted wrapper induction. In: Proceedings of the 17th National Conference on AI; 2000. p. 577–83.
Hsu CN, Dung MT. Generating finite-state transducers for semi-structured data extraction from the web. Inf Syst. 1998;23(8):521–38.
Irmak U, Suel T. Interactive wrapper generation with minimal user effort. In: Proceedings of the 15th International World Wide Web Conference; 2006. p. 553–63.
Knoblock CA, Lerman K, Minton S, Muslea I. Accurately and reliably extracting data from the web: a machine learning approach. Q Bull, IEEE TC Data Eng. 2000;23(4):33–41.
Kushmerick N. Wrapper induction for information extraction. PhD thesis, University of Washington; 1997.
Kushmerick N. Wrapper induction: efficiency and expressiveness. Artif Intell. 2000;118(1–2):15–68.
Laender AHF, Ribeiro-Neto B, da Silva AS. DEByE – data extraction by example. Data Knowl Eng. 2002;40(2):121–54.
Liu L, Pu C, Han W. XWRAP: an XML-enabled wrapper construction system for web information sources. In: Proceedings of the 16th International Conference on Data Engineering; 2000. p. 611–21.
Muslea I, Minton S, Knoblock C. STALKER: learning extraction rules for semistructured, web-based information sources. 1998. URL http://citeseer.ist.psu.edu/muslea98stalker.html
Muslea I, Minton S, Knoblock CA. Selective sampling with redundant views. In: Proceedings of the 17th National Conference on AI; 2000. p. 621–26.
Sahuguet A, Azavant F. WysiWyg web wrapper factory (W4F). 2001. URL http://citeseer.ist.psu.edu/553711.html; http://www.ai.mit.edu/people/jimmylin/papers/Sahuguet99.ps
Seymore K, McCallum A, Rosenfeld R. Learning hidden Markov model structure for information extraction. In: Proceedings of the AAAI 99 Workshop on Machine Learning for Information Extraction; 1999.
Copyright information
© 2018 Springer Science+Business Media, LLC, part of Springer Nature
About this entry
Cite this entry
Goebel, M., Ceresna, M. (2018). Wrapper Induction. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_1160
DOI: https://doi.org/10.1007/978-1-4614-8265-9_1160
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8266-6
Online ISBN: 978-1-4614-8265-9
eBook Packages: Computer Science; Reference Module Computer Science and Engineering