An Overview and Classification of Adaptive Approaches to Information Extraction

Siefkes, Christian; Siniakov, Peter

doi:10.1007/11603412_6

Part of the book series: Lecture Notes in Computer Science (JODS, volume 3730)

Abstract

Most of the information stored in digital form is hidden in natural language texts. Extracting it and storing it in a formal representation (e.g. in the form of relations in databases) allows efficient querying, easy administration and further automatic processing of the extracted data. The area of information extraction (IE) comprises techniques, algorithms and methods that perform two important tasks: finding (identifying) the desired, relevant data and storing it in an appropriate form for future use.

The rapidly increasing number and diversity of IE systems are evidence of continuous activity and growing attention to this field. At the same time, it is becoming more and more difficult to get an overview of the scope of IE and to see the advantages of certain approaches and how they differ from others. In this paper we identify and describe promising approaches to IE. Our focus is on adaptive systems that can be customized for new domains through training or the use of external knowledge sources. Based on the observed origins and requirements of the examined IE techniques, we establish a classification of different types of adaptive IE systems.

Author information

Authors and Affiliations

Database and Information Systems Group, Freie Universität Berlin, Takustr. 9, 14195, Berlin, Germany
Christian Siefkes & Peter Siniakov
Berlin-Brandenburg Graduate School in Distributed Information Systems
Christian Siefkes

Editor information

Editors and Affiliations

EPFL-IC-IIF-LBD, Station 14 - INJ 236, 1015, Lausanne, Switzerland
Stefano Spaccapietra

Cite this paper

Siefkes, C., Siniakov, P. (2005). An Overview and Classification of Adaptive Approaches to Information Extraction. In: Spaccapietra, S. (eds) Journal on Data Semantics IV. Lecture Notes in Computer Science, vol 3730. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11603412_6

DOI: https://doi.org/10.1007/11603412_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-31001-3
Online ISBN: 978-3-540-31447-9
eBook Packages: Computer Science (R0)
