Can we make Information Extraction more adaptive?

Wilks, Yorick; Catizone, Roberta

doi:10.1007/978-1-4471-0745-3_1

Abstract

It seems widely agreed that IE (Information Extraction) is now a tested language technology, having reached precision and recall values that put it in roughly the same position as Information Retrieval and Machine Translation, both of which are widely used commercially. There is also a clear range of practical applications that would benefit from the sort of template-style data that IE provides. The main obstacle to wider deployment of the technology is adaptability: the ability to customize IE rapidly to new domains.
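The precision and recall figures referred to above are computed over the slot fills of extracted templates. The Python sketch below assumes a simplified MUC-style scoring: recall is the number of correct fills divided by the number of fills in the answer key, precision is the number of correct fills divided by the number of fills the system produced, and F1 is their harmonic mean. It gives no partial credit and ignores optional and multi-valued slots, which the official MUC scorer handles; the slot names and example template are hypothetical.

```python
from typing import Dict, Tuple

# A template is modelled here as a flat mapping: slot name -> fill string.
Template = Dict[str, str]


def score_templates(key: Template, response: Template) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for one response template against its key.

    Simplified: a fill is correct only on an exact string match;
    no partial credit, no optional or multi-valued slots.
    """
    possible = len(key)       # fills the answer key says should be extracted
    actual = len(response)    # fills the system actually produced
    correct = sum(1 for slot, fill in response.items() if key.get(slot) == fill)

    precision = correct / actual if actual else 0.0
    recall = correct / possible if possible else 0.0
    # F1: harmonic mean of precision and recall (equal weighting).
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical management-succession template.
    key = {"person": "John Smith", "org": "Acme Corp",
           "post": "chief executive", "status": "in"}
    response = {"person": "John Smith", "org": "Acme", "post": "chief executive"}
    print(score_templates(key, response))
```

Here two of the three produced fills match the key ("Acme" does not exactly match "Acme Corp"), so precision is 2/3, recall is 2/4 = 1/2, and F1 is 4/7.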

In this paper we discuss some methods that have been tried to ease this problem, and to achieve something faster than the benchmark figure of about one month, which was roughly what ARPA teams in IE needed to adapt an existing system by hand to a new domain of corpora and templates. An important distinction here is the degree to which a user can be assumed to know what is wanted and to have pre-existing templates ready to hand, as opposed to a user who has only a vague idea of what is needed from a corpus.

We shall discuss attempts to derive templates directly from corpora, and to derive knowledge structures and lexicons directly from corpora, including the recent LE project ECRAN, which attempted to tune existing lexicons to new corpora. An important issue is how far established Information Retrieval methods for tuning to a user’s needs, through feedback at an interface, can be transferred to IE.
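One established IR method of the kind referred to above is Rocchio relevance feedback: the query vector is moved towards the centroid of documents the user marks as relevant and away from the centroid of those marked non-relevant. The Python sketch below illustrates that IR technique only and is not the authors' proposal for IE; the sparse term-weight representation, the weights `alpha=1.0`, `beta=0.75`, `gamma=0.15`, and the toy terms are all assumptions.

```python
from typing import Dict, List

# Sparse document/query vector: term -> weight (assumed representation).
Vector = Dict[str, float]


def centroid(docs: List[Vector]) -> Vector:
    """Average a list of sparse term vectors; empty input gives an empty vector."""
    total: Vector = {}
    for doc in docs:
        for term, weight in doc.items():
            total[term] = total.get(term, 0.0) + weight
    n = len(docs)
    return {t: w / n for t, w in total.items()} if n else {}


def rocchio(query: Vector, relevant: List[Vector], nonrelevant: List[Vector],
            alpha: float = 1.0, beta: float = 0.75, gamma: float = 0.15) -> Vector:
    """Return the Rocchio-updated query, dropping terms whose weight is not positive."""
    rel_c = centroid(relevant)
    non_c = centroid(nonrelevant)
    updated: Vector = {}
    for term in set(query) | set(rel_c) | set(non_c):
        weight = (alpha * query.get(term, 0.0)
                  + beta * rel_c.get(term, 0.0)
                  - gamma * non_c.get(term, 0.0))
        if weight > 0:
            updated[term] = weight
    return updated


if __name__ == "__main__":
    q = {"merger": 1.0}
    judged_relevant = [{"merger": 1.0, "acquisition": 1.0},
                       {"acquisition": 1.0, "takeover": 1.0}]
    judged_nonrelevant = [{"sport": 1.0}]
    print(rocchio(q, judged_relevant, judged_nonrelevant))
```

In this toy run the query gains "acquisition" and "takeover" from the relevant documents, while "sport" ends up with a negative weight and is dropped. Whether a feedback loop of this kind carries over to IE is the open question the abstract poses.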

Author information

Authors and Affiliations

Department of Computer Science, Regent Court, The University of Sheffield, 211 Portobello Street, Sheffield, UK
Yorick Wilks & Roberta Catizone

Editor information

Editors and Affiliations

Faculty of Technology, University of Portsmouth, Portsmouth, UK
Max Bramer BSc, PhD, CEng
International Teledemocracy Centre, Napier University, Edinburgh, UK
Ann Macintosh BSc, CEng
Department of Computer Science, University of Liverpool, Liverpool, UK
Frans Coenen PhD

About this paper

Cite this paper

Wilks, Y., Catizone, R. (2000). Can we make Information Extraction more adaptive? In: Bramer, M., Macintosh, A., Coenen, F. (eds) Research and Development in Intelligent Systems XVI. Springer, London. https://doi.org/10.1007/978-1-4471-0745-3_1

DOI: https://doi.org/10.1007/978-1-4471-0745-3_1
Publisher Name: Springer, London
Print ISBN: 978-1-85233-231-0
Online ISBN: 978-1-4471-0745-3
eBook Packages: Springer Book Archive