Can we make Information Extraction more adaptive?

  • Conference paper
Research and Development in Intelligent Systems XVI

Abstract

It seems widely agreed that IE (Information Extraction) is now a tested language technology that has reached precision+recall values that put it in about the same position as Information Retrieval and Machine Translation, both of which are widely used commercially. There is also a clear range of practical applications that would be eased by the sort of template-style data that IE provides. The problem for wider deployment of the technology is adaptability: the ability to customize IE rapidly to new domains.

In this paper we discuss some methods that have been tried to ease this problem, and to create something more rapid than the benchmark one-month figure, which was roughly what ARPA teams in IE needed to adapt an existing system by hand to a new domain of corpora and templates. An important distinction in discussing the issue is the degree to which a user can be assumed to know what is wanted and to have pre-existing templates ready to hand, as opposed to a user who has only a vague idea of what is needed from a corpus.

We shall discuss attempts to derive templates directly from corpora, and to derive knowledge structures and lexicons directly from corpora, including discussion of the recent LE project ECRAN, which attempted to tune existing lexicons to new corpora. An important issue is how far established methods in Information Retrieval of tuning to a user’s needs with feedback at an interface can be transferred to IE.

Copyright information

© 2000 Springer-Verlag London Limited

About this paper

Cite this paper

Wilks, Y., Catizone, R. (2000). Can we make Information Extraction more adaptive?. In: Bramer, M., Macintosh, A., Coenen, F. (eds) Research and Development in Intelligent Systems XVI. Springer, London. https://doi.org/10.1007/978-1-4471-0745-3_1

  • DOI: https://doi.org/10.1007/978-1-4471-0745-3_1

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-85233-231-0

  • Online ISBN: 978-1-4471-0745-3

  • eBook Packages: Springer Book Archive
