Handling Conjunctions in Named Entities

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4394)

Abstract

Although the literature contains reports of very high accuracy figures for the recognition of named entities in text, there are still some named entity phenomena that remain problematic for existing text processing systems. One of these is the ambiguity of conjunctions in candidate named entity strings, an all-too-prevalent problem in corporate and legal documents. In this paper, we distinguish four uses of the conjunction in these strings, and explore the use of a supervised machine learning approach to conjunction disambiguation trained on a very limited set of ‘name internal’ features that avoids the need for expensive lexical or semantic resources. We achieve 84% correctly classified examples using k-fold evaluation on a data set of 600 instances. Further improvements are likely to require the use of wider domain knowledge and name external features.
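The k-fold evaluation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the abstract does not state the value of k, the learner, or the feature set, so `k = 10` and the helper names `k_fold_indices` and `accuracy` are assumptions chosen for illustration. The sketch only shows how 600 instances could be partitioned so that each instance is tested exactly once, and how the "correctly classified" figure would be computed from gold and predicted labels.

```python
from typing import Iterator, List, Tuple


def k_fold_indices(n: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for k contiguous folds over n instances.

    Hypothetical helper: the paper's actual splitting procedure is not
    described in the abstract.
    """
    if not 1 < k <= n:
        raise ValueError("k must satisfy 1 < k <= n")
    base, extra = divmod(n, k)  # the first `extra` folds get one more instance
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


def accuracy(gold: List[str], predicted: List[str]) -> float:
    """Fraction of instances whose predicted label equals the gold label."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


# 600 instances as in the abstract; k = 10 is an assumed value.
folds = list(k_fold_indices(600, 10))
```

In practice the instances would normally be shuffled, and ideally stratified by class, before being split; the contiguous folds here keep the sketch deterministic.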

Editor information

Alexander Gelbukh

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Dale, R., Mazur, P. (2007). Handling Conjunctions in Named Entities. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2007. Lecture Notes in Computer Science, vol 4394. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70939-8_12

  • DOI: https://doi.org/10.1007/978-3-540-70939-8_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-70938-1

  • Online ISBN: 978-3-540-70939-8

  • eBook Packages: Computer Science, Computer Science (R0)
