Symbolic and Neural Learning of Named-Entity Recognition and Classification Systems in Two Languages

Chapter in: Advances in Computational Intelligence and Learning

Abstract

This work compares two approaches to acquiring named-entity recognition and classification systems from training corpora, in two different languages. Named-entity recognition and classification is an important subtask in most language engineering applications, in particular information extraction, where different types of named entity are associated with specific roles in events. The manual construction of rules for the recognition of named entities is tedious and time-consuming, so effective methods for acquiring such systems automatically from data are highly desirable. In this paper we compare two popular learning methods on this task: decision-tree induction and a multi-layered feed-forward neural network. Particular emphasis is placed on selecting an appropriate data representation for each method and on extracting training examples from unstructured textual data. We compare the performance of the two methods on large corpora of English and Greek texts and present the results. Both methods perform well, and one notable result is that a simple representation of the data, which ignores the order of the words within a named entity, yields better results than a more complex representation that preserves word order.
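The contrast between the two data representations mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which features the chapter uses, so the token categories, the function names and the `max_len` padding length below are all assumptions made for illustration only. The order-free encoding counts how many tokens of each category occur in a candidate phrase, while the order-preserving encoding reserves one block of the vector per token position.

```python
from collections import Counter

# Hypothetical token categories (an assumption; the chapter's real
# feature set is not given in the abstract).
CATEGORIES = ["capitalized", "lowercase", "number", "other"]


def categorize(token: str) -> str:
    """Map a token to one of the hypothetical categories."""
    if token[:1].isupper():
        return "capitalized"
    if token.isdigit():
        return "number"
    if token.islower():
        return "lowercase"
    return "other"


def order_free(tokens):
    """Count tokens per category; word order is discarded."""
    counts = Counter(categorize(t) for t in tokens)
    return [counts[c] for c in CATEGORIES]


def order_preserving(tokens, max_len=4):
    """One one-hot block per position, zero-padded or truncated to max_len."""
    vector = []
    for i in range(max_len):
        block = [0] * len(CATEGORIES)
        if i < len(tokens):
            block[CATEGORIES.index(categorize(tokens[i]))] = 1
        vector.extend(block)
    return vector


phrase = ["Bank", "of", "Greece"]
shuffled = ["of", "Bank", "Greece"]

# Same vector for both orderings under the order-free encoding.
print(order_free(phrase), order_free(shuffled))
# Different vectors under the order-preserving encoding.
print(order_preserving(phrase))
print(order_preserving(shuffled))
```

Either vector could then be passed to a decision-tree learner or a feed-forward network. The order-free vector has a fixed, small dimension regardless of phrase length, which is one plausible reason a simpler representation might be easier to learn from, although the abstract itself does not give an explanation.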

Editor information

Hans-Jürgen Zimmermann, Georgios Tselentis, Maarten van Someren, Georgios Dounias

Copyright information

© 2002 Springer Science+Business Media New York

About this chapter

Cite this chapter

Petasis, G., Petridis, S., Paliouras, G., Karkaletsis, V., Perantonis, S.J., Spyropoulos, C.D. (2002). Symbolic and Neural Learning of Named-Entity Recognition and Classification Systems in Two Languages. In: Zimmermann, HJ., Tselentis, G., van Someren, M., Dounias, G. (eds) Advances in Computational Intelligence and Learning. International Series in Intelligent Technologies, vol 18. Springer, Dordrecht. https://doi.org/10.1007/978-94-010-0324-7_14

  • DOI: https://doi.org/10.1007/978-94-010-0324-7_14

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-94-010-3872-0

  • Online ISBN: 978-94-010-0324-7

  • eBook Packages: Springer Book Archive
