Symbolic and Neural Learning of Named-Entity Recognition and Classification Systems in Two Languages

Petasis, G.; Petridis, S.; Paliouras, G.; Karkaletsis, V.; Perantonis, S. J.; Spyropoulos, C. D.

doi:10.1007/978-94-010-0324-7_14

Part of the book series: International Series in Intelligent Technologies (ISIT, volume 18)

Abstract

This work compares two alternative approaches to acquiring named-entity recognition and classification systems from training corpora, in two different languages. Named-entity recognition and classification is an important subtask in most language engineering applications, in particular information extraction, where different types of named entity are associated with specific roles in events. Constructing rules for recognising named entities by hand is tedious and time-consuming, so effective methods for acquiring such systems automatically from data are highly desirable. In this paper we compare two popular learning methods on this task: a decision-tree induction method and a multi-layered feed-forward neural network. Particular emphasis is placed on selecting the appropriate data representation for each method and on extracting training examples from unstructured textual data. We compare the performance of the two methods on large corpora of English and Greek texts and present the results. Both methods perform well, and one notable result is that a simple representation of the data, which ignores the order of the words within a named entity, leads to better results than a more complex approach that preserves word order.
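The contrast between the two data representations mentioned in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the abstract does not give the chapter's actual feature set, so the tag vocabulary `TAGS`, the function names `order_free` and `order_preserving`, and the window length `max_len` are all hypothetical.

```python
from collections import Counter

# Hypothetical tag vocabulary; the chapter's real features are not given in the abstract.
TAGS = ["title", "capitalized", "org_suffix", "location_word"]


def order_free(tags):
    """Count how often each tag occurs in the phrase, discarding word order."""
    counts = Counter(tags)
    return [counts[t] for t in TAGS]


def order_preserving(tags, max_len=3):
    """One-hot encode the tag at each position of a fixed-size window.

    Phrases shorter than max_len are padded with all-zero slots;
    longer phrases are truncated (an assumption of this sketch).
    """
    vector = []
    for i in range(max_len):
        slot = [0] * len(TAGS)
        if i < len(tags):
            slot[TAGS.index(tags[i])] = 1
        vector.extend(slot)
    return vector


# Two made-up phrases carrying the same tags in a different order.
a = ["capitalized", "org_suffix"]
b = ["org_suffix", "capitalized"]

print(order_free(a) == order_free(b))              # True: order is ignored
print(order_preserving(a) == order_preserving(b))  # False: order is encoded
```

The order-free vector stays the same size however long the phrase is, whereas the order-preserving vector grows with the window length. This may be one reason a simpler representation is easier to learn from, though the abstract itself does not state the cause.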

Author information

Authors and Affiliations

Institute of Informatics and Telecommunications, National Centre for Scientific Research “Demokritos”, 153 10 Ag. Paraskevi, Athens, Greece
G. Petasis, S. Petridis, G. Paliouras, V. Karkaletsis, S. J. Perantonis & C. D. Spyropoulos

Editor information

Hans-Jürgen Zimmermann, Georgios Tselentis, Maarten van Someren, Georgios Dounias

About this chapter

Cite this chapter

Petasis, G., Petridis, S., Paliouras, G., Karkaletsis, V., Perantonis, S.J., Spyropoulos, C.D. (2002). Symbolic and Neural Learning of Named-Entity Recognition and Classification Systems in Two Languages. In: Zimmermann, HJ., Tselentis, G., van Someren, M., Dounias, G. (eds) Advances in Computational Intelligence and Learning. International Series in Intelligent Technologies, vol 18. Springer, Dordrecht. https://doi.org/10.1007/978-94-010-0324-7_14

DOI: https://doi.org/10.1007/978-94-010-0324-7_14
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-010-3872-0
Online ISBN: 978-94-010-0324-7
eBook Packages: Springer Book Archive
