Abstract
This paper presents a Named Entity Recognition (NER) system for Spanish that combines machine learning and knowledge-based approaches. Our contribution is twofold: first, a discussion of how to select the best features for a machine learning NER system; second, an error analysis of this system, which led us to create a set of general post-processing rules. Both are explained in detail and then evaluated. Feature selection improves on the results of our previous system by around 2.3%, while applying the post-processing rules yields a gain of around 3.6%, for a final F-score of 83.37%.
This research has been partially funded by the Spanish Government under project CICyT number TIC2003-07158-C04-01.
© 2006 Springer-Verlag Berlin Heidelberg
Cite this paper
Ferrández, Ó., Toral, A., Muñoz, R. (2006). Fine Tuning Features and Post-processing Rules to Improve Named Entity Recognition. In: Kop, C., Fliedl, G., Mayr, H.C., Métais, E. (eds) Natural Language Processing and Information Systems. NLDB 2006. Lecture Notes in Computer Science, vol 3999. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11765448_16
DOI: https://doi.org/10.1007/11765448_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-34616-6
Online ISBN: 978-3-540-34617-3
eBook Packages: Computer Science (R0)