Abstract
Named Entity Recognition (NER) is an important and challenging Information Extraction task. Conditional Random Fields (CRF) are a probabilistic method for structured prediction that can be applied to NER. This paper presents the use of CRF for NER in Portuguese texts, with an additional feature supplied by a Local Grammar (LG). Local grammars are handcrafted rules that identify named entities within a text. We also study the limits of CRF performance when the output of any other classifier is used as an additional feature. Two well-known Portuguese collections were used as the training and test sets, respectively. The results outperform those of state-of-the-art systems reported in the literature for Portuguese.
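The idea of feeding a local grammar's output to a CRF as one more feature can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the single toy rule (a title followed by capitalized words marks a person), the `PESSOA` label, the feature names, and the helper functions `lg_labels`, `token_features`, and `sentence_features`.

```python
# Hypothetical titles used by the toy local-grammar rule (illustrative only).
TITLES = {"Sr.", "Sra.", "Dr.", "Dra.", "Prof."}


def lg_labels(tokens):
    """Toy local grammar: capitalized words right after a title get PESSOA."""
    labels = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        if tokens[i] in TITLES:
            j = i + 1
            # Extend the match over the run of capitalized tokens.
            while j < len(tokens) and tokens[j][:1].isupper():
                labels[j] = "PESSOA"
                j += 1
            i = j
        else:
            i += 1
    return labels


def token_features(tokens, lg, i):
    """Build a feature dict for token i, including the local-grammar label."""
    word = tokens[i]
    feats = {
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),
        "word.isdigit": word.isdigit(),
        "lg_label": lg[i],  # the additional feature informed by the grammar
    }
    if i > 0:
        feats["-1:word.lower"] = tokens[i - 1].lower()
        feats["-1:lg_label"] = lg[i - 1]
    else:
        feats["BOS"] = True  # beginning of sentence
    if i < len(tokens) - 1:
        feats["+1:word.lower"] = tokens[i + 1].lower()
        feats["+1:lg_label"] = lg[i + 1]
    else:
        feats["EOS"] = True  # end of sentence
    return feats


def sentence_features(tokens):
    """Return one feature dict per token of a tokenized sentence."""
    lg = lg_labels(tokens)
    return [token_features(tokens, lg, i) for i in range(len(tokens))]


tokens = ["O", "Dr.", "João", "Silva", "visitou", "Lisboa", "."]
features = sentence_features(tokens)
```

Each sentence becomes a list of per-token feature dictionaries, a shape that CRF toolkits accepting dictionary features can typically consume. Because the grammar's guess is only a feature, the CRF is free to learn when to trust it and when to override it from the surrounding context.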
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Pirovani, J.P.C., de Oliveira, E. (2018). CRF+LG: A Hybrid Approach for the Portuguese Named Entity Recognition. In: Abraham, A., Muhuri, P., Muda, A., Gandhi, N. (eds) Intelligent Systems Design and Applications. ISDA 2017. Advances in Intelligent Systems and Computing, vol 736. Springer, Cham. https://doi.org/10.1007/978-3-319-76348-4_11
DOI: https://doi.org/10.1007/978-3-319-76348-4_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-76347-7
Online ISBN: 978-3-319-76348-4
eBook Packages: Engineering, Engineering (R0)