
CRF+LG: A Hybrid Approach for the Portuguese Named Entity Recognition

  • Conference paper
Intelligent Systems Design and Applications (ISDA 2017)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 736)

Abstract

Named Entity Recognition (NER) is an important and challenging Information Extraction task. Conditional Random Fields (CRF) are a probabilistic method for structured prediction that can be applied to NER. This paper applies CRF to NER in Portuguese texts, adding a feature supplied by a Local Grammar (LG). Local grammars are handcrafted rules that identify named entities in text. We also study the bounds on CRF performance when the output of any other classifier is used as an additional feature. Two well-known Portuguese collections were used as the training and test sets, respectively. The results outperform those of state-of-the-art systems reported in the literature for Portuguese.
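
As a rough illustration of the idea in the abstract, the sketch below shows how the output of a rule-based tagger could be passed to a CRF as one extra per-token feature. Everything in it is an assumption made for illustration: the single title-based rule in `local_grammar_tags` is a toy stand-in for a real local grammar, and the feature names (`lg_tag`, `word.lower`, and so on) are hypothetical, not the feature set used in the paper. The list-of-dictionaries layout is one common input convention for CRF toolkits; no CRF is trained here.

```python
# Toy stand-in for a local grammar (assumption): a title followed by
# capitalized words marks a person name. Real local grammars are far richer.
TITLES = {"Sr.", "Sra.", "Dr.", "Dra.", "Prof."}


def local_grammar_tags(tokens):
    """Return one BIO-style tag per token using the single toy rule above."""
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        if tokens[i] in TITLES:
            j = i + 1
            # Tag the run of capitalized tokens that follows the title.
            while j < len(tokens) and tokens[j][:1].isupper():
                tags[j] = "B-PER" if j == i + 1 else "I-PER"
                j += 1
            i = j
        else:
            i += 1
    return tags


def token_features(tokens, lg_tags, i):
    """Build the feature dictionary for token i (feature names are hypothetical)."""
    word = tokens[i]
    feats = {
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),
        "word.isdigit": word.isdigit(),
        # The additional feature: the tag proposed by the rule-based tagger.
        "lg_tag": lg_tags[i],
    }
    if i > 0:
        feats["prev.lower"] = tokens[i - 1].lower()
    else:
        feats["BOS"] = True  # beginning of sentence
    if i < len(tokens) - 1:
        feats["next.lower"] = tokens[i + 1].lower()
    else:
        feats["EOS"] = True  # end of sentence
    return feats


def sentence_features(tokens):
    """Return one feature dictionary per token, including the lg_tag feature."""
    lg_tags = local_grammar_tags(tokens)
    return [token_features(tokens, lg_tags, i) for i in range(len(tokens))]


if __name__ == "__main__":
    example = ["O", "Dr.", "João", "Silva", "visitou", "Lisboa", "."]
    for token, feats in zip(example, sentence_features(example)):
        print(token, feats["lg_tag"])
```

In this arrangement the CRF is free to trust or override the rule-based tag, since `lg_tag` is only one feature among several. The toy rule deliberately misses "Lisboa", a case where the remaining features would have to carry the decision.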

Notes

  1. http://www.linguateca.pt/HAREM/

  2. http://unitexgramlab.org/

  3. http://opennlp.apache.org/

  4. http://mallet.cs.umass.edu/

  5. http://www.linguateca.pt/HAREM/

  6. http://www.inf.pucrs.br/linatural/recursos_para_reconhecimento_de_entidades_nomeadas/NERP_CRF.xml

  7. http://www.cnts.ua.ac.be/conll2002/ner/bin/conlleval.txt

Author information

Corresponding author

Correspondence to Juliana P. C. Pirovani.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Pirovani, J.P.C., de Oliveira, E. (2018). CRF+LG: A Hybrid Approach for the Portuguese Named Entity Recognition. In: Abraham, A., Muhuri, P., Muda, A., Gandhi, N. (eds) Intelligent Systems Design and Applications. ISDA 2017. Advances in Intelligent Systems and Computing, vol 736. Springer, Cham. https://doi.org/10.1007/978-3-319-76348-4_11

  • DOI: https://doi.org/10.1007/978-3-319-76348-4_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-76347-7

  • Online ISBN: 978-3-319-76348-4

  • eBook Packages: Engineering, Engineering (R0)
