Comparing NERP-CRF with Publicly Available Portuguese Named Entities Recognition Tools

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8775)

Abstract

This paper evaluates NERP-CRF, a Conditional Random Fields (CRF) based tool for Portuguese Named Entities Recognition (NER), against three other publicly available NER tools for Portuguese. The comparison uses the Recall and Precision obtained by each tool on the HAREM corpus, a gold standard for NER in Portuguese texts. The experiments were first conducted with ten categories and then with a reduced number of categories. The results show that NERP-CRF outperforms the other tools when sufficiently trained for four entity categories.
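
As a rough illustration of how per-category Precision and Recall can be computed when comparing a tool's output with a gold standard, the following Python sketch scores predicted entities against gold entities. It is not the HAREM scorer and not the authors' evaluation code: it assumes exact matching on hypothetical `(start, end, category)` tuples, and the function name, category labels, and sample data are made up for the example.

```python
from collections import Counter


def precision_recall(gold, predicted):
    """Exact-match precision and recall per category.

    gold, predicted: sets of (start, end, category) tuples.
    An entity counts as correct only if span and category both match
    (an assumption of this sketch, not the HAREM scoring scheme).
    """
    true_pos = Counter(cat for (_, _, cat) in gold & predicted)
    n_pred = Counter(cat for (_, _, cat) in predicted)
    n_gold = Counter(cat for (_, _, cat) in gold)

    scores = {}
    for cat in sorted(set(n_gold) | set(n_pred)):
        # Precision: correct predictions / all predictions for the category.
        p = true_pos[cat] / n_pred[cat] if n_pred[cat] else 0.0
        # Recall: correct predictions / all gold entities for the category.
        r = true_pos[cat] / n_gold[cat] if n_gold[cat] else 0.0
        scores[cat] = (p, r)
    return scores


# Illustrative data only: token spans and category labels are invented.
gold = {(0, 2, "PESSOA"), (5, 6, "LOCAL"), (9, 11, "ORGANIZACAO")}
pred = {(0, 2, "PESSOA"), (5, 6, "ORGANIZACAO"), (9, 11, "ORGANIZACAO")}

scores = precision_recall(gold, pred)
for category, (p, r) in scores.items():
    print(f"{category}: precision={p:.2f} recall={r:.2f}")
```

Reducing the set of categories, as the experiments describe, would amount to filtering both sets to the retained labels before scoring.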

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

do Amaral, D.O.F., Fonseca, E., Lopes, L., Vieira, R. (2014). Comparing NERP-CRF with Publicly Available Portuguese Named Entities Recognition Tools. In: Baptista, J., Mamede, N., Candeias, S., Paraboni, I., Pardo, T.A.S., Volpe Nunes, M.d.G. (eds) Computational Processing of the Portuguese Language. PROPOR 2014. Lecture Notes in Computer Science, vol 8775. Springer, Cham. https://doi.org/10.1007/978-3-319-09761-9_27

  • DOI: https://doi.org/10.1007/978-3-319-09761-9_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-09760-2

  • Online ISBN: 978-3-319-09761-9

  • eBook Packages: Computer Science, Computer Science (R0)
