A Golden Resource for Named Entity Recognition in Portuguese

  • Conference paper
Computational Processing of the Portuguese Language (PROPOR 2006)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3960)

Abstract

This paper presents a collection of texts manually annotated with named entities in context, which was used for HAREM, the first evaluation contest for named entity recognizers for Portuguese. We discuss the options taken and the originality of our approach compared with previous evaluation initiatives in the area. We document the choice of categories, their quantitative weight in the overall collection and how we deal with vagueness and underspecification.

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Santos, D., Cardoso, N. (2006). A Golden Resource for Named Entity Recognition in Portuguese. In: Vieira, R., Quaresma, P., Nunes, M.d.G.V., Mamede, N.J., Oliveira, C., Dias, M.C. (eds) Computational Processing of the Portuguese Language. PROPOR 2006. Lecture Notes in Computer Science, vol 3960. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11751984_8

  • DOI: https://doi.org/10.1007/11751984_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-34045-4

  • Online ISBN: 978-3-540-34046-1

  • eBook Packages: Computer Science, Computer Science (R0)
