REPENTINO – A Wide-Scope Gazetteer for Entity Recognition in Portuguese

  • Conference paper
Computational Processing of the Portuguese Language (PROPOR 2006)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3960)

Abstract

In this paper we describe REPENTINO, a publicly available gazetteer intended to support the development of named entity recognition systems for Portuguese. REPENTINO aims to reduce the problems developers face due to the limited availability of lexical-semantic resources of this type for Portuguese. The data stored in REPENTINO was mostly extracted from corpora and from the web using simple semi-automated methods. Currently, REPENTINO stores nearly 450k instances of named entities divided into more than 100 categories and subcategories, covering a much wider set of domains than those usually included in traditional gazetteers. We present some figures on the current content of the gazetteer and describe future work on evaluating this resource and enriching it with additional information.

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sarmento, L., Pinto, A.S., Cabral, L. (2006). REPENTINO – A Wide-Scope Gazetteer for Entity Recognition in Portuguese. In: Vieira, R., Quaresma, P., Nunes, M.d.G.V., Mamede, N.J., Oliveira, C., Dias, M.C. (eds) Computational Processing of the Portuguese Language. PROPOR 2006. Lecture Notes in Computer Science (LNAI), vol 3960. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11751984_4

  • DOI: https://doi.org/10.1007/11751984_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-34045-4

  • Online ISBN: 978-3-540-34046-1

  • eBook Packages: Computer Science, Computer Science (R0)
