Lexical Semantics Annotation for Enriched Portuguese Corpora

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9727)

Abstract

The semantic annotation of corpora has an important role to play in ensuring that sentences occurring in natural language texts are correctly understood based on their intended context. Two examples of lexical semantic units that contribute to this knowledge are word senses – which allow words with multiple meanings to be understood based on the context in which they are used – and named entities – which can be disambiguated and linked back to the specific encyclopedic resources that describe them.

In this paper, we describe the construction of Portuguese corpora annotated with lexical semantics: word senses linked to senses in a Portuguese wordnet, and named entities linked to Portuguese Wikipedia entries using DBpedia. The result is a gold-standard resource with lexical semantic annotation that supports the training and evaluation of tools for disambiguating these lexical units in Portuguese.

Notes

  1. Wikipedia, the free encyclopedia: http://en.wikipedia.org.

  2. Available from: http://brat.nlplab.org.

  3. In this first version of the word sense annotation task, fewer sentences were distributed to annotators than in the named entity disambiguation task. These gaps will be addressed in future versions of the word sense annotation task.

  4. Accessible from: http://www.meta-share.eu/.

Acknowledgements

The results reported in this paper were partially supported by the Portuguese Government's P2020 program under the grant 08/SI/2015/3279: ASSET-Intelligent Assistance for Everyone Everywhere, by FCT-Fundação para a Ciência e a Tecnologia under the grant PTDC/EEI-SII/1940/2012: DP4LT-Deep Language Processing for Language Technology, and by the EC's FP7 program under the grant number 610516: QTLeap-Quality Translation by Deep Language Engineering Approaches.

Author information

Corresponding author

Correspondence to João Silva.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Neale, S., Pereira, R.V., Silva, J., Branco, A. (2016). Lexical Semantics Annotation for Enriched Portuguese Corpora. In: Silva, J., Ribeiro, R., Quaresma, P., Adami, A., Branco, A. (eds) Computational Processing of the Portuguese Language. PROPOR 2016. Lecture Notes in Computer Science, vol 9727. Springer, Cham. https://doi.org/10.1007/978-3-319-41552-9_30

  • DOI: https://doi.org/10.1007/978-3-319-41552-9_30

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-41551-2

  • Online ISBN: 978-3-319-41552-9

  • eBook Packages: Computer Science, Computer Science (R0)
