Lexical Semantics Annotation for Enriched Portuguese Corpora

Neale, Steven; Pereira, Rita Valadas; Silva, João; Branco, António

doi:10.1007/978-3-319-41552-9_30

Conference paper
First Online: 21 June 2016

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9727)

Abstract

The semantic annotation of corpora plays an important role in ensuring that sentences in natural language texts are correctly understood in their intended context. Two examples of lexical semantic units that contribute to this understanding are word senses, which allow words with multiple meanings to be interpreted according to the context in which they are used, and named entities, which can be disambiguated and linked back to the specific encyclopedic resources that describe them.

In this paper, we describe the construction of Portuguese corpora annotated for lexical semantics, with word senses linked to senses in a Portuguese wordnet and named entities linked to Portuguese Wikipedia entries via DBpedia. The result is a gold-standard resource that supports the training and evaluation of tools for disambiguating these lexical units in Portuguese.

Notes

1. Wikipedia, the free encyclopedia: http://en.wikipedia.org.
2. Available from: http://brat.nlplab.org.
3. In this first version of the word sense annotation task, fewer sentences were distributed to annotators than in the named entity disambiguation task. These gaps will be addressed in future versions of the word sense annotation task.
4. Accessible from: http://www.meta-share.eu/.

Acknowledgements

The results reported in this paper were partially supported by the Portuguese Government's P2020 program under grant 08/SI/2015/3279: ASSET (Intelligent Assistance for Everyone Everywhere), by FCT (Fundação para a Ciência e a Tecnologia) under grant PTDC/EEI-SII/1940/2012: DP4LT (Deep Language Processing for Language Technology), and by the EC's FP7 program under grant number 610516: QTLeap (Quality Translation by Deep Language Engineering Approaches).

Author information

Authors and Affiliations

NLX - Natural Language and Speech Group, Department of Informatics, Faculty of Sciences, University of Lisbon, Lisbon, Portugal
Steven Neale, Rita Valadas Pereira, João Silva & António Branco

Corresponding author

Correspondence to João Silva.

Editor information

Editors and Affiliations

Universidade de Lisboa, Lisbon, Portugal
João Silva
ISCTE-IUL, Lisbon, Portugal
Ricardo Ribeiro
Universidade de Évora, Évora, Portugal
Paulo Quaresma
Universidade de Caxias do Sul, Caxias do Sul, Brazil
André Adami
Universidade de Lisboa, Lisbon, Portugal
António Branco

About this paper

Cite this paper

Neale, S., Pereira, R.V., Silva, J., Branco, A. (2016). Lexical Semantics Annotation for Enriched Portuguese Corpora. In: Silva, J., Ribeiro, R., Quaresma, P., Adami, A., Branco, A. (eds) Computational Processing of the Portuguese Language. PROPOR 2016. Lecture Notes in Computer Science, vol 9727. Springer, Cham. https://doi.org/10.1007/978-3-319-41552-9_30

DOI: https://doi.org/10.1007/978-3-319-41552-9_30
Published: 21 June 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-41551-2
Online ISBN: 978-3-319-41552-9
eBook Packages: Computer Science, Computer Science (R0)