Improving Portuguese Term Extraction

Lopes, Lucelene; Vieira, Renata

doi:10.1007/978-3-642-28885-2_9

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7243)

Included in the following conference series:

International Conference on Computational Processing of the Portuguese Language

Abstract

This paper evaluates a set of heuristics for improving the quality of terms extracted from an annotated domain corpus written in Portuguese. The heuristics start from the part-of-speech and grammatical-function annotation of the texts and identify the nouns and noun phrases that are the best candidates for domain terms. These candidates are then submitted to a set of approximate rules (heuristics) that may discard some candidates, accept others (with or without removing words), or discover implicit terms that can be inferred. The effectiveness of the heuristics is verified in a corpus experiment in which the usual metrics are computed against a reference list.
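
The evaluation step described above, comparing extracted terms with a reference list, can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name the "usual metrics", so precision, recall, and F-measure are assumed here, and the function name `evaluate_terms`, the exact-match comparison after lowercasing, and the sample terms are all hypothetical rather than taken from the paper.

```python
def evaluate_terms(extracted, reference):
    """Score an extracted term list against a reference list.

    Assumption: terms match only when they are identical after
    trimming whitespace and lowercasing. The paper may use a
    different matching criterion.
    """
    extracted_set = {term.strip().lower() for term in extracted}
    reference_set = {term.strip().lower() for term in reference}

    # Terms that appear in both lists count as correct extractions.
    hits = extracted_set & reference_set

    # Guard against empty lists to avoid division by zero.
    precision = len(hits) / len(extracted_set) if extracted_set else 0.0
    recall = len(hits) / len(reference_set) if reference_set else 0.0

    # F-measure: harmonic mean of precision and recall.
    if precision + recall == 0:
        f_measure = 0.0
    else:
        f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical sample data, not drawn from the paper's corpus.
extracted = ["aleitamento materno", "criança", "estudo"]
reference = ["aleitamento materno", "criança", "recém-nascido", "pediatria"]

p, r, f = evaluate_terms(extracted, reference)
print(round(p, 3), round(r, 3), round(f, 3))  # prints: 0.667 0.5 0.571
```

In this toy case two of the three extracted terms are in the reference list (precision 2/3) and two of the four reference terms were found (recall 1/2). A heuristic that discards a poor candidate such as "estudo" would raise precision without changing recall, which is the kind of effect such an evaluation is meant to expose.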

Author information

Authors and Affiliations

Faculdade de Informática, FACIN, PUCRS, Porto Alegre, RS, Brazil
Lucelene Lopes & Renata Vieira

Editor information

Editors and Affiliations

UFSCAR, Rod. Washington Luís, 13565-905, São Carlos, Brazil
Helena Caseli
UFRGS, Av. Bento Gonçalves, 9500, 91501-970, Porto Alegre, Brazil
Aline Villavicencio
DETI/IEETA, Universidade de Aveiro, Campus Universitário de Santiago, 3810-193, Aveiro, Portugal
António Teixeira
UC/ IT, DEEC, Universidade de Coimbra, Polo 2, 3030-290, Coimbra, Portugal
Fernando Perdigão

Cite this paper

Lopes, L., Vieira, R. (2012). Improving Portuguese Term Extraction. In: Caseli, H., Villavicencio, A., Teixeira, A., Perdigão, F. (eds) Computational Processing of the Portuguese Language. PROPOR 2012. Lecture Notes in Computer Science, vol 7243. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28885-2_9

DOI: https://doi.org/10.1007/978-3-642-28885-2_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28884-5
Online ISBN: 978-3-642-28885-2
eBook Packages: Computer Science (R0)
