Abstract
This paper presents the evaluation of a set of heuristics for improving the quality of terms extracted from an annotated domain corpus written in Portuguese. The proposed heuristics start from the part-of-speech and grammatical-function annotation of the texts, identifying the nouns and noun phrases that are the best candidates to be considered terms of the domain. These nouns and noun phrases are then submitted to a set of approximate rules (heuristics) that may discard some candidates, accept others (with or without removing words from them), or even infer implicit terms. The effectiveness of these heuristics is verified through a corpus experiment, by computing the usual evaluation metrics against a reference term list.
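The kind of pipeline the abstract describes can be illustrated with a minimal sketch. It rests on several assumptions. Noun-phrase candidates are taken to arrive as lists of (word, tag) pairs, using a simplified, hypothetical tag set. The three rules shown are illustrative stand-ins for "discard", "accept with word removal" and "infer implicit terms", not the heuristics evaluated in the paper: strip leading determiners, drop phrases not headed by a noun, and split one coordination pattern into two terms. "Usual metrics" is read here as precision, recall and F-measure. All function names and toy data are hypothetical.

```python
# Minimal sketch, NOT the paper's heuristics. Assumes a simplified, hypothetical
# tag set: "DET" determiner, "N" noun, "ADJ" adjective, "PRP" preposition,
# "KC" coordinating conjunction, "PRON" pronoun. Requires Python 3.9+.
Token = tuple[str, str]  # (word form, part-of-speech tag)


def strip_leading(phrase: list[Token]) -> list[Token]:
    """Accept a candidate but remove words: drop leading determiners."""
    i = 0
    while i < len(phrase) and phrase[i][1] == "DET":
        i += 1
    return phrase[i:]


def infer_coordinated(phrase: list[Token]) -> list[list[Token]]:
    """Infer implicit terms from one hypothetical coordination pattern:
    'N PRP N KC PRP N' (câncer de mama e de pulmão) yields two terms."""
    if [tag for _, tag in phrase] == ["N", "PRP", "N", "KC", "PRP", "N"]:
        return [phrase[0:3], [phrase[0], phrase[4], phrase[5]]]
    return [phrase]


def apply_heuristics(candidates: list[list[Token]]) -> set[str]:
    terms: set[str] = set()
    for phrase in candidates:
        phrase = strip_leading(phrase)
        # Discard candidates that are empty or not headed by a noun.
        if not phrase or phrase[0][1] != "N":
            continue
        for part in infer_coordinated(phrase):
            # Discard parts ending in a dangling preposition or conjunction.
            if part[-1][1] in {"PRP", "KC"}:
                continue
            terms.add(" ".join(word.lower() for word, _ in part))
    return terms


def evaluate(extracted: set[str], reference: set[str]) -> tuple[float, float, float]:
    """Precision, recall and F-measure (F1) against a reference term list."""
    hits = len(extracted & reference)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(reference) if reference else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Usage with toy data (invented for illustration, not from the paper's corpus).
candidates = [
    [("a", "DET"), ("pressão", "N"), ("arterial", "ADJ")],
    [("câncer", "N"), ("de", "PRP"), ("mama", "N"),
     ("e", "KC"), ("de", "PRP"), ("pulmão", "N")],
    [("ele", "PRON")],  # discarded: not headed by a noun
]
reference = {"pressão arterial", "câncer de mama", "câncer de pulmão", "febre"}
terms = apply_heuristics(candidates)
scores = evaluate(terms, reference)
# terms == {"pressão arterial", "câncer de mama", "câncer de pulmão"}
# scores == (1.0, 0.75, 0.857...)
```

The sketch compares terms by exact, lowercased string match. A real evaluation would probably need further normalization, such as lemmatization or accent handling, before terms are compared with the reference list.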
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lopes, L., Vieira, R. (2012). Improving Portuguese Term Extraction. In: Caseli, H., Villavicencio, A., Teixeira, A., Perdigão, F. (eds) Computational Processing of the Portuguese Language. PROPOR 2012. Lecture Notes in Computer Science, vol 7243. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28885-2_9
DOI: https://doi.org/10.1007/978-3-642-28885-2_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28884-5
Online ISBN: 978-3-642-28885-2
eBook Packages: Computer Science, Computer Science (R0)