Semi-supervised Learning for Portuguese Noun Phrase Extraction

Milidiú, Ruy; Santos, Cicero; Duarte, Julio; Rentería, Raúl

doi:10.1007/11751984_21

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3960)

Abstract

Semi-supervised learning is frequently used when only a small labeled training set is available alongside a large set of unlabeled samples. In this paper, we combine Hidden Markov Models and Transformation-Based Learning in a semi-supervised learning approach. Self-training and Co-training are the two semi-supervised techniques that we apply to our scheme in order to classify Portuguese noun phrases. Our main goal is to show that effective noun phrase extraction can be achieved with fewer tagged examples by applying a semi-supervised technique. Our models improve substantially when the labeled corpus is small, but only slightly when it is large.
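The self-training idea named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract gives no algorithmic detail, so the base learner here is a toy most-frequent-tag model standing in for the HMM and Transformation-Based Learning components. The function names (`train`, `predict`, `self_train`), the I/O tag set, the confidence measure, and the `threshold` and `max_rounds` parameters are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def train(tagged_sentences):
    """Count how often each word carries each tag (toy stand-in learner)."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return counts


def predict(model, words, default_tag="O"):
    """Tag each word with its most frequent tag.

    Returns the tags and a sentence-level confidence, defined here
    (as an assumption) as the lowest per-word relative frequency.
    Unknown words get the default tag and confidence 0.0.
    """
    tags, confidences = [], []
    for word in words:
        seen = model.get(word.lower())
        if seen:
            tag, freq = seen.most_common(1)[0]
            tags.append(tag)
            confidences.append(freq / sum(seen.values()))
        else:
            tags.append(default_tag)
            confidences.append(0.0)
    return tags, (min(confidences) if confidences else 0.0)


def self_train(labeled, unlabeled, threshold=0.9, max_rounds=5):
    """Generic self-training loop.

    Each round: tag the unlabeled pool, move sentences tagged with
    confidence >= threshold into the labeled set, then retrain.
    Stops early when no sentence clears the threshold.
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    model = train(labeled)
    for _ in range(max_rounds):
        confident, remaining = [], []
        for words in pool:
            tags, conf = predict(model, words)
            if conf >= threshold:
                confident.append(list(zip(words, tags)))
            else:
                remaining.append(words)
        if not confident:
            break
        labeled.extend(confident)
        pool = remaining
        model = train(labeled)
    return model, labeled, pool


if __name__ == "__main__":
    # Tiny invented example: "I" marks words inside a noun phrase, "O" outside.
    seed = [
        [("o", "I"), ("menino", "I"), ("corre", "O")],
        [("a", "I"), ("casa", "I"), ("caiu", "O")],
    ]
    raw = [["o", "menino", "caiu"], ["a", "menina", "corre"]]
    model, labeled, pool = self_train(seed, raw)
    print(len(labeled), "labeled sentences;", len(pool), "left unlabeled")
```

In this toy setup a sentence is promoted to the labeled set only when every word is already known and unambiguous, which is a deliberately conservative choice. Co-training would differ by training two learners on different views of the data and letting each supply confident labels to the other.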

Author information

Authors and Affiliations

Departamento de Informática, Pontifícia Universidade Católica, Rio de Janeiro, Brazil
Ruy Milidiú, Cicero Santos & Raúl Rentería
Centro Tecnológico do Exército, Rio de Janeiro, Brazil
Julio Duarte

Editor information

Editors and Affiliations

Pontifícia Universidade do Rio Grande do Sul, Porto Alegre, Brasil
Renata Vieira
Departamento de Informática, Universidade de Évora, Portugal
Paulo Quaresma
NILC-ICMC, University of São Paulo, CP 668P, 13560-970, São Carlos, SP, Brazil
Maria das Graças Volpe Nunes
L2F/INESC-ID Lisboa, Email: qa-clef@l2f.inesc-id.pt, Rua Alves Redol, 9, 1000-029, Lisboa, Portugal
Nuno J. Mamede
Instituto Militar de Engenharia, Praça General Tibúrcio, 80, Rio de Janeiro, Brazil
Cláudia Oliveira
Pontifícia Universidade Católica do Rio de Janeiro, Rua Marquês de São Vicente, 225, Rio de Janeiro, Brazil
Maria Carmelita Dias

About this paper

Cite this paper

Milidiú, R., Santos, C., Duarte, J., Rentería, R. (2006). Semi-supervised Learning for Portuguese Noun Phrase Extraction. In: Vieira, R., Quaresma, P., Nunes, M.d.G.V., Mamede, N.J., Oliveira, C., Dias, M.C. (eds) Computational Processing of the Portuguese Language. PROPOR 2006. Lecture Notes in Computer Science, vol 3960. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11751984_21

DOI: https://doi.org/10.1007/11751984_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-34045-4
Online ISBN: 978-3-540-34046-1
eBook Packages: Computer Science, Computer Science (R0)
