Abstract
This chapter presents the application of Entropy Guided Transformation Learning (ETL) to language-independent part-of-speech (POS) tagging. The POS tagging task consists of assigning a POS tag or another lexical class marker to each word in a text. We apply ETL and ETL Committee to four corpora in three languages: Portuguese, German and English. The ETL system achieves state-of-the-art results on all four corpora, and the ETL Committee strategy slightly improves on the ETL accuracy for each of them. This chapter is organized as follows. In Sect. 5.1, we describe the task and the selected corpora. In Sect. 5.2, we detail some modeling configurations used in our POS tagging system. In Sect. 5.3, we show some configurations used in the machine learning algorithms. Section 5.4 presents the application of ETL to the Mac-Morpho Corpus. In Sect. 5.5, we describe the application of ETL to the Tycho Brahe Corpus. Section 5.6 presents the application of ETL to the TIGER Corpus. In Sect. 5.7, we show the application of ETL to the Brown Corpus. Finally, Sect. 5.8 presents some concluding remarks.
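To make the task definition concrete, the following minimal sketch tags each word with the tag it most often carries in a small training set and then measures per-token accuracy. It is not ETL and not the system described in this chapter; it is only a simple most-frequent-tag lookup chosen for illustration. The function names, the toy sentences and the tag labels (DET, NOUN, VERB) are assumptions introduced here and do not come from any of the four corpora.

```python
from collections import Counter, defaultdict


def train_most_frequent_tag(tagged_sentences):
    """Learn, for each word, the tag it most often carries in the training data."""
    counts = defaultdict(Counter)
    all_tags = Counter()
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
            all_tags[tag] += 1
    lexicon = {word: c.most_common(1)[0][0] for word, c in counts.items()}
    # Unknown words fall back to the overall most frequent tag.
    default_tag = all_tags.most_common(1)[0][0]
    return lexicon, default_tag


def tag_words(words, lexicon, default_tag):
    """Assign one tag to each word: the POS tagging task in its simplest form."""
    return [(w, lexicon.get(w.lower(), default_tag)) for w in words]


def accuracy(predicted, gold):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    correct = sum(p == g for (_, p), (_, g) in zip(predicted, gold))
    return correct / len(gold)


# Hypothetical toy training data (illustrative only).
train = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("dogs", "NOUN"), ("chase", "VERB"), ("cats", "NOUN")],
]

lexicon, default_tag = train_most_frequent_tag(train)
gold = [("the", "DET"), ("bird", "NOUN"), ("barks", "VERB")]
predicted = tag_words([w for w, _ in gold], lexicon, default_tag)
print(predicted, accuracy(predicted, gold))
```

A lookup like this ignores the surrounding words entirely, so it cannot resolve a word whose tag depends on its context.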
© 2012 The Author(s)
About this chapter
Cite this chapter
dos Santos, C.N., Milidiú, R.L. (2012). Part-of-Speech Tagging. In: Entropy Guided Transformation Learning: Algorithms and Applications. SpringerBriefs in Computer Science. Springer, London. https://doi.org/10.1007/978-1-4471-2978-3_5
DOI: https://doi.org/10.1007/978-1-4471-2978-3_5
Publisher Name: Springer, London
Print ISBN: 978-1-4471-2977-6
Online ISBN: 978-1-4471-2978-3
eBook Packages: Computer Science, Computer Science (R0)