Part-of-Speech Tagging

Part of the book series: SpringerBriefs in Computer Science ((BRIEFSCOMPUTER))

Abstract

This chapter presents the application of Entropy Guided Transformation Learning (ETL) to language-independent part-of-speech (POS) tagging. The POS tagging task consists of assigning a POS tag, or another lexical class marker, to each word in a text. We apply ETL and ETL Committee to four corpora in three languages: Portuguese, German and English. The ETL system achieves state-of-the-art results on all four corpora, and the ETL Committee strategy slightly improves on the ETL accuracy for each of them.

The chapter is organized as follows. Sect. 5.1 describes the task and the selected corpora. Sect. 5.2 details the modeling configurations used in our POS tagging system. Sect. 5.3 gives the configurations used in the machine learning algorithms. Sects. 5.4 to 5.7 present the application of ETL to the Mac-Morpho Corpus, the Tycho Brahe Corpus, the TIGER Corpus and the Brown Corpus, respectively. Finally, Sect. 5.8 presents some concluding remarks.
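To make the task definition concrete, the sketch below tags each word with the tag it most often carried in a small training sample. This is a simple most-frequent-tag baseline, not the ETL method presented in the chapter; the tag names (DET, NOUN, VERB), the toy sentences and the function names are all illustrative assumptions and are not taken from any of the four corpora.

```python
from collections import Counter, defaultdict


def train_baseline(tagged_sentences):
    """Learn, for each word, the tag it most often carries in the training data."""
    counts = defaultdict(Counter)  # word -> Counter of tags seen with that word
    all_tags = Counter()           # overall tag frequencies
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
            all_tags[tag] += 1
    lexicon = {word: c.most_common(1)[0][0] for word, c in counts.items()}
    default_tag = all_tags.most_common(1)[0][0]  # fallback for unseen words
    return lexicon, default_tag


def tag(words, lexicon, default_tag):
    """Assign one tag to each word: the POS tagging task in its simplest form."""
    return [(w, lexicon.get(w.lower(), default_tag)) for w in words]


def accuracy(predicted, gold):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    correct = sum(p_tag == g_tag for (_, p_tag), (_, g_tag) in zip(predicted, gold))
    return correct / len(gold)


# Toy training data (hypothetical, for illustration only).
train = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("a", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sees", "VERB"), ("the", "DET"), ("dog", "NOUN")],
    [("dogs", "NOUN"), ("bark", "VERB")],
]

lexicon, default_tag = train_baseline(train)
predicted = tag(["The", "bird", "sleeps"], lexicon, default_tag)
gold = [("The", "DET"), ("bird", "NOUN"), ("sleeps", "VERB")]
print(predicted, accuracy(predicted, gold))
```

The unseen word "bird" falls back to the most frequent tag in the training sample. A baseline of this kind ignores context entirely, which is the gap that context-sensitive learners such as transformation-based approaches aim to close.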

Copyright information

© 2012 The Author(s)

About this chapter

Cite this chapter

dos Santos, C.N., Milidiú, R.L. (2012). Part-of-Speech Tagging. In: Entropy Guided Transformation Learning: Algorithms and Applications. SpringerBriefs in Computer Science. Springer, London. https://doi.org/10.1007/978-1-4471-2978-3_5

  • DOI: https://doi.org/10.1007/978-1-4471-2978-3_5

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-4471-2977-6

  • Online ISBN: 978-1-4471-2978-3

  • eBook Packages: Computer Science (R0)
