
Implementation and Evaluation of a German HMM for POS Disambiguation

Chapter in: Natural Language Processing Using Very Large Corpora

Part of the book series: Text, Speech and Language Technology (TLTB, volume 11)

Abstract

A German language model for the Xerox HMM tagger is presented. This model's performance is compared with that of two other German taggers, with partial parameter re-estimation and with full adaptation of parameters from pre-tagged corpora. The ambiguity types resolved by this model are analysed and compared with those of English and French. Finally, the model's error types are described. I argue that although the overall performance of these models for German is comparable to results for English and French, a closer analysis reveals important differences in the types of disambiguation involved for German.
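As a rough illustration of what an HMM tagger does when its parameters are taken from a pre-tagged corpus, the following Python sketch estimates a bigram HMM by relative frequency and decodes with the Viterbi algorithm. It is not the Xerox tagger or the German model described in this chapter: the toy corpus, the STTS-style tag names, the add-alpha smoothing and all function names (`train`, `viterbi`, `ambiguity_classes`) are assumptions made for illustration. The `ambiguity_classes` helper groups tokens by the set of tags their word form can take, which is one simple way to look at ambiguity types.

```python
import math
from collections import Counter, defaultdict

START = "<s>"  # pseudo-tag marking the sentence start (an assumption of this sketch)


def train(tagged_sents, alpha=0.1):
    """Estimate bigram HMM parameters by relative frequency from a pre-tagged
    corpus, with add-alpha smoothing (an illustrative choice, not the chapter's)."""
    trans = defaultdict(Counter)  # trans[prev_tag][tag] -> count
    emit = defaultdict(Counter)   # emit[tag][word] -> count
    lexicon = defaultdict(set)    # word -> set of tags observed with it
    for sent in tagged_sents:
        prev = START
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            lexicon[word].add(tag)
            prev = tag
    tags = sorted(emit)
    vocab_size = len(lexicon) + 1  # reserve some probability mass for unseen words

    def log_trans(prev, tag):
        row = trans[prev]
        return math.log((row[tag] + alpha) / (sum(row.values()) + alpha * len(tags)))

    def log_emit(tag, word):
        row = emit[tag]
        return math.log((row[word] + alpha) / (sum(row.values()) + alpha * vocab_size))

    return tags, lexicon, log_trans, log_emit


def viterbi(words, tags, lexicon, log_trans, log_emit):
    """Return the most probable tag sequence for `words` under the bigram HMM."""
    if not words:
        return []
    # Known words may only take tags seen in training; unknown words may take any tag.
    cands = [sorted(lexicon[w]) if w in lexicon else tags for w in words]
    # best[i][tag] = (best log score of a path ending in tag at position i, previous tag)
    best = [{t: (log_trans(START, t) + log_emit(t, words[0]), None) for t in cands[0]}]
    for i in range(1, len(words)):
        column = {}
        for t in cands[i]:
            score, back = max((best[i - 1][p][0] + log_trans(p, t), p) for p in cands[i - 1])
            column[t] = (score + log_emit(t, words[i]), back)
        best.append(column)
    # Follow back-pointers from the highest-scoring final tag.
    tag = max(best[-1], key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


def ambiguity_classes(tagged_sents, lexicon):
    """Count corpus tokens per ambiguity class (the tag set of their word form)."""
    counts = Counter()
    for sent in tagged_sents:
        for word, _ in sent:
            counts[frozenset(lexicon[word])] += 1
    return counts


# Hypothetical toy corpus; "die" is ambiguous between article (ART)
# and demonstrative pronoun (PDS).
TOY_CORPUS = [
    [("die", "ART"), ("Frau", "NN"), ("sieht", "VVFIN"), ("den", "ART"), ("Mann", "NN")],
    [("der", "ART"), ("Mann", "NN"), ("sieht", "VVFIN"), ("die", "ART"), ("Frau", "NN")],
    [("die", "PDS"), ("sieht", "VVFIN"), ("den", "ART"), ("Mann", "NN")],
]

model = train(TOY_CORPUS)
print(viterbi("die Frau sieht den Mann".split(), *model))  # ['ART', 'NN', 'VVFIN', 'ART', 'NN']
print(viterbi("die sieht den Mann".split(), *model))       # ['PDS', 'VVFIN', 'ART', 'NN']
```

On this toy data the article reading of *die* wins before a noun, while the demonstrative-pronoun reading wins before a finite verb: the word itself is ambiguous, so the transition probabilities decide between the two tags.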

Copyright information

© 1999 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Feldweg, H. (1999). Implementation and Evaluation of a German HMM for POS Disambiguation. In: Armstrong, S., Church, K., Isabelle, P., Manzi, S., Tzoukermann, E., Yarowsky, D. (eds) Natural Language Processing Using Very Large Corpora. Text, Speech and Language Technology, vol 11. Springer, Dordrecht. https://doi.org/10.1007/978-94-017-2390-9_1

  • DOI: https://doi.org/10.1007/978-94-017-2390-9_1

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-90-481-5349-7

  • Online ISBN: 978-94-017-2390-9

  • eBook Packages: Springer Book Archive
