Implementation and Evaluation of a German HMM for POS Disambiguation

Feldweg, H.

doi:10.1007/978-94-017-2390-9_1

Part of the book series: Text, Speech and Language Technology (TLTB, volume 11)

Abstract

A German language model for the Xerox HMM tagger is presented. The model's performance is compared with that of two other German taggers that use partial parameter re-estimation and full adaptation of parameters from pre-tagged corpora. The ambiguity types resolved by the model are analysed and compared with those of English and French. Finally, the model's error types are described. I argue that although the overall performance of these models for German is comparable to results for English and French, a closer analysis reveals important differences in the types of disambiguation involved for German.
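For readers unfamiliar with HMM-based disambiguation, the following is a minimal sketch of Viterbi decoding over a bigram HMM. It is not the Xerox tagger or the chapter's model: the tag names (`ART`, `NN`, `VVFIN`), the lowercased toy sentence, the probability tables and the `FLOOR` smoothing constant are all hypothetical values chosen only to show how an ambiguous word form can receive different tags depending on context.

```python
import math

FLOOR = 1e-6  # assumed smoothing floor for unseen events (hypothetical value)

# Toy model: every number below is invented for illustration only.
TAGS = ["ART", "NN", "VVFIN"]
START = {"ART": 0.6, "NN": 0.3, "VVFIN": 0.1}
TRANS = {
    "ART": {"ART": 0.05, "NN": 0.9, "VVFIN": 0.05},
    "NN": {"ART": 0.2, "NN": 0.2, "VVFIN": 0.6},
    "VVFIN": {"ART": 0.5, "NN": 0.4, "VVFIN": 0.1},
}
EMIT = {
    "ART": {"die": 0.9},
    "NN": {"fliegen": 0.5},     # "fliegen" as a noun (flies)
    "VVFIN": {"fliegen": 0.5},  # "fliegen" as a finite verb (fly)
}


def viterbi(words, tags, start, trans, emit):
    """Return the most probable tag sequence for a non-empty word list."""

    def lp(p):
        # Log-probability with a floor so unseen events do not yield log(0).
        return math.log(max(p, FLOOR))

    # scores[i][t]: best log-probability of any tag path ending in t at position i
    scores = [{t: lp(start[t]) + lp(emit[t].get(words[0], 0.0)) for t in tags}]
    back = []  # back[i][t]: best predecessor of tag t at position i + 1
    for w in words[1:]:
        prev = scores[-1]
        col, ptr = {}, {}
        for t in tags:
            best = max(tags, key=lambda s: prev[s] + lp(trans[s][t]))
            col[t] = prev[best] + lp(trans[best][t]) + lp(emit[t].get(w, 0.0))
            ptr[t] = best
        scores.append(col)
        back.append(ptr)

    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: scores[-1][t])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


print(viterbi(["die", "fliegen", "fliegen"], TAGS, START, TRANS, EMIT))
```

With these toy tables, the two occurrences of the same word form receive different tags because the transition probabilities favour a noun after an article and a finite verb after a noun.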

Editor information

Editors and Affiliations

ISSCO, University of Geneva, Switzerland
Susan Armstrong & Sandra Manzi
AT&T Labs-Research, USA
Kenneth Church
Xerox Research Centre Europe, France
Pierre Isabelle
Bell Laboratories, Lucent, USA
Evelyne Tzoukermann
Johns Hopkins University, Baltimore, Maryland, USA
David Yarowsky

Cite this chapter

Feldweg, H. (1999). Implementation and Evaluation of a German HMM for POS Disambiguation. In: Armstrong, S., Church, K., Isabelle, P., Manzi, S., Tzoukermann, E., Yarowsky, D. (eds) Natural Language Processing Using Very Large Corpora. Text, Speech and Language Technology, vol 11. Springer, Dordrecht. https://doi.org/10.1007/978-94-017-2390-9_1

DOI: https://doi.org/10.1007/978-94-017-2390-9_1
Publisher Name: Springer, Dordrecht
Print ISBN: 978-90-481-5349-7
Online ISBN: 978-94-017-2390-9
eBook Packages: Springer Book Archive