Abstract
A German language model for the Xerox HMM tagger is presented. The model’s performance is compared with that of two other German taggers, which rely on partial parameter re-estimation and on full adaptation of parameters from pre-tagged corpora. The ambiguity types resolved by the model are analysed and compared with those found in English and French. Finally, the model’s error types are described. I argue that although the overall performance of these models for German is comparable to results for English and French, a closer analysis reveals important differences in the types of disambiguation involved for German.
Copyright information
© 1999 Springer Science+Business Media Dordrecht
About this chapter
Cite this chapter
Feldweg, H. (1999). Implementation and Evaluation of a German HMM for POS Disambiguation. In: Armstrong, S., Church, K., Isabelle, P., Manzi, S., Tzoukermann, E., Yarowsky, D. (eds) Natural Language Processing Using Very Large Corpora. Text, Speech and Language Technology, vol 11. Springer, Dordrecht. https://doi.org/10.1007/978-94-017-2390-9_1
DOI: https://doi.org/10.1007/978-94-017-2390-9_1
Publisher Name: Springer, Dordrecht
Print ISBN: 978-90-481-5349-7
Online ISBN: 978-94-017-2390-9
eBook Packages: Springer Book Archive