Implementation and Evaluation of a German HMM for POS Disambiguation

  • H. Feldweg
Part of the Text, Speech and Language Technology book series (TLTB, volume 11)

Abstract

A German language model for the Xerox HMM tagger is presented. This model's performance is compared with that of two other German taggers, with partial parameter re-estimation and with full adaptation of parameters from pre-tagged corpora. The ambiguity types resolved by this model are analysed and compared to the ambiguity types of English and French. Finally, the model's error types are described. I argue that although the overall performance of these models for German is comparable to results for English and French, a closer analysis reveals important differences in the types of disambiguation involved for German.
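To make the abstract's terms concrete, the following is a minimal sketch of one of the setups it names: adapting the parameters of a bigram HMM tagger directly from a pre-tagged corpus by relative-frequency counting, then disambiguating with Viterbi decoding. It is not the chapter's model and not the Xerox tagger's code (the Xerox tagger is usually trained by Baum-Welch re-estimation on untagged text). The toy corpus, the STTS-style tag names, the add-one smoothing, the `unk_prob` constant, and the helper names `train`, `log_trans`, `log_emit` and `viterbi` are all hypothetical choices made for illustration. The set of tags a word was seen with plays the role of its ambiguity class.

```python
import math
from collections import Counter, defaultdict

START = "<s>"  # pseudo-tag marking the sentence start


def train(corpus):
    """Estimate bigram HMM counts from a pre-tagged corpus (hypothetical helper)."""
    trans = defaultdict(Counter)  # trans[prev][tag] = transition count
    emit = defaultdict(Counter)   # emit[tag][word] = emission count
    lexicon = defaultdict(set)    # word -> tags seen with it (its ambiguity class)
    for sentence in corpus:
        prev = START
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            lexicon[word].add(tag)
            prev = tag
    tagset = sorted(emit)
    return trans, emit, lexicon, tagset


def log_trans(trans, tagset, prev, tag):
    # Add-one smoothing over the tag set (an assumption), so unseen transitions
    # keep a small nonzero probability.
    counts = trans[prev]
    return math.log((counts[tag] + 1) / (sum(counts.values()) + len(tagset)))


def log_emit(emit, tag, word, unk_prob=1e-6):
    # Relative-frequency emission; unseen (tag, word) pairs get a tiny constant.
    counts = emit[tag]
    if counts[word] == 0:
        return math.log(unk_prob)
    return math.log(counts[word] / sum(counts.values()))


def viterbi(words, trans, emit, lexicon, tagset):
    """Return the most probable tag sequence for words under the bigram HMM."""
    if not words:
        return []
    # Known words may only take tags from their ambiguity class; unknown words any tag.
    candidates = [sorted(lexicon[w]) if w in lexicon else tagset for w in words]
    # best[i][tag] = (log score of the best path ending in tag at i, previous tag)
    best = [{t: (log_trans(trans, tagset, START, t) + log_emit(emit, t, words[0]), None)
             for t in candidates[0]}]
    for i in range(1, len(words)):
        column = {}
        for t in candidates[i]:
            score, back = max(
                (best[i - 1][p][0] + log_trans(trans, tagset, p, t), p)
                for p in candidates[i - 1]
            )
            column[t] = (score + log_emit(emit, t, words[i]), back)
        best.append(column)
    # Follow backpointers from the best-scoring final tag.
    tag = max(best[-1], key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


# Hypothetical toy corpus: "die" is ambiguous between article (ART)
# and relative pronoun (PRELS).
CORPUS = [
    [("die", "ART"), ("Frau", "NN"), ("lacht", "VVFIN")],
    [("der", "ART"), ("Mann", "NN"), ("sieht", "VVFIN"), ("die", "ART"), ("Frau", "NN")],
    [("die", "ART"), ("Frau", "NN"), (",", "$,"), ("die", "PRELS"), ("lacht", "VVFIN")],
]

if __name__ == "__main__":
    model = train(CORPUS)
    print(viterbi(["die", "Frau", ",", "die", "lacht"], *model))
    # ['ART', 'NN', '$,', 'PRELS', 'VVFIN']
```

In this toy run, the same word form "die" receives two different tags: after sentence start the article reading wins, while after a comma the learned transition into PRELS outweighs the article's higher emission probability. This is the kind of context-driven choice between members of an ambiguity class that the chapter's analysis of German ambiguity types is concerned with.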

Keywords

Hidden Markov Model, Word Sense, Proper Noun, Reference Text, Training Iteration
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer Science+Business Media Dordrecht 1999
