Unsupervised Learning of Disambiguation Rules for Part-of-Speech Tagging

  • E. Brill
  • M. Pop
Part of the Text, Speech and Language Technology book series (TLTB, volume 11)


In this paper we describe an unsupervised learning algorithm for automatically training a rule-based part-of-speech tagger without using a manually tagged corpus. We compare this algorithm to the Baum-Welch algorithm, used for unsupervised training of stochastic taggers. Next, we show a method for combining unsupervised and supervised rule-based training algorithms to create a highly accurate tagger using only a small amount of manually tagged text.





Copyright information

© Springer Science+Business Media Dordrecht 1999
