Applications of Weighted Automata in Natural Language Processing

Abstract

We explain why weighted automata are an attractive knowledge representation for natural language problems. We first trace the close historical ties between the two fields, then present two complex real-world problems, transliteration and translation. These problems are usefully decomposed into a pipeline of weighted transducers, and weights can be set to maximize the likelihood of a training corpus using standard algorithms. We additionally describe the representation of language models, critical data sources in natural language processing, as weighted automata. We outline the wide range of work in natural language processing that makes use of weighted string and tree automata and describe current work and challenges.
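To make the idea of a language model as a weighted automaton concrete, here is a minimal sketch and not the chapter's own construction. It assumes a bigram model, with one state per preceding word and each arc weighted by a relative-frequency (maximum-likelihood) estimate. The function names, the boundary symbols `<s>` and `</s>`, and the toy corpus are all hypothetical choices made for illustration.

```python
from collections import defaultdict

START, END = "<s>", "</s>"  # hypothetical boundary symbols


def train_bigram_wfsa(corpus):
    """Return arcs[state][word] = P(word | state), estimated by relative frequency.

    Each state is the previous word; each arc reads the next word and
    moves to the state named after that word.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for sentence in corpus:
        prev = START
        for word in sentence + [END]:
            counts[prev][word] += 1
            prev = word
    arcs = {}
    for state, following in counts.items():
        total = sum(following.values())
        arcs[state] = {w: c / total for w, c in following.items()}
    return arcs


def score(arcs, sentence):
    """Multiply arc weights along the single path that accepts the sentence.

    A missing arc contributes weight 0.0 (this sketch does no smoothing).
    """
    state, prob = START, 1.0
    for word in sentence + [END]:
        prob *= arcs.get(state, {}).get(word, 0.0)
        state = word
    return prob


# Toy corpus, invented for this example.
corpus = [["the", "cat", "sleeps"], ["the", "dog", "sleeps"]]
lm = train_bigram_wfsa(corpus)
p_seen = score(lm, ["the", "cat", "sleeps"])
p_unseen = score(lm, ["dog", "the"])
```

Because the automaton is deterministic, a string's weight is the product of the weights on its one accepting path. A practical model would add smoothing so that unseen word pairs do not receive zero weight.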

Author information

Corresponding author

Correspondence to Kevin Knight.

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Knight, K., May, J. (2009). Applications of Weighted Automata in Natural Language Processing. In: Droste, M., Kuich, W., Vogler, H. (eds) Handbook of Weighted Automata. Monographs in Theoretical Computer Science. An EATCS Series. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01492-5_14

  • DOI: https://doi.org/10.1007/978-3-642-01492-5_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-01491-8

  • Online ISBN: 978-3-642-01492-5

  • eBook Packages: Computer Science, Computer Science (R0)
