Applications of Weighted Automata in Natural Language Processing

Knight, Kevin; May, Jonathan

doi:10.1007/978-3-642-01492-5_14

Part of the book series: Monographs in Theoretical Computer Science. An EATCS Series (EATCS)

Abstract

We explain why weighted automata are an attractive knowledge representation for natural language problems. We first trace the close historical ties between the two fields, then present two complex real-world problems, transliteration and translation. These problems are usefully decomposed into a pipeline of weighted transducers, and weights can be set to maximize the likelihood of a training corpus using standard algorithms. We additionally describe the representation of language models, critical data sources in natural language processing, as weighted automata. We outline the wide range of work in natural language processing that makes use of weighted string and tree automata and describe current work and challenges.
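As a minimal illustration of the idea that a language model can be represented as a weighted automaton, the sketch below builds a bigram model as a weighted acceptor over a toy corpus. It is not taken from the chapter: the function names `train_bigram_wfsa` and `score`, the `<s>` and `</s>` boundary symbols, and the toy corpus are all assumptions made for this example. States stand for the previous word, each arc carries a relative-frequency estimate (the maximum-likelihood setting for this simple model), and the weight of a string is the product of the arc weights along its path.

```python
from collections import defaultdict


def train_bigram_wfsa(corpus):
    """Build a bigram language model as a weighted acceptor.

    States are the previous word ("<s>" is the assumed start state).
    The arc (state, word) leads to state `word` and carries the
    relative-frequency estimate of P(word | state).
    """
    counts = defaultdict(lambda: defaultdict(int))
    for sentence in corpus:
        prev = "<s>"
        for word in sentence.split() + ["</s>"]:
            counts[prev][word] += 1
            prev = word

    arcs = {}
    for state, nexts in counts.items():
        total = sum(nexts.values())
        for word, count in nexts.items():
            arcs[(state, word)] = count / total
    return arcs


def score(arcs, sentence):
    """Return the weight of the path accepting `sentence` (0.0 if no path)."""
    weight = 1.0
    state = "<s>"
    for word in sentence.split() + ["</s>"]:
        weight *= arcs.get((state, word), 0.0)
        state = word
    return weight


# Hypothetical toy corpus, used only to exercise the sketch.
toy_corpus = ["the cat sleeps", "the dog sleeps", "the cat eats"]
lm = train_bigram_wfsa(toy_corpus)
```

Calling `score(lm, "the cat sleeps")` multiplies the weights of the four arcs on that sentence's path, while a sentence containing an unseen bigram, such as "the dog eats", receives weight zero because no accepting path exists. A practical model would add smoothing, which this sketch omits.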

Author information

Authors and Affiliations

USC Information Sciences Institute, 4676 Admiralty Way, Suite 1001, Marina del Rey, CA, 90292, USA
Kevin Knight & Jonathan May

Corresponding author

Correspondence to Kevin Knight.

Editor information

Editors and Affiliations

Inst. Informatik, Universität Leipzig, Augustusplatz 10-11, Leipzig, 04109, Germany
Manfred Droste
Institut für Diskrete, TU Wien, Wiedner Hauptstr. 8-10, Wien, 1040, Austria
Werner Kuich
Fak. Informatik, TU Dresden, Nöthnitzer Str. 46, Dresden, 01187, Germany
Heiko Vogler

About this chapter

Cite this chapter

Knight, K., May, J. (2009). Applications of Weighted Automata in Natural Language Processing. In: Droste, M., Kuich, W., Vogler, H. (eds) Handbook of Weighted Automata. Monographs in Theoretical Computer Science. An EATCS Series. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01492-5_14

DOI: https://doi.org/10.1007/978-3-642-01492-5_14
Published: 16 September 2009
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01491-8
Online ISBN: 978-3-642-01492-5
eBook Packages: Computer Science, Computer Science (R0)
