Tiburon: A Weighted Tree Automata Toolkit

May, Jonathan; Knight, Kevin

doi:10.1007/11812128_11

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4094)

Included in the following conference series:

International Conference on Implementation and Application of Automata

Abstract

The availability of weighted finite-state string automata toolkits has made possible great advances in natural language processing. However, recent syntax-based NLP models are poorly suited to these toolkits. To address this problem, we introduce a weighted finite-state tree automata toolkit that incorporates recent developments in weighted tree automata theory and is useful for natural language applications such as machine translation, sentence compression, question answering, and many more.

The authors wish to thank Steve DeNeefe, Jonathan Graehl, Mark Hopkins, Liang Huang, Daniel Marcu, and Magnus Steinby for their advice and comments. This work was partially supported by NSF grant IIS-0428020 and by GALE-DARPA Contract HR0011-06-C-0022.

Author information

Authors and Affiliations

Information Sciences Institute, University of Southern California, Marina Del Rey, CA, 90292
Jonathan May & Kevin Knight

Editor information

Editors and Affiliations

Department of Computer Science, University of California, Santa Barbara
Oscar H. Ibarra
Dept. of Computer Science, Kainan University, Taoyuan, Taiwan, R.O.C.
Hsu-Chun Yen

About this paper

Cite this paper

May, J., Knight, K. (2006). Tiburon: A Weighted Tree Automata Toolkit. In: Ibarra, O.H., Yen, HC. (eds) Implementation and Application of Automata. CIAA 2006. Lecture Notes in Computer Science, vol 4094. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11812128_11

DOI: https://doi.org/10.1007/11812128_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-37213-4
Online ISBN: 978-3-540-37214-1
eBook Packages: Computer Science (R0)
