Abstract
Recent approaches to building syntactic language models combine Probabilistic Tree Substitution Grammars (PTSGs) with Bayesian learning methods. PTSGs have appealing properties for modeling syntax, and Bayesian methods provide a framework for inducing compact grammars that do not overfit the training corpus. In this paper, we apply these approaches to learn syntactic language models from a Brazilian Portuguese treebank.
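The abstract does not spell out the model. In Bayesian TSG induction of the kind it refers to, the distribution over elementary trees (fragments) rooted in each nonterminal is commonly given a Dirichlet process (DP) prior. Under such a prior, a fragment's predictive probability mixes how often it is already used with a base distribution P0 that charges extra for every node a fragment contains. The sketch below illustrates that idea only. It is not the authors' implementation. The toy PCFG, the tuple encoding of fragments, the single shared stop probability, and the names `base_prob` and `dp_predictive` are all hypothetical.

```python
from collections import Counter

# Hypothetical toy PCFG (rule -> probability) used only to build the base
# distribution P0; a real system would estimate these from the treebank.
PCFG = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("D", "N")): 1.0,
    ("VP", ("V",)): 1.0,
    ("D", ("o",)): 1.0,
    ("N", ("gato",)): 1.0,
    ("V", ("dorme",)): 1.0,
}

# Fragment encoding (an assumption of this sketch): a node is a tuple
# (label, child, ...); a one-element tuple like ("NP",) is a frontier
# nonterminal (substitution site); a plain string is a terminal word.
SMALL = ("S", ("NP",), ("VP",))
LARGE = ("S", ("NP", ("D",), ("N",)), ("VP",))


def base_prob(frag, pcfg, stop):
    """Base probability P0 of an elementary tree.

    Multiplies the PCFG probability of every rule inside the fragment,
    `stop` for each frontier nonterminal and (1 - stop) for each expanded
    non-root nonterminal, so larger fragments receive less prior mass.
    """
    prob = 1.0

    def visit(node, is_root):
        nonlocal prob
        label, *children = node
        if not children:          # frontier nonterminal: stop expanding here
            prob *= stop
            return
        if not is_root:           # internal node: chose to keep expanding
            prob *= 1.0 - stop
        rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
        prob *= pcfg.get((label, rhs), 0.0)
        for child in children:
            if not isinstance(child, str):
                visit(child, False)

    visit(frag, True)
    return prob


def dp_predictive(frag, counts, alpha, pcfg, stop):
    """DP posterior predictive: (n_e + alpha * P0(e)) / (n_c + alpha).

    n_e is how often `frag` is currently used in the sampled derivations and
    n_c is the total count of fragments sharing its root label.
    """
    root = frag[0]
    n_c = sum(n for f, n in counts.items() if f[0] == root)
    return (counts[frag] + alpha * base_prob(frag, pcfg, stop)) / (n_c + alpha)


if __name__ == "__main__":
    counts = Counter({SMALL: 3})   # hypothetical current fragment counts
    print(dp_predictive(SMALL, counts, 1.0, PCFG, 0.5))  # 0.8125
    print(dp_predictive(LARGE, counts, 1.0, PCFG, 0.5))  # 0.015625
```

With alpha = 1 and a stop probability of 0.5, the already frequent two-level fragment keeps most of the predictive mass. The unseen deeper fragment pays an extra continue factor and two more stop factors in P0. This combination of reuse and a size penalty is one way such a prior steers the learner toward compact grammars.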
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Beck, D.E., de Medeiros Caseli, H. (2012). Bayesian Induction of Syntactic Language Models for Brazilian Portuguese. In: Caseli, H., Villavicencio, A., Teixeira, A., Perdigão, F. (eds) Computational Processing of the Portuguese Language. PROPOR 2012. Lecture Notes in Computer Science, vol 7243. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28885-2_18
DOI: https://doi.org/10.1007/978-3-642-28885-2_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28884-5
Online ISBN: 978-3-642-28885-2
eBook Packages: Computer Science; Computer Science (R0)