Bayesian Induction of Syntactic Language Models for Brazilian Portuguese

Beck, Daniel Emilio; de Medeiros Caseli, Helena

doi:10.1007/978-3-642-28885-2_18

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7243)

Included in the following conference series:

International Conference on Computational Processing of the Portuguese Language

Abstract

Recent approaches to building syntactic language models combine Probabilistic Tree Substitution Grammars (PTSGs) with Bayesian learning methods. While PTSGs have appealing features for syntax modeling, Bayesian methods provide a framework for inducing compact grammars that do not overfit the training corpus. In this paper, we apply these approaches to learn syntactic language models from a Brazilian Portuguese treebank.

Author information

Authors and Affiliations

Department of Computer Science – LaLiC/NILC, Federal University of São Carlos (UFSCar), São Carlos, SP, Brazil
Daniel Emilio Beck & Helena de Medeiros Caseli

Editor information

Editors and Affiliations

UFSCAR, Rod. Washington Luís, 13565-905, São Carlos, Brazil
Helena Caseli
UFRGS, Av. Bento Gonçalves, 9500, 91501-970, Porto Alegre, Brazil
Aline Villavicencio
DETI/IEETA, Universidade de Aveiro, Campus Universitário de Santiago, 3810-193, Aveiro, Portugal
António Teixeira
UC/ IT, DEEC, Universidade de Coimbra, Polo 2, 3030-290, Coimbra, Portugal
Fernando Perdigão

About this paper

Cite this paper

Beck, D.E., de Medeiros Caseli, H. (2012). Bayesian Induction of Syntactic Language Models for Brazilian Portuguese. In: Caseli, H., Villavicencio, A., Teixeira, A., Perdigão, F. (eds) Computational Processing of the Portuguese Language. PROPOR 2012. Lecture Notes in Computer Science, vol 7243. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28885-2_18

DOI: https://doi.org/10.1007/978-3-642-28885-2_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28884-5
Online ISBN: 978-3-642-28885-2
eBook Packages: Computer Science, Computer Science (R0)
