Highly-Inflected Language Generation Using Factored Language Models

de Novais, Eder Miranda; Paraboni, Ivandré; Ferreira, Diogo Takaki

doi:10.1007/978-3-642-19400-9_34

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6608)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics

Abstract

Statistical language models based on n-gram counts have been shown to successfully replace grammar rules in standard 2-stage (or ‘generate-and-select’) Natural Language Generation (NLG). In highly-inflected languages, however, the amount of training data required to cope with n-gram sparseness may be simply unobtainable, and the benefits of a statistical approach become less obvious. In this work we address the issue of text generation in a highly-inflected language by making use of factored language models (FLM) that take morphological information into account. We present a number of experiments involving the use of simple FLMs applied to various surface realisation tasks, showing that FLMs may implement 2-stage generation with results that are far superior to standard n-gram models alone.
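The contrast drawn in the abstract can be illustrated with a toy sketch of 2-stage (generate-and-select) realisation. It is not the authors' implementation: the corpus, the gender tags, the fixed 0.5 backoff weight, and every function name below are assumptions made for illustration, and the backoff is a crude unnormalised stand-in for the generalized parallel backoff used in real factored language models. It only shows why a plain word bigram model cannot rank candidates containing unseen word pairs, while a model that backs off to a morphological factor still can.

```python
from collections import Counter

# Toy corpus (invented for illustration). Each token is (word, morph),
# where morph is a hypothetical gender tag: "f" or "m".
CORPUS = [
    [("a", "f"), ("menina", "f"), ("bonita", "f")],
    [("o", "m"), ("menino", "m"), ("bonito", "m")],
    [("a", "f"), ("casa", "f"), ("pequena", "f")],
    [("o", "m"), ("carro", "m"), ("pequeno", "m")],
]

word_bigrams = Counter()  # count(previous word, word)
word_context = Counter()  # count(previous word) as a bigram context
tag_bigrams = Counter()   # count(previous morph tag, word)
tag_context = Counter()   # count(previous morph tag) as a bigram context

for sentence in CORPUS:
    for (prev_word, prev_tag), (word, _tag) in zip(sentence, sentence[1:]):
        word_bigrams[(prev_word, word)] += 1
        word_context[prev_word] += 1
        tag_bigrams[(prev_tag, word)] += 1
        tag_context[prev_tag] += 1

FLOOR = 1e-6          # assumed score for events never seen in training
BACKOFF_WEIGHT = 0.5  # assumed fixed discount; not a normalised estimate


def p_word(prev, cur):
    """Plain word bigram estimate P(word | previous word)."""
    count = word_bigrams[(prev[0], cur[0])]
    return count / word_context[prev[0]] if count else FLOOR


def p_factored(prev, cur):
    """Word bigram if seen; otherwise back off to P(word | previous tag)."""
    count = word_bigrams[(prev[0], cur[0])]
    if count:
        return count / word_context[prev[0]]
    count = tag_bigrams[(prev[1], cur[0])]
    return BACKOFF_WEIGHT * count / tag_context[prev[1]] if count else FLOOR


def score(sentence, model):
    """Product of the model's conditional scores over adjacent tokens."""
    total = 1.0
    for prev, cur in zip(sentence, sentence[1:]):
        total *= model(prev, cur)
    return total


def select(candidates, model):
    """Selection stage: keep the highest-scoring candidate."""
    return max(candidates, key=lambda s: score(s, model))


# Generation stage (hard-coded here): two candidate realisations that
# differ only in adjective agreement. Neither word pair "casa bonita"
# nor "casa bonito" occurs in the toy corpus.
agreeing = [("a", "f"), ("casa", "f"), ("bonita", "f")]
clashing = [("a", "f"), ("casa", "f"), ("bonito", "m")]

best = select([clashing, agreeing], p_factored)
print(" ".join(word for word, _ in best))
```

With the word bigram model both candidates receive the same floor-level score, so the selection is arbitrary; the factored variant prefers the candidate whose adjective has been observed after a feminine token.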

Author information

Authors and Affiliations

School of Arts, Sciences and Humanities, University of São Paulo (USP / EACH), Av. Arlindo Bettio, 1000, São Paulo, Brazil
Eder Miranda de Novais, Ivandré Paraboni & Diogo Takaki Ferreira

Editor information

Editors and Affiliations

Center for Computing Research, National Polytechnic Institute, Mexico
Alexander F. Gelbukh

About this paper

Cite this paper

de Novais, E.M., Paraboni, I., Ferreira, D.T. (2011). Highly-Inflected Language Generation Using Factored Language Models. In: Gelbukh, A.F. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2011. Lecture Notes in Computer Science, vol 6608. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19400-9_34

DOI: https://doi.org/10.1007/978-3-642-19400-9_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19399-6
Online ISBN: 978-3-642-19400-9
eBook Packages: Computer Science, Computer Science (R0)