Abstract
Natural Language Generation systems usually require substantial knowledge about the structure of the target language in order to perform the final task in the generation process: surface realisation, the mapping from a semantic representation to text. Designing knowledge bases of this kind, typically represented as sets of grammar rules, can however be a costly and labour-intensive enterprise. In this work we take a statistical approach to surface realisation in which no linguistic knowledge is hard-coded; instead, it is learned automatically from large corpora. Results of a small experiment in the generation of referring expressions show significant levels of similarity between our computer-generated texts and those produced by humans. The approach also offers the benefits usually associated with statistical NLP, such as low development costs and domain and language independence.
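The abstract does not spell out how a corpus-trained model turns content into text. One common statistical scheme is overgenerate-and-rank: enumerate candidate realisations and keep the one that an n-gram language model trained on a corpus scores highest. The Python sketch below illustrates that idea under stated assumptions. It uses a bigram model with Jelinek-Mercer interpolation (a standard smoothing choice, not confirmed as the paper's), exhaustively permutes a small bag of words, and trains on a three-phrase toy corpus of Portuguese noun phrases. The class `InterpolatedBigramLM`, the function `realise`, the weight `lam=0.7` and the corpus are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter
from itertools import permutations


class InterpolatedBigramLM:
    """Hypothetical bigram language model with Jelinek-Mercer interpolation.

    P(w | prev) = lam * P_ML(w | prev) + (1 - lam) * P_unigram(w)
    """

    def __init__(self, sentences, lam=0.7):
        if not 0.0 <= lam < 1.0:
            raise ValueError("lam must be in [0, 1) so every probability stays positive")
        self.lam = lam
        self.unigrams = Counter()
        self.bigrams = Counter()
        for tokens in sentences:
            padded = ["<s>", *tokens, "</s>"]
            self.unigrams.update(padded)
            self.bigrams.update(zip(padded, padded[1:]))
        self.total = sum(self.unigrams.values())
        self.vocab_size = len(self.unigrams)

    def prob(self, prev, word):
        # Add-one smoothed unigram estimate (with one extra slot for unseen words),
        # so the interpolated probability is never zero.
        p_uni = (self.unigrams[word] + 1) / (self.total + self.vocab_size + 1)
        # Maximum-likelihood bigram estimate; 0 when the history was never seen.
        prev_count = self.unigrams[prev]
        p_bi = self.bigrams[(prev, word)] / prev_count if prev_count else 0.0
        return self.lam * p_bi + (1.0 - self.lam) * p_uni

    def log_prob(self, tokens):
        # Sum of log bigram probabilities over the padded token sequence.
        padded = ["<s>", *tokens, "</s>"]
        return sum(math.log(self.prob(a, b)) for a, b in zip(padded, padded[1:]))


def realise(lm, words):
    """Overgenerate every ordering of `words`; return the one the LM ranks highest."""
    best = max(permutations(words), key=lm.log_prob)
    return list(best)


# Toy training corpus of Portuguese noun phrases (illustrative only).
corpus = [
    ["a", "mesa", "vermelha"],
    ["a", "cadeira", "vermelha"],
    ["a", "mesa", "grande"],
]
lm = InterpolatedBigramLM(corpus)
print(" ".join(realise(lm, ["vermelha", "a", "mesa"])))  # prints "a mesa vermelha"
```

Exhaustive permutation grows factorially with the number of words, which is tolerable here only because referring expressions are short. A realistic system would derive its candidates from the semantic input and would need a much larger training corpus.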
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Pereira, D.B., Paraboni, I. (2008). Statistical Surface Realisation of Portuguese Referring Expressions. In: Nordström, B., Ranta, A. (eds) Advances in Natural Language Processing. GoTAL 2008. Lecture Notes in Computer Science, vol 5221. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85287-2_37
DOI: https://doi.org/10.1007/978-3-540-85287-2_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85286-5
Online ISBN: 978-3-540-85287-2
eBook Packages: Computer Science, Computer Science (R0)