Statistically-Guided Controlled Language Authoring

Palmaz, Susana; Cuadros, Montse; Etchegoyhen, Thierry

doi:10.1007/978-3-319-41498-0_4

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9767)

Included in the following conference series:

International Workshop on Controlled Natural Language

Abstract

This study presents a series of experiments using contextual next-word prediction to aid controlled language authoring. The goal is to assess how well n-gram language modelling can improve the generation of controlled language in restricted domains with minimal supervision. We evaluate how different dimensions of language model design affect prediction and textual coherence. In particular, we present evaluations of suggestion ranking, perplexity gain, and language model combination. We show that word prediction can provide adequate suggestions, which could offer an alternative to costly manual configuration of rules in controlled language applications.
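
As a rough illustration of the ideas named in the abstract, the sketch below trains a toy bigram model, ranks next-word suggestions, computes sentence perplexity, and linearly interpolates two probabilities. It is not the authors' system: the toy corpus, the add-one smoothing, the function names, and the interpolation weight are all assumptions chosen for brevity.

```python
from collections import Counter, defaultdict
import math


def train_bigram(sentences):
    """Count bigrams over whitespace-tokenised sentences with boundary markers."""
    counts = defaultdict(Counter)
    vocab = set()
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        vocab.update(tokens)
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts, vocab


def prob(counts, vocab, prev, word):
    """Add-one smoothed estimate of P(word | prev) (smoothing choice is an assumption)."""
    ctx = counts.get(prev, Counter())
    return (ctx[word] + 1) / (sum(ctx.values()) + len(vocab))


def suggest(counts, prev, k=3):
    """Return up to k next words seen after prev, most frequent first (ties alphabetical)."""
    ctx = counts.get(prev, Counter())
    ranked = sorted(ctx.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:k]]


def perplexity(counts, vocab, sentence):
    """Per-token perplexity of a sentence under the smoothed bigram model."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    log_sum = sum(
        math.log(prob(counts, vocab, prev, word))
        for prev, word in zip(tokens, tokens[1:])
    )
    return math.exp(-log_sum / (len(tokens) - 1))


def interpolate(p_a, p_b, lam=0.5):
    """Linear interpolation of two model probabilities (lam is a hypothetical weight)."""
    return lam * p_a + (1 - lam) * p_b


# Hypothetical in-domain corpus of controlled-language instructions.
corpus = ["remove the panel", "remove the bolt", "install the panel"]
counts, vocab = train_bigram(corpus)

print(suggest(counts, "the"))
print(perplexity(counts, vocab, "remove the panel"))
print(perplexity(counts, vocab, "panel the remove"))
```

On this toy corpus, a sentence that follows the training patterns gets a lower perplexity than the same words in scrambled order, which is the kind of signal a perplexity-based evaluation relies on. A real setup would use higher-order n-grams and a more robust smoothing method.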

Author information

Authors and Affiliations

Vicomtech-IK4, Donostia - San Sebastián, Spain
Susana Palmaz, Montse Cuadros & Thierry Etchegoyhen

Corresponding author

Correspondence to Susana Palmaz.

Editor information

Editors and Affiliations

National University of Ireland, Galway, Ireland
Brian Davis
University of Malta, Msida, Malta
Gordon J. Pace
University of Aberdeen, Aberdeen, United Kingdom
Adam Wyner

Cite this paper

Palmaz, S., Cuadros, M., Etchegoyhen, T. (2016). Statistically-Guided Controlled Language Authoring. In: Davis, B., Pace, G., Wyner, A. (eds) Controlled Natural Language. CNL 2016. Lecture Notes in Computer Science, vol 9767. Springer, Cham. https://doi.org/10.1007/978-3-319-41498-0_4

DOI: https://doi.org/10.1007/978-3-319-41498-0_4
Published: 12 July 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-41497-3
Online ISBN: 978-3-319-41498-0
eBook Packages: Computer Science, Computer Science (R0)
