The Automatic Generation of Nonwords for Lexical Recognition Tests

  • Osama Hamed
  • Torsten Zesch
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10930)


Lexical recognition tests are frequently used to assess vocabulary knowledge. In such tests, learners need to differentiate between real words and artificial nonwords that look much like real words. Our ultimate goal is to create high-quality lexical recognition tests automatically, which would enable repeated automated testing in different languages. This task involves a simple subtask (word selection) and a complex one (nonword generation). Our main goal here is to automatically generate word-like nonwords. We compare different ranking strategies and find that our best strategy (a specialized higher-order character-based language model) creates word-like nonwords. We evaluate our nonwords in a user study and find that our automatically generated test yields scores that are highly correlated with those of a well-established, manually created lexical recognition test.
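The best strategy named in the abstract ranks candidate nonwords with a higher-order character-based language model. The Python sketch below illustrates that general idea only, under assumptions the abstract does not state: a 4-gram character model with add-k smoothing (k = 0.1), `^`/`$` word-boundary markers, a tiny illustrative lexicon, and hand-picked candidate strings. The names `CharNgramLM`, `avg_logprob`, and `rank_candidates` are hypothetical. This is not the authors' implementation, and how candidates are generated in the first place is not shown.

```python
import math
from collections import Counter

BOS, EOS = "^", "$"  # word-boundary markers (assumed convention)


class CharNgramLM:
    """Hypothetical character n-gram model with add-k smoothing (no backoff)."""

    def __init__(self, words, order=4, k=0.1):
        self.order = order  # each symbol is predicted from the previous order-1 symbols
        self.k = k          # add-k smoothing constant (assumed value)
        self.ngram_counts = Counter()
        self.context_counts = Counter()
        alphabet = set()
        for word in words:
            alphabet.update(word)
            for context, char in self._events(word):
                self.ngram_counts[(context, char)] += 1
                self.context_counts[context] += 1
        # Outcomes to smooth over: every letter seen in training plus the end marker.
        self.vocab_size = len(alphabet) + 1

    def _events(self, word):
        """Yield (context, next symbol) pairs, including the end marker."""
        padded = BOS * (self.order - 1) + word + EOS
        for i in range(self.order - 1, len(padded)):
            yield padded[i - self.order + 1:i], padded[i]

    def avg_logprob(self, word):
        """Mean natural-log probability per predicted symbol."""
        scores = [
            math.log((self.ngram_counts[(ctx, ch)] + self.k)
                     / (self.context_counts[ctx] + self.k * self.vocab_size))
            for ctx, ch in self._events(word)
        ]
        return sum(scores) / len(scores)


def rank_candidates(lm, candidates, lexicon):
    """Drop real words, then sort the rest from most to least word-like."""
    pool = [c for c in candidates if c not in lexicon]
    return sorted(pool, key=lm.avg_logprob, reverse=True)


# Tiny illustrative lexicon and hand-picked candidates (not data from the paper).
lexicon = {"stone", "store", "storm", "story", "start",
           "stare", "plant", "plane", "plate", "slate"}
lm = CharNgramLM(lexicon)
ranked = rank_candidates(lm, ["stane", "plore", "xqzvt", "store", "slane"], lexicon)
print(ranked)  # "store" is filtered out as a real word; "xqzvt" ranks last
```

Averaging the log-probability per predicted symbol keeps longer candidates from being penalized merely for their length. Because contexts never seen in training fall back to a uniform distribution in this sketch, a fuller implementation would likely train on a large lexicon and add backoff or interpolation to lower-order models.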


Keywords: Lexical recognition tests · Nonwords generation · Words selection · Language models


Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Language Technology Lab, Department of Computer Science and Applied Cognitive Science, University of Duisburg-Essen, Duisburg, Germany
