Multilingual Age of Exposure

  • Conference paper
Artificial Intelligence in Education (AIED 2021)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12748)

Abstract

The ability to objectively quantify the complexity of a text can be a useful indicator of how likely learners of a given level are to comprehend it. Words are the basic building blocks of a text, so before more complex models of text difficulty are built, it is worth noting that a text’s overall difficulty is greatly influenced by the complexity of its underlying words. One approach is to measure a word’s Age of Acquisition (AoA), an estimate of the average age at which a speaker of a language understands the semantics of a specific word. Age of Exposure (AoE) statistically models the process of word learning and, in turn, yields an estimate of a given word’s AoA. In this paper, we expand on the AoE model by training regression models that learn and generalize AoA word lists across multiple languages, including English, German, French, and Spanish. Our approach allows for the estimation of AoA scores for words that are not found in the original lists, covering up to the majority of the target language’s vocabulary. Our method can be uniformly applied across multiple languages through the use of parallel corpora and helps bridge the gap in the size of AoA word lists available for non-English languages. This work is particularly important for efforts toward extending AI to languages with fewer resources and benchmarked corpora.
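The general idea of generalizing an AoA word list with a regression model can be illustrated with a minimal sketch. It assumes a single hypothetical numeric feature per word (called `exposure_score` here) and fits ordinary least squares; the abstract does not specify the features or the regression models used in the paper, so this is not the authors' method. The words and values are synthetic placeholders, not real AoA ratings.

```python
from statistics import mean


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Synthetic placeholder data: word -> (exposure_score, rated AoA in years).
# "exposure_score" is a hypothetical feature chosen for illustration only.
rated_words = {
    "word_a": (1.0, 10.0),
    "word_b": (2.0, 8.0),
    "word_c": (3.0, 6.0),
    "word_d": (4.0, 4.0),
}

scores = [score for score, _ in rated_words.values()]
ages = [age for _, age in rated_words.values()]
slope, intercept = fit_line(scores, ages)


def estimate_aoa(exposure_score):
    """Estimate AoA for a word that is absent from the rated list."""
    return slope * exposure_score + intercept


# A word without a human rating still receives an estimate from its feature.
unseen_estimate = estimate_aoa(2.5)
```

In this sketch, the rated words play the role of an existing AoA list and `estimate_aoa` extends it to words outside that list; a realistic setup would use richer features and held-out evaluation.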

Acknowledgements

This research was supported by a grant of the Romanian National Authority for Scientific Research and Innovation, CNCS – UEFISCDI, project number TE 70 PN-III-P1-1.1-TE-2019-2209, ATES – “Automated Text Evaluation and Simplification”, the Institute of Education Sciences (R305A180144 and R305A180261), and the Office of Naval Research (N00014-17-1-2300; N00014-20-1-2623). The opinions expressed are those of the authors and do not represent views of the IES or ONR.

Author information

Corresponding author

Correspondence to Mihai Dascalu.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Botarleanu, RM., Dascalu, M., Watanabe, M., McNamara, D.S., Crossley, S.A. (2021). Multilingual Age of Exposure. In: Roll, I., McNamara, D., Sosnovsky, S., Luckin, R., Dimitrova, V. (eds) Artificial Intelligence in Education. AIED 2021. Lecture Notes in Computer Science, vol 12748. Springer, Cham. https://doi.org/10.1007/978-3-030-78292-4_7

  • DOI: https://doi.org/10.1007/978-3-030-78292-4_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-78291-7

  • Online ISBN: 978-3-030-78292-4

  • eBook Packages: Computer Science, Computer Science (R0)
