childes-db: A flexible and reproducible interface to the child language data exchange system

  • Alessandro Sanchez
  • Stephan C. Meylan
  • Mika Braginsky
  • Kyle E. MacDonald
  • Daniel Yurovsky
  • Michael C. Frank


The Child Language Data Exchange System (CHILDES) has played a critical role in research on child language development, particularly in characterizing the early language learning environment. However, access to these data can be complex for novices and difficult to automate for advanced users. To address these issues, we introduce childes-db, a database-formatted mirror of CHILDES that improves data accessibility and usability by offering novel interfaces, including browsable web applications and an R application programming interface (API). Along with versioned infrastructure that facilitates reproducibility of past analyses, these interfaces lower barriers to analyzing naturalistic parent–child language, allowing a wider range of researchers in language and cognitive development to use CHILDES in their work.


Child language, Corpus linguistics, Reproducibility, R packages, Research software




Copyright information

© The Psychonomic Society, Inc. 2019

Authors and Affiliations

  • Alessandro Sanchez (1)
  • Stephan C. Meylan (2, 3; corresponding author)
  • Mika Braginsky (3)
  • Kyle E. MacDonald (1)
  • Daniel Yurovsky (4)
  • Michael C. Frank (1)

  1. Department of Psychology, Stanford University, Stanford, USA
  2. Duke University, Durham, USA
  3. MIT, Cambridge, USA
  4. University of Chicago, Chicago, USA
