Automatic speech processing devices have become popular for quantifying amounts of ambient language input to children in their home environments. We assessed error rates for language input estimates for the Language ENvironment Analysis (LENA) audio processing system, asking whether error rates differed as a function of adult talkers’ gender and whether they were speaking to children or adults. Audio was sampled from within LENA recordings from 23 families with children aged 4–34 months. Human coders identified vocalizations by adults and children, counted intelligible words, and determined whether adults’ speech was addressed to children or adults. LENA’s classification accuracy was assessed by parceling audio into 100-ms frames and comparing, for each frame, human and LENA classifications. LENA correctly classified adult speech 67% of the time across families (average false negative rate: 33%). LENA’s adult word count showed a mean +47% error relative to human counts. Classification and Adult Word Count error rates were significantly affected by talkers’ gender and whether speech was addressed to a child or an adult. The largest systematic errors occurred when adult females addressed children. Results show LENA’s classifications and Adult Word Count entailed random – and sometimes large – errors across recordings, as well as systematic errors as a function of talker gender and addressee. Due to systematic and sometimes high error in estimates of amount of adult language input, relying on this metric alone may lead to invalid clinical and/or research conclusions. Further validation studies and circumspect usage of LENA are warranted.
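The frame-by-frame comparison described in the abstract can be sketched as follows. This is a minimal illustration of the accuracy metric, not the authors' actual pipeline; the label names and toy data are invented for illustration:

```python
from collections import Counter

FRAME_MS = 100  # the study compares human and LENA labels in 100-ms frames

def frame_agreement(human_labels, lena_labels):
    """Return percent agreement and a confusion tally between two
    equal-length sequences of per-frame classification labels."""
    assert len(human_labels) == len(lena_labels)
    confusion = Counter(zip(human_labels, lena_labels))
    correct = sum(n for (h, l), n in confusion.items() if h == l)
    return 100.0 * correct / len(human_labels), confusion

# Toy example: 6 frames (0.6 s of audio); labels are hypothetical
human = ["adult_female", "adult_female", "child", "other", "child", "adult_male"]
lena  = ["adult_female", "other",        "child", "other", "other", "adult_male"]
pct, conf = frame_agreement(human, lena)
# pct -> 66.7% agreement (4 of 6 frames match)
```

The confusion tally also exposes which human categories LENA tends to mislabel, which is how the per-category breakdowns in the paper are derived.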
Each conversational block was classified via LENA’s black-box methods as consisting of up to three “primary” participants, named by the title corresponding to the block type code. For instance, one conversational block type we selected for sampling was Adult Female with Key Child (AICF), but such blocks may nevertheless have contained some frames classified as an adult male talker (MAN). The full list of conversational block types included in this study, with their named correspondences, is given in Appendix Table 12.
Although the present paper focused on evaluating LENA’s accuracy for adult speech measures, our method also yielded data on LENA’s classification accuracy for frames classified by humans as child speech vocalizations (N = 66,158). Of frames classified by humans as from the target child (N = 51,334), LENA classifications were as follows: 3315 FAN (6%), 322 MAN (1%), 15,179 CXN (30%), 18,660 CHN (36%), 280 NON (1%), 4830 OLN (9%), 426 TVN (1%), 3039 FUZ (6%), and 5283 SIL or “faint” (10%). Of frames classified by humans as from another child (N = 14,824), LENA classifications were as follows: 1246 FAN (8%), 77 MAN (1%), 7020 CXN (47%), 1049 CHN (7%), 75 NON (1%), 2651 OLN (18%), 139 TVN (1%), 839 FUZ (6%), and 1728 SIL or “faint” (12%).
Humans classified frames of target child speech vocalization in all recordings and frames of other child speech vocalization for all but four recordings (for Families 3, 5, 9, and 14). The mean overall percentage of frames correctly classified by LENA within each family’s recording, averaged across families, for target child speech vocalization was 39% (SD = 17%) and for other child speech vocalization was 37% (SD = 27%).
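The per-family averaging described above can be sketched as below: compute percent-correct within each family's recording, then take an unweighted mean across families. The family names and frame counts here are invented for illustration, not the study's data:

```python
# Sketch: per-family percent-correct, then an unweighted mean across
# families, mirroring how the averages in the text are described.
def pct_correct(n_correct, n_total):
    return 100.0 * n_correct / n_total

# (correctly classified frames, total frames) per family -- made-up values
families = {
    "fam01": (390, 1000),
    "fam02": (220, 500),
    "fam03": (560, 1400),
}
per_family = [pct_correct(c, t) for c, t in families.values()]
mean_across_families = sum(per_family) / len(per_family)
# per_family -> [39.0, 44.0, 40.0]; mean -> 41.0
```

Note that the unweighted mean treats each family equally regardless of how many frames its recording contributed, which is why it can differ from pooled frame-level accuracy.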
One family did not have adult male speech in the selected audio.
We also conducted one-sample t tests using measures of chance based on prevalence of frames from the four categories (Table 3) as classified (i) by humans (adult female: 18%, adult male: 9%, child: 15%, other: 59%) and (ii) by LENA (FAN: 16%, MAN: 8%, CHN/CXN: 18%, all other codes: 58%). Results remained reliably higher than chance across families for method (i) [adult female: t(22) = 19.06, p < .001; adult male: t(21) = 13.25, p < .001; child: t(22) = 23.03, p < .001; other: t(22) = 14.67, p < .001], and for method (ii) [FAN: t(22) = 19.99, p < .001; MAN: t(21) = 13.71, p < .001; CHN/CXN: t(22) = 21.58, p < .001; all other codes: t(22) = 13.31, p < .001].
With the lower values of chance from Table 3, 11/52 was significantly higher than 9% for human-identified male frames, z = −3.21, p < .01, and 8% for LENA-identified male (MAN) frames, z = −3.52, p < .001.
We also conducted one-sample t tests using measures of chance based on the prevalence of frames assigned to each category (Table 3) as classified (i) by humans (speech: 41%, non-speech: 59%) and (ii) by LENA (speech: 42%, non-speech: 58%). Results remained significant across families for method (i) [speech: t(22) = 23.98, p < .001; non-speech: t(22) = 14.67, p < .001] and for method (ii) [speech: t(22) = 23.26, p < .001; non-speech: t(22) = 15.31, p < .001] for these values of chance as well.

We also conducted one-sample t tests using measures of chance based on the prevalence of frames in each category (Table 3) as classified (i) by humans (adult speech: 26%, everything else: 74%) and (ii) by LENA (adult speech: 24%, everything else: 76%). Results remained significant across families for method (i) [adult speech: t(22) = 21.61, p < .001; everything else: t(22) = 15.47, p < .001] and for method (ii) [adult speech: t(22) = 22.67, p < .001; everything else: t(22) = 13.77, p < .001] for these values of chance as well.
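A one-sample t test against a prevalence-based chance level, as in the analyses above, can be sketched with `scipy.stats.ttest_1samp`. The accuracy values and chance level below are invented; the study's own values appear in the text:

```python
import numpy as np
from scipy import stats

# Sketch: test whether per-family classification accuracies exceed a
# chance level given by category prevalence (values here are invented).
accuracies = np.array([0.62, 0.71, 0.58, 0.66, 0.69, 0.64])  # one per family
chance = 0.41  # e.g., prevalence of "speech" frames under human coding

t, p_two_sided = stats.ttest_1samp(accuracies, popmean=chance)
# Directional hypothesis (accuracy > chance): halve the two-sided p
p_one_sided = p_two_sided / 2 if t > 0 else 1 - p_two_sided / 2
```

Using prevalence as the chance baseline guards against inflated accuracy claims when one category dominates the audio, which is the motivation for these supplementary tests.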
Frames identified as adult speech but as directed to individuals other than an adult or child, such as pets or oneself, were excluded (6276 frames, or approximately 5% of adult speech frames).
Following current best practices in statistical modeling, we did not include random slopes in the model because they were not warranted under the naturalistic research design (Barr et al., 2013; Matuschek et al., 2017): not all families had observations for both levels of the two factors, and some families had highly imbalanced data across levels of the factors. Adding extra complexity to the random-effects structure would therefore have led to less reliable estimation of the main factors of interest.
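A random-intercepts-only mixed model of the kind described above can be sketched with `statsmodels`. This is a Python stand-in for the authors' analysis (which used R's lme4); the data frame, column names, and values are synthetic:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Sketch: mixed model with by-family random intercepts but no random
# slopes, as described in the text (synthetic data; variable names are
# illustrative, not the authors' actual columns).
rng = np.random.default_rng(1)
rows = []
for fam in range(12):
    for gender in ("female", "male"):
        for addressee in ("adult", "child"):
            rows.append({
                "family": fam,
                "gender": gender,
                "addressee": addressee,
                "accuracy": rng.normal(0.65, 0.05),
            })
data = pd.DataFrame(rows)

# groups= supplies the random-intercept grouping factor; the fixed-effects
# formula crosses talker gender with addressee
model = smf.mixedlm("accuracy ~ gender * addressee",
                    data, groups=data["family"])
fit = model.fit()
print(fit.params)  # fixed effects plus the random-intercept variance
```

With sparse or imbalanced cells per family, adding random slopes to such a model would multiply the variance parameters to estimate, which is the instability the text is avoiding.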
As pointed out in the Introduction, correlations are not optimal tools for comparing methods. However, the correlation is provided for comparison with values from prior LENA reliability studies (see Table 1).
A plot of everything else classification accuracy against Adult Word Count classification accuracy suggested that Family 22 was an outlier. To test whether Family 22 was driving the significance of the generalized linear model reported in Table 10, we re-ran the model after removing Family 22. The results were similar: the statistically significant effect of everything else classification accuracy on LENA Adult Word Count accuracy persisted (β estimate = −0.523, st. error = 0.20, t = −2.61, p = .018), with no other significant effect or interaction, as before. Further, the effect size for the relationship between everything else classification accuracy and Adult Word Count accuracy remained strong (r = 0.58). These results support the robustness of the statistical relationship between everything else classification accuracy and Adult Word Count accuracy and indicate the results are not due to an outlier.
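The leave-one-out sensitivity logic above can be sketched as follows: recompute the effect with the candidate outlier excluded and check that its size and sign survive. The accuracy values below are invented, and a simple correlation stands in for the full generalized linear model:

```python
import numpy as np

# Sketch: check whether one family drives a relationship by recomputing
# the correlation with that family excluded (values are invented; the
# last pair plays the role of the candidate outlier family).
everything_else_acc = np.array([0.55, 0.60, 0.62, 0.58, 0.64, 0.30])
awc_acc             = np.array([0.70, 0.66, 0.64, 0.68, 0.62, 0.95])

r_all = np.corrcoef(everything_else_acc, awc_acc)[0, 1]
r_wo  = np.corrcoef(everything_else_acc[:-1], awc_acc[:-1])[0, 1]
# If r_wo remains large with the same sign as r_all, the excluded point
# is not single-handedly responsible for the relationship.
```

This is the same robustness check the text reports: the effect of everything else classification accuracy on Adult Word Count accuracy held with and without Family 22.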
The architecture of LENA’s algorithms for Adult Word Count calculations entails that Adult Word Count is incremented only when stretches of audio are classified as “adult speech”, as opposed to “speech” of any kind. Consistent with this, a generalized linear model constructed for LENA Adult Word Count accuracy with predictor variables of accuracy of speech and non-speech classification (and their interaction) showed no significant effect of either variable or the interaction (all ps > .58). This additional modeling underscores that LENA’s Adult Word Count accuracy depends on “adult speech” classification decisions per se, rather than on all speech (or speech-like) vocalization decisions.
The coding manual and raw data files for the current project are available at https://osf.io/2dz4y/.
Agresti, A. (2002). Categorical data analysis. Hoboken, NJ: John Wiley & Sons, Inc.
Ambrose, S., VanDam, M., & Moeller, M. P. (2014). Linguistic input, electronic media, and communication outcomes of toddlers with hearing loss. Ear and Hearing, 35(2), 139.
Ambrose, S., Walker, E., Unflat-Berry, L., Oleson, J., & Moeller, M. P. (2015). Quantity and quality of caregivers' linguistic input to 18-month and 3-year-old children who are hard of hearing. Ear and Hearing, 36(1), 48S-59S. doi:https://doi.org/10.1097/AUD.0000000000000209
Atal, B., & Rabiner, L. (1976). A pattern recognition approach to voiced-unvoiced-silence classification with applications to speech recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing, 24(3), 201–212.
Bachorowski, J. A. (1999). Vocal expression and perception of emotion. Current Directions in Psychological Science, 8, 53–57.
Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68(3), 255-278. doi:https://doi.org/10.1016/j.jml.2012.11.001
Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48.
Benders, T. (2013). Mommy is only happy! Dutch mothers' realisation of speech sounds in infant-directed speech expresses emotion, not didactic intent. Infant Behavior and Development, 36(4), 847–862.
Bergelson, E., Casillas, M., Soderstrom, M., Seidl, A., Warlaumont, A. S., & Amatuni, A. (2019). What do north American babies hear? A large-scale cross-corpus analysis. Developmental Science, 22(1), e12724.
Bergeson, T. R., Miller, R. J., & McCune, K. (2006). Mothers' speech to hearing‐impaired infants and children with cochlear implants. Infancy, 10(3), 221–240.
Bland, J. M., & Altman, D. G. (1986). Statistical methods for assessing agreement between two methods of clinical measurement. Lancet, 1(8476), 307–310.
Boersma, P., & Weenink, D. (2017). Praat: Doing phonetics by computer (Version 6.0.29) [Computer program]. Retrieved from http://www.praat.org/
Bořil, T., & Skarnitzl, R. (2016). Tools rPraat and mPraat. Paper presented at the International Conference on Text, Speech, and Dialogue.
Breen, M., Dilley, L. C., Kraemer, J., & Gibson, E. (2012). Inter-transcriber reliability for two systems of prosodic annotation: ToBI (Tones and Break Indices) and RaP (Rhythm and Pitch). Corpus Linguistics and Linguistic Theory, 8(2), 277–312. doi:https://doi.org/10.1515/cllt-2012-0011
Burgess, S., Audet, L., & Harjusola-Webb, S. (2013). Quantitative and qualitative characteristics of the school and home language environments of preschool-aged children with ASD. Journal of Communication Disorders, 46(5-6), 428–439. doi:https://doi.org/10.1016/j.jcomdis.2013.09.003
Busch, T., Sangen, A., Vanpoucke, F., & van Wieringen, A. (2017). Correlation and agreement between Language ENvironment Analysis (LENA™) and manual transcription for Dutch natural language recordings. Behavior Research Methods. doi:https://doi.org/10.3758/s13428-017-0960-0
Canault, M., Le Normand, M. T., Foudil, S., Loundon, N., & Thai-Van, H. (2016). Reliability of the Language ENvironment Analysis system (LENA™) in European French. Behavior Research Methods, 48(3), 1109–1124.
Carletta, J. (1996). Assessing agreement on classification tasks: The Kappa statistic. Computational Linguistics, 22(2), 249–254.
Caskey, M., Stephens, B., Tucker, R., & Vohr, B. (2011). Importance of parent talk on the development of preterm infant vocalizations. Pediatrics, 128(5), 910–916. doi:https://doi.org/10.1542/peds.2011-0609
Caskey, M., Stephens, B., Tucker, R., & Vohr, B. (2014). Adult talk in the NICU with preterm infants and developmental outcomes. Pediatrics, 133(3), e578-584. doi:https://doi.org/10.1542/peds.2013-0104
Caskey, M., & Vohr, B. (2013). Assessing language and language environment of high-risk infants and children: A new approach. Acta Paediatrica, 102(5), 451–461. doi:https://doi.org/10.1111/apa.12195
Christakis, D. A., Gilkerson, J., Richards, J. A., Zimmerman, F. J., Garrison, M. M., Xu, D., … Yapanel, U. (2009). Audible television and decreased adult words, infant vocalizations, and conversational turns: A population-based study. Archives of Pediatrics & Adolescent Medicine, 163(6), 554–558. doi:https://doi.org/10.1001/archpediatrics.2009.61
Cristia, A., & Seidl, A. (2013). The hyperarticulation hypothesis of infant-directed speech. Journal of Child Language, 41(4). doi:https://doi.org/10.1017/S0305000912000669
Cristia, A., Lavechin, M., Scaff, C., Soderstrom, M., Rowland, C., Räsänen, O., Bunce, J., & Bergelson, E. (in press). A thorough evaluation of the Language Environment Analysis (LENA) system. Behavior Research Methods.
Deller, J. R., Hansen, J. H. L., & Proakis, J. G. (2000). Discrete-time processing of speech signals.
Dubey, H., Sangwan, A., & Hansen, J. H. (2018a). Leveraging Frequency-Dependent Kernel and DIP-Based Clustering for Robust Speech Activity Detection in Naturalistic Audio Streams. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 26(11), 2056–2071.
Dubey, H., Sangwan, A., & Hansen, J. H. (2018b). Robust Speaker Clustering using Mixtures of von Mises-Fisher Distributions for Naturalistic Audio Streams. arXiv preprint arXiv:1808.06045.
Dykstra, J. R., Sabatos-DeVito, M. G., Irvin, D. W., Boyd, B. A., Hume, K. A., & Odom, S. L. (2013). Using the Language Environment Analysis (LENA) system in preschool classrooms with children with autism spectrum disorders. Autism, 17(5), 582–594.
Ephraim, Y., & Malah, D. (1984). Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator. IEEE Transactions on Acoustics, Speech, and Signal Processing, 32(6), 1109–1121.
Fernald, A. (1989). Intonation and communicative intent in mothers' speech to infants: Is the melody the message? Child Development, 60(6), 1497–1510.
Ford, M., Baer, C. T., Xu, D., Yapanel, U., & Gray, S. (2008). The LENATM Language environment analysis system: Audio specifications of the DLP-0121. LENA Foundation.
Garcia-Sierra, A., Ramírez-Esparza, N., & Kuhl, P. K. (2016). Relationships between quantity of language input and brain responses in bilingual and monolingual infants. International Journal of Psychophysiology, 110, 1–17.
Gilkerson, J., Coulter, K., & Richards, J. A. (2008). Transcriptional analyses of the LENA natural language corpus. LENA Foundation.
Gilkerson, J., & Richards, J. A. (2008). The LENA natural language study. LENA Foundation.
Gilkerson, J., Richards, J. A., & Topping, K. J. (2017a). The impact of book reading in the early years on parent–child language interaction. Journal of Early Childhood Literacy, 17(1), 92–110. doi:https://doi.org/10.1177/1468798415608907
Gilkerson, J., Richards, J. A., Warren, S. F., Montgomery, J. K., Greenwood, C. R., Oller, D. K., … Paul, T. D. (2017b). Mapping the early language environment using all-day recordings and automated analysis. American Journal of Speech-Language Pathology, 26(2), 248–265. doi:https://doi.org/10.1044/2016_AJSLP-15-0169
Gilkerson, J., Richards, J. A., Warren, S. F., Oller, D. K., Russo, R., & Vohr, B. (2018). Language experience in the second year of life and language outcomes in late childhood. Pediatrics, 142(4), e20174276. doi:https://doi.org/10.1542/peds.2017-4276
Gilkerson, J., Zhang, Y., Xu, D., Richards, J. A., Xu, X., Jiang, F., … Topping, K. J. (2015). Evaluating Language Environment Analysis system performance for Chinese: A pilot study in Shanghai. Journal of Speech, Language, and Hearing Research, 58(2), 445–452. doi:https://doi.org/10.1044/2015_JSLHR-L-14-0014
Greenwood, C. R., Carta, J. J., Walker, D., Watson-Thompson, J., Gilkerson, J., Larson, A. L., & Schnitz, A. (2017). Conceptualizing a public health prevention intervention for bridging the 30 million word gap. Clinical Child and Family Psychology Review, 20(1), 3–24.
Greenwood, C. R., Thiemann-Bourque, K., Walker, D., Buzhardt, J., & Gilkerson, J. (2011). Assessing children’s home language environments using automatic speech recognition technology. Communication Disorders Quarterly, 32(2), 83–92. doi:https://doi.org/10.1177/1525740110367826
Gries, S. T. (2016). Quantitative corpus linguistics with R: A practical introduction. Taylor & Francis.
Hansen, J. H., & Hasan, T. (2015). Speaker recognition by machines and humans: A tutorial review. IEEE Signal Processing Magazine, 32(6), 74–99.
Hansen, J. H., Joglekar, A., Shekhar, M. C., Kothapally, V., Yu, C., Kaushik, L., & Sangwan, A. (2019). The 2019 inaugural Fearless Steps Challenge: A giant leap for naturalistic audio. In Proceedings of the 20th Annual Conference of the International Speech Communication Association (Interspeech 2019), 1851–1855.
Hanson, H. M. (1997). Glottal characteristics of female speakers: Acoustic correlates. Journal of the Acoustical Society of America, 101(1), 466–481.
Hanson, H. M., & Chuang, E. S. (1999). Glottal characteristics of male speakers: Acoustic correlates and comparison with female data. Journal of the Acoustical Society of America, 106(2), 1064–1077.
Hart, B., & Risley, T. R. (1995). Meaningful differences in the everyday experience of young American children. Baltimore, MD: Paul H. Brookes.
Hoff, E., & Naigles, L. (2002). How children use input to acquire a lexicon. Child development, 73(2), 418–433. doi:https://doi.org/10.1111/1467-8624.00415
Huttenlocher, J., Haight, W., Bryk, A., Seltzer, M., & Lyons, T. (1991). Early vocabulary growth: Relation to language input and gender. Developmental Psychology, 27(2), 236–248.
Irvin, D. W., Hume, K., Boyd, B. A., McBee, M. T., & Odom, S. L. (2013). Child and classroom characteristics associated with the adult language provided to preschoolers with autism spectrum disorder. Research in Autism Spectrum Disorders, 7(8), 947–955.
Iseli, M., Shue, Y.-L., & Alwan, A. (2006). Age-and gender-dependent analysis of voice source characteristics. Paper presented at the Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on.
Jaeger, T. F. (2008). Categorical data analysis: Away from ANOVAs (transformation or not) and towards logit mixed models. Journal of Memory and Language, 59, 434–446. doi:https://doi.org/10.1016/j.jml.2007.11.007
Johnson, K., Caskey, M., Rand, K., Tucker, R., & Vohr, B. (2014). Gender differences in adult-infant communication in the first months of life. Pediatrics, 134(6), e1603–1610. doi:https://doi.org/10.1542/peds.2013-4289
Kaushik, L., Sangwan, A., & Hansen, J. H. (2018). Speech Activity Detection in Naturalistic Audio Environments: Fearless Steps Apollo Corpus. IEEE Signal Processing Letters, 25(9), 1290–1294.
Ko, E.-S., Seidl, A., Cristia, A., Reimchen, M., & Soderstrom, M. (2016). Entrainment of prosody in the interaction of mothers with their young children. Journal of Child Language, 43(2), 1–26. doi:https://doi.org/10.1017/S0305000915000203
Kondaurova, M. V., Bergeson, T. R., & Dilley, L. C. (2012). Effects of deafness on acoustic characteristics of American English tense/lax vowels in maternal speech to infants. Journal of the Acoustical Society of America, 132(2), 1039-1049. doi:https://doi.org/10.1121/1.4728169
Krippendorff, K. (1980). Content analysis: An introduction to its methodology: Sage Publications.
Kuhl, P. K., Andruski, J. E., Chistovich, I. A., Chistovich, L. A., Kozhevnikova, E. V., Ryskina, V. L., … Lacerda, F. (1997). Cross-language analysis of phonetic units in language addressed to infants. Science, 277, 684–686. doi:https://doi.org/10.1126/science.277.5326.684
Lam, C., & Kitamura, C. (2010). Maternal interactions with a hearing and hearing-impaired twin: Similarities and differences in speech input, interaction quality, and word production. Journal of Speech, Language, and Hearing Research, 53, 543–555.
Lam, C., & Kitamura, C. (2012). Mommy, speak clearly: Induced hearing loss shapes vowel hyperarticulation. Developmental Science, 15(2), 212–221.
Landis, J., & Koch, G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33, 159-174. doi:https://doi.org/10.2307/2529310
Ludbrook, J. (1997). Comparing methods of measurement. Clinical and Experimental Pharmacology and Physiology, 24(2), 193–203.
Marchman, V. A., Martínez, L. Z., Hurtado, N., Grüter, T., & Fernald, A. (2017). Caregiver talk to young Spanish-English bilinguals: comparing direct observation and parent-report measures of dual-language exposure. Developmental science, 20(1).
Matuschek, H., Kliegl, R., Vasishth, S., Baayen, H., & Bates, D. (2017). Balancing Type I error and power in linear mixed models. Journal of Memory and Language, 94, 305–315.
McCauley, A., Esposito, M., & Cook, M. (2011). Language environment of preschoolers with autism: Validity and applications. Paper presented at the LENA Users Conference, Denver, CO.
Montag, J. L., Jones, M. N., & Smith, L. B. (2018). Quantity and diversity: Simulating early word learning environments. Cognitive science, 42, 375–412.
Oetting, J. B., Hartfield, L. R., & Pruitt, J. S. (2009). Exploring LENA as a tool for researchers and clinicians. The ASHA Leader, 14(6), 20–22. doi:https://doi.org/10.1044/leader.ftr3.14062009.20
Oller, D. K., Niyogi, P., Gray, S., Richards, J. A., Gilkerson, J., Xu, D., … Warren, S. F. (2010). Automated vocal analysis of naturalistic recordings from children with autism, language delay, and typical development. Proceedings of the National Academy of Sciences, 107(30), 13354–13359. doi:https://doi.org/10.1073/pnas.1003882107
Ota, C. L., & Austin, A. M. B. (2013). Training and mentoring: Family child care providers’ use of linguistic inputs in conversations with children. Early Childhood Research Quarterly, 28(4), 972–983.
Pae, S., Yoon, H., Seol, A., Gilkerson, J., Richards, J. A., Ma, L., & Topping, K. J. (2016). Effects of feedback on parent–child language with infants and toddlers in Korea. First Language, 36(6), 549–569. doi:https://doi.org/10.1177/0142723716649273
Pisanski, K., Fraccaro, P. J., Tigue, C. C., O'Connor, J. J., Röder, S., Andrews, P. W., … Feinberg, D. R. (2014). Vocal indicators of body size in men and women: A meta-analysis. Animal Behaviour, 95, 89–99.
Pisanski, K., & Rendall, D. (2011). The prioritization of voice fundamental frequency or formants in listeners’ assessments of speaker size, masculinity, and attractiveness. Journal of Acoustical Society of America, 129(4), 2201–2212. doi:https://doi.org/10.1121/1.3552866
Podesva, R. (2007). Phonation type as a stylistic variable: The use of falsetto in constructing a persona. Journal of Sociolinguistics, 11(4), 478–504. doi:https://doi.org/10.1111/j.1467-9841.2007.00334.x
Porritt, L., Zinser, M., Bachorowski, J.-A., & Kaplan, P. (2014). Depression diagnoses and fundamental frequency-based acoustic cues in maternal infant-directed speech. Language Learning and Development, 10(1), 51–67.
Proakis, J., Deller, J., & Hansen, J. (1993). Discrete-time processing of speech signals. New York: Macmillan Publishing Co.
Quené, H., & Van den Bergh, H. (2008). Examples of mixed-effects modeling with crossed random effects and with binomial data. Journal of Memory and Language, 59(4), 413–425.
R Development Core Team. (2015). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing.
Rabiner, L. R., & Juang, B.-H. (1993). Fundamentals of speech recognition (Vol. 14): PTR Prentice Hall Englewood Cliffs.
Ramírez-Esparza, N., García-Sierra, A., & Kuhl, P. K. (2017). The Impact of Early Social Interactions on Later Language Development in Spanish-English Bilingual Infants. Child Development, 88(4), 1216–1234. doi:https://doi.org/10.1111/cdev.12648
Ramírez-Esparza, N., García-Sierra, A., & Kuhl, P. K. (2014). Look who's talking: speech style and social context in language input to infants are linked to concurrent and future speech development. Developmental science, 17(6), 880-891.
Richards, J. A., Gilkerson, J., Xu, D., & Topping, K. (2017a). How much do parents think they talk to their child? Journal of Early Intervention, 39(3), 163–179.
Richards, J. A., Xu, D., Gilkerson, J., Yapanel, U., Gray, S., & Paul, T. (2017b). Automated assessment of child vocalization development using LENA. Journal of Speech, Language, and Hearing Research, 60(7), 2047–2063.
Rietveld, T., & van Hout, R. (1993). Statistical techniques for the study of language and language behavior: Mouton de Gruyter.
Roberts, M. Y., & Kaiser, A. P. (2011). The effectiveness of parent-implemented language interventions: A meta-analysis. American Journal of Speech-Language Pathology.
Romeo, R. R., Leonard, J. A., Robinson, S. T., West, M. R., Mackey, A. P., Rowe, M. L., & Gabrieli, J. D. E. (2018). Beyond the 30-million-word gap: Children’s conversational exposure is associated with language-related brain function. Psychological Science, 29(5), 700–710. doi:https://doi.org/10.1177/0956797617742725
Rowe, M. L. (2012). Recording, transcribing, and coding interaction. Research methods in child language: A practical guide, 191–207. doi:https://doi.org/10.1002/9781444344035.ch13
Sacks, C., Shay, S., Repplinger, L., Leffel, K. R., Sapolich, S. G., Suskind, E., … Suskind, D. L. (2014). Pilot testing of a parent-directed intervention (Project ASPIRE) for underserved children who are deaf or hard of hearing. Child Language Teaching and Therapy, 30(1), 91–102. doi:https://doi.org/10.1177/0265659013494873
Sangwan, A., Hansen, J. H. L., Irvin, D. W., Crutchfield, S., & Greenwood, C. R. (2015). Studying the relationship between physical and language environments of children: Who's speaking to whom and where? Paper presented at the Signal Processing and Signal Processing Education Workshop (SP/SPE), 2015 IEEE.
Schwarz, I.-C., Botros, N., Lord, A., Marcusson, A., Tidelius, H., & Marklund, E. (2017). The LENATM system applied to Swedish: Reliability of the Adult Word Count estimate. Paper presented at the Interspeech 2017.
Seidl, A., Cristia, A., Soderstrom, M., Ko, E.-S., Abel, E. A., Kellerman, A., & Schwichtenberg, A. (2018). Infant–mother acoustic–prosodic alignment and developmental risk. Journal of Speech, Language, and Hearing Research, 61(6), 1369–1380.
Sharma, B., Das, R. K., & Li, H. (2019). Multi-level adaptive speech activity detector for speech in naturalistic environments. In Proceedings of the 20th Annual Conference of the International Speech Communication Association (Interspeech 2019), 2015–2019.
Shneidman, L. A., Arroyo, M. E., Levine, S. C., & Goldin-Meadow, S. (2013). What counts as effective input for word learning? Journal of Child Language, 40(3), 672–686.
Sholokhov, A., Sahidullah, M., & Kinnunen, T. (2018). Semi-supervised speech activity detection with an application to automatic speaker verification. Computer Speech and Language, 47, 132–156.
Soderstrom, M., & Wittebolle, K. (2013). When do caregivers talk? The influences of activity and time of day on caregiver speech and child vocalizations in two childcare environments. PLoS One, 8(11), e80646. doi:https://doi.org/10.1371/journal.pone.0080646
Suskind, D. L., Graf, E., Leffel, K. R., Hernandez, M. W., Suskind, E., Webber, R., … Nevins, M. E. (2016a). Project ASPIRE: Spoken language intervention curriculum for parents of low-socioeconomic status and their Deaf and Hard-of-Hearing Children. Otology & Neurotology, 37(2), e110–e117.
Suskind, D. L., Leffel, K. R., Graf, E., Hernandez, M. W., Gunderson, E. A., Sapolich, S. G., … Levine, S. C. (2016b). A parent-directed language intervention for children of low socioeconomic status: A randomized controlled pilot study. Journal of child language, 43(2), 366–406. doi:https://doi.org/10.1017/S0305000915000033
Syrdal, A. K., & McGory, J. (2000). Inter-transcriber reliability of ToBI prosodic labeling. Paper presented at the International Conference on Spoken Language Processing, Beijing, China.
Talbot, M. (2015). The talking cure. The New Yorker, 90, 43.
Gries, S. Th. (2015). The most under-used statistical method in corpus linguistics: Multi-level (and mixed-effects) models. Corpora, 10(1), 95–125.
Thiemann-Bourque, K., Warren, S. F., Brady, N., Gilkerson, J., & Richards, J. A. (2014). Vocal interaction between children with Down syndrome and their parents. American Journal of Speech-Language Pathology, 23(3), 474–485. doi:https://doi.org/10.1044/2014_AJSLP-12-0010
VanDam, M., Ambrose, S., & Moeller, M. P. (2012). Quantity of parental language in the home environments of hard-of-hearing 2-year-olds. Journal of Deaf Studies and Deaf Education, 17(4), 402–420. doi:https://doi.org/10.1093/deafed/ens025
VanDam, M., & Silbert, N. H. (2013). Precision and error of automatic speech recognition. Paper presented at the Proceedings of Meetings on Acoustics ICA2013.
VanDam, M., & Silbert, N. H. (2016). Fidelity of automatic speech processing for adult and child talker classifications. PLoS One, 11(8), e0160588. doi:https://doi.org/10.1371/journal.pone.0160588
Vigil, D. C., Hodges, J., & Klee, T. (2005). Quantity and quality of parental language input to late-talking toddlers during play. Child Language Teaching and Therapy, 21(2), 107–122.
Wang, Y., Hartman, M., Aziz, N. A. A., Arora, S., Shi, L., & Tunison, E. (2017). A systematic review of the use of LENA technology. American Annals of the Deaf, 162(3), 295–311.
Warlaumont, A. S., Oller, D. K., Dale, R., Richards, J. A., Gilkerson, J., & Xu, D. (2010). Vocal interaction dynamics of children with and without autism. Paper presented at the Proceedings of the Annual Meeting of the Cognitive Science Society.
Warlaumont, A. S., Richards, J. A., Gilkerson, J., & Oller, D. K. (2014). A social feedback loop for speech development and its reduction in autism. Psychological Science, 25(7), 1314–1324. doi:https://doi.org/10.1177/0956797614531023
Warren, S. F., Gilkerson, J., Richards, J. A., Oller, D. K., Xu, D., Yapanel, U., & Gray, S. (2010). What automated vocal analysis reveals about the vocal production and language learning environment of young children with autism. Journal of Autism and Developmental Disorders, 40(5), 555–569. doi:https://doi.org/10.1007/s10803-009-0902-5
Weisleder, A., & Fernald, A. (2013). Talking to children matters: Early language experience strengthens processing and builds vocabulary. Psychological Science, 24(11), 2143-2152. doi:https://doi.org/10.1177/0956797613488145
Weizman, Z. O., & Snow, C. E. (2001). Lexical output as related to children's vocabulary acquisition: Effects of sophisticated exposure and support for meaning. Developmental Psychology, 37(2), 265–279.
Wieland, E., Burnham, E., Kondaurova, M. V., Bergeson, T. R., & Dilley, L. C. (2015). Vowel space characteristics of speech directed to children with and without hearing loss. Journal of Speech, Language and Hearing Research, 58(2), 254–267. doi:https://doi.org/10.1044/2015_JSLHR-S-13-0250
Wong, K., Boben, M., & Thomas, C. (2018). Disrupting the early learning status quo: Providence Talks as an innovative policy in diverse urban communities.
Xu, D., Gilkerson, J., Richards, J., Yapanel, U., & Gray, S. (2009a). Child vocalization composition as discriminant information for automatic autism detection. Paper presented at the Engineering in Medicine and Biology Society, 2009. EMBC 2009. Annual International Conference of the IEEE.
Xu, D., Richards, J. A., Gilkerson, J., Yapanel, U., Gray, S., & Hansen, J. (2009b). Automatic childhood autism detection by vocalization decomposition with phone-like units. Paper presented at the Proceedings of the 2nd Workshop on Child, Computer and Interaction.
Xu, D., Yapanel, U., & Gray, S. (2009c). Reliability of the LENA™ Language Environment Analysis System in young children's natural home environment (LENA Technical Report LTR-05-2). Boulder, CO: LENA Foundation. Retrieved from http://lena.org/wp-content/uploads/2016/07/LTR-05-2_Reliability.pdf
Xu, D., Yapanel, U., Gray, S., & Baer, C. T. (2008a). The LENA Language Environment Analysis System: The interpretive time segments (ITS) file. LENA Research Foundation Technical Report LTR-04-2.
Xu, D., Yapanel, U., Gray, S., Gilkerson, J., Richards, J. A., & Hansen, J. H. L. (2008b). Signal processing for young child speech language development. Paper presented at the First Workshop on Child, Computer and Interaction.
Zhang, Y., Xu, X., Jiang, F., Gilkerson, J., Xu, D., Richards, J. A., … Topping, K. J. (2015). Effects of quantitative linguistic feedback to caregivers of young children: A pilot study in China. Communication Disorders Quarterly, 37(1), 16–24. doi:https://doi.org/10.1177/1525740115575771
Zimmerman, F. J., Gilkerson, J., Richards, J. A., Christakis, D. A., Xu, D., Gray, S., & Yapanel, U. (2009). Teaching by listening: the importance of adult-child conversations to language development. Pediatrics, 124(1), 342–349. doi:https://doi.org/10.1542/peds.2008-2267
We gratefully acknowledge the support of NIH grant R01 DC008581 to D. Houston and L. Dilley. We thank Jessica Reed and Yuanyuan Wang for their help with data collection and Somnath Roy for assistance with analyses. We would also like to thank James Chen, Elizabeth Remy, Josh Zhao, Chitra Lakshumanan, Courtney Cameron, Sophia Stevens, Nikaela Losievski, Riley Reed, Kayli Silverstein, and Kelsey Dods for their diligent work coding audio. Thanks to Melanie Soderstrom for sharing previous analyses of LENA reliability with us and for many useful discussions.
Open practices statement
This study was not formally preregistered. The data files and coding manual have been made available on a permanent third-party archive at https://osf.io/2dz4y/.
Lehet, M., Arjmandi, M.K., Houston, D. et al. Circumspection in using automated measures: Talker gender and addressee affect error rates for adult speech detection in the Language ENvironment Analysis (LENA) system. Behav Res (2020). https://doi.org/10.3758/s13428-020-01419-y