Circumspection in using automated measures: Talker gender and addressee affect error rates for adult speech detection in the Language ENvironment Analysis (LENA) system

Abstract

Automatic speech processing devices have become popular for quantifying amounts of ambient language input to children in their home environments. We assessed error rates in language input estimates from the Language ENvironment Analysis (LENA) audio processing system, asking whether error rates differed as a function of adult talkers' gender and whether talkers were speaking to children or to adults. Audio was sampled from within LENA recordings from 23 families with children aged 4–34 months. Human coders identified vocalizations by adults and children, counted intelligible words, and determined whether adults' speech was addressed to children or to adults. LENA's classification accuracy was assessed by parceling audio into 100-ms frames and comparing, for each frame, human and LENA classifications. LENA correctly classified adult speech 67% of the time across families (average false negative rate: 33%). LENA's Adult Word Count showed a mean error of +47% relative to human counts. Classification and Adult Word Count error rates were significantly affected by talkers' gender and by whether speech was addressed to a child or to an adult. The largest systematic errors occurred when adult females addressed children. The results show that LENA's classifications and Adult Word Count entailed random, and sometimes large, errors across recordings, as well as systematic errors as a function of talker gender and addressee. Because of systematic and sometimes high error in estimates of the amount of adult language input, relying on this metric alone may lead to invalid clinical and/or research conclusions. Further validation studies and circumspect usage of LENA are warranted.
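The frame-based comparison and the Adult Word Count error metric described in the abstract can be sketched in a few lines. This is a minimal illustration with made-up labels and counts, not the study's actual pipeline:

```python
# Sketch of the frame-level comparison: audio is parceled into 100-ms
# frames, and each frame's human label is compared with LENA's label.
# Labels and counts below are illustrative, not data from the study.

def frame_recall(human, lena, target="adult"):
    """Fraction of human-labeled `target` frames that LENA also labeled `target`."""
    hits = sum(1 for h, l in zip(human, lena) if h == target and l == target)
    total = sum(1 for h in human if h == target)
    return hits / total if total else float("nan")

def awc_percent_error(lena_count, human_count):
    """Signed percent error of LENA's Adult Word Count relative to a human count."""
    return 100.0 * (lena_count - human_count) / human_count

human = ["adult", "adult", "child", "adult", "other", "other"]
lena  = ["adult", "other", "child", "adult", "other", "adult"]
print(frame_recall(human, lena))    # 2 of 3 human 'adult' frames recovered
print(awc_percent_error(147, 100))  # a +47% overcount, as in the abstract
```

A false negative rate, as reported in the abstract, is simply one minus this recall.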



Notes

  1.

    Each conversational block was classified via LENA's black-box methods as consisting of up to three "primary" participants, named by the title corresponding to the block type code. For instance, one conversational block type we selected for sampling was Adult Female with Key Child (AICF), but such blocks may have had a few frames classified as a male adult talker (MAN). The full list of conversational block types included in this study, and their named correspondences, is given in Appendix Table 12.

  2.

    Although the present paper focused on evaluating LENA’s accuracy for adult speech measures, our method gave data on LENA’s classification accuracy for frames classified by humans as child speech vocalization (N = 66,158). Of frames classified by humans as from the target child (N = 51,334), LENA classifications were as follows: 3315 FAN (6%), 322 MAN (1%), 15,179 CXN (30%), 18,660 CHN (36%), 280 NON (1%), 4830 OLN (9%), 426 TVN (1%), 3039 FUZ (6%), and 5283 SIL or “faint” (10%). Of frames classified by humans as from another child (N = 14,824), LENA classifications were as follows: 1246 FAN (8%), 77 MAN (1%), 7020 CXN (47%), 1049 CHN (7%), 75 NON (1%), 2651 OLN (18%), 139 TVN (1%), 839 FUZ (6%), and 1728 SIL or “faint” (12%).
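The per-code breakdowns in this note are row percentages of a confusion table. A minimal sketch of that computation, using invented frame labels rather than the study's data:

```python
from collections import Counter

# Hypothetical LENA codes for frames that humans attributed to the target
# child; percentages are computed within the row, as in the note above.
lena_codes = ["CHN"] * 36 + ["CXN"] * 30 + ["SIL"] * 28 + ["FAN"] * 6

counts = Counter(lena_codes)
total = sum(counts.values())
row_pct = {code: round(100 * n / total) for code, n in counts.items()}
print(row_pct)  # {'CHN': 36, 'CXN': 30, 'SIL': 28, 'FAN': 6}
```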

  3.

    Humans classified frames of target child speech vocalization in all recordings and frames of other child speech vocalization for all but four recordings (for Families 3, 5, 9, and 14). The mean overall percentage of frames correctly classified by LENA within each family’s recording, averaged across families, for target child speech vocalization was 39% (SD = 17%) and for other child speech vocalization was 37% (SD = 27%).

  4.

    One family did not have adult male speech in the selected audio.

  5.

    We also conducted one-sample t tests using measures of chance based on prevalence of frames from the four categories (Table 3) as classified (i) by humans (adult female: 18%, adult male: 9%, child: 15%, other: 59%) and (ii) by LENA (FAN: 16%, MAN: 8%, CHN/CXN: 18%, all other codes: 58%). Results remained reliably higher than chance across families for method (i) [adult female: t(22) = 19.06, p < .001; adult male: t(21) = 13.25, p < .001; child: t(22) = 23.03, p < .001; other: t(22) = 14.67, p < .001], and for method (ii) [FAN: t(22) = 19.99, p < .001; MAN: t(21) = 13.71, p < .001; CHN/CXN: t(22) = 21.58, p < .001; all other codes: t(22) = 13.31, p < .001].
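The one-sample t tests above compare per-family accuracies against a fixed chance level. The statistic can be sketched from the textbook formula; the accuracies below are illustrative, not the study's, and `scipy.stats.ttest_1samp` would give the same t:

```python
import math
from statistics import mean, stdev

def one_sample_t(values, chance):
    """t statistic for H0: the population mean equals `chance` (df = n - 1)."""
    n = len(values)
    return (mean(values) - chance) / (stdev(values) / math.sqrt(n))

# Hypothetical per-family classification accuracies tested against the
# 16% prevalence-based chance level for FAN frames from the note above.
accuracies = [0.60, 0.72, 0.66, 0.58, 0.70, 0.64]
print(round(one_sample_t(accuracies, 0.16), 2))  # prints 21.91
```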

  6.

    With the lower values of chance from Table 3, 11/52 was significantly higher than 9% for human-identified male frames, z = −3.21, p < .01, and 8% for LENA-identified male (MAN) frames, z = −3.52, p < .001.
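The comparison of 11/52 against a 9% chance rate is a one-proportion test. A plain normal-approximation version can be sketched as below; it need not reproduce the exact z values in the note, which may reflect a different variant or correction:

```python
import math

def one_proportion_z(successes, n, p0):
    """z statistic for an observed proportion against a null proportion p0,
    using the normal approximation with the null standard error."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / se

# 11 of 52 frames versus a 9% chance level, as in the note above.
print(round(one_proportion_z(11, 52, 0.09), 2))  # prints 3.06
```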

  7.

    We also conducted one-sample t tests based on the prevalence of assigning frames to categories (Table 3) as classified (i) by humans (speech: 41%, non-speech: 59%) and (ii) by LENA (speech: 42%, non-speech: 58%) as measures of chance. Results remained significant across families for method (i) [speech: t(22) = 23.98, p < .001; non-speech: t(22) = 14.67, p < .001] and for method (ii) [speech: t(22) = 23.26, p < .001; non-speech: t(22) = 15.31, p < .001] with these values of chance as well.

  8.

    We also conducted one-sample t tests drawing on the prevalence of frames in each category (Table 3) as classified (i) by humans (adult speech: 26%, everything else: 74%) and (ii) by LENA (adult speech: 24%, everything else: 76%) as measures of chance. Results remained significant across families for method (i) [adult speech: t(22) = 21.61, p < .001; everything else: t(22) = 15.47, p < .001] and for method (ii) [adult speech: t(22) = 22.67, p < .001; everything else: t(22) = 13.77, p < .001] with these values of chance as well.

  9.

    Frames identified as adult speech but as directed to individuals other than an adult or child, such as pets or oneself, were excluded (6276 frames, or approximately 5% of adult speech frames).

  10.

    Following current best practices in statistical modeling, we did not include random slopes in the model because they were not warranted under the naturalistic research design (Barr et al., 2013; Matuschek et al., 2017): not all families had observations at both levels of the two factors, and some families had highly imbalanced data across levels. Adding this complexity to the random-effects structure would therefore have yielded less reliable estimates of the main factors of interest.

  11.

    As pointed out in the Introduction, correlations are not optimal tools for comparing methods. However, the correlation is provided for comparison with values from prior LENA reliability studies (see Table 1).

  12.

    A plot of everything else classification accuracy against Adult Word Count accuracy suggested that Family 22 was an outlier. To test whether Family 22 was driving significance in the generalized linear model reported in Table 10, we re-ran the model with Family 22 removed. The results were similar: the statistically significant effect of everything else classification accuracy on LENA Adult Word Count accuracy persisted (β estimate = −0.523, st. error = 0.20, t = −2.61, p = .018), with no other significant effect or interaction, as before. Further, the effect size for the relationship between everything else classification accuracy and Adult Word Count accuracy remained strong (r = 0.58). These results support the robustness of the statistical relationship and suggest it is not due to an outlier.
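The outlier check above (refit the model without the suspect family and see whether the effect survives) can be illustrated with a toy leave-one-out comparison. This sketch uses ordinary least squares on invented data, not the study's generalized linear model:

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Invented per-family data in which the last point is a gross outlier.
x = [1.0, 2.0, 3.0, 4.0, 10.0]
y = [2.0, 4.0, 6.0, 8.0, 0.0]

print(ols_slope(x, y))            # with the outlier: -0.4
print(ols_slope(x[:-1], y[:-1]))  # outlier dropped: 2.0
```

When dropping one point flips or collapses the slope like this, that point is driving the fit; in the study the effect persisted after removing Family 22, supporting robustness.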

  13.

    The architecture of LENA's algorithms for Adult Word Count calculation entails that Adult Word Count is only incremented when stretches of audio are classified as "adult speech", as opposed to any kind of "speech" in general. Consistent with this, a generalized linear model was constructed for LENA Adult Word Count accuracy with predictor variables of speech and non-speech classification accuracy (and their interaction); neither variable, nor the interaction, showed a significant effect (all p's > 0.58). This additional modeling underscores that LENA's Adult Word Count accuracy depends on "adult speech" classification decisions per se, rather than on speech (or speech-like) vocalization decisions generally.

  14.

    The coding manual and raw data files for the current project are available at https://osf.io/2dz4y/.

References

  1. Agresti, A. (2002). Categorical data analysis. Hoboken, NJ: John Wiley & Sons, Inc.


  2. Ambrose, S., VanDam, M., & Moeller, M. P. (2014). Linguistic input, electronic media, and communication outcomes of toddlers with hearing loss. Ear and Hearing, 35(2), 139.


  3. Ambrose, S., Walker, E., Unflat-Berry, L., Oleson, J., & Moeller, M. P. (2015). Quantity and quality of caregivers' linguistic input to 18-month and 3-year-old children who are hard of hearing. Ear and Hearing, 36(1), 48S-59S. doi:https://doi.org/10.1097/AUD.0000000000000209


  4. Atal, B., & Rabiner, L. (1976). A pattern recognition approach to voiced-unvoiced-silence classification with applications to speech recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing, 24(3), 201–212.


  5. Bachorowski, J. A. (1999). Vocal expression and perception of emotion. Current Directions in Psychological Science, 8, 53–57.


  6. Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68(3), 255-278. doi:https://doi.org/10.1016/j.jml.2012.11.001


  7. Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48.


  8. Benders, T. (2013). Mommy is only happy! Dutch mothers' realisation of speech sounds in infant-directed speech expresses emotion, not didactic intent. Infant Behavior and Development, 36(4), 847–862.


  9. Bergelson, E., Casillas, M., Soderstrom, M., Seidl, A., Warlaumont, A. S., & Amatuni, A. (2019). What do North American babies hear? A large-scale cross-corpus analysis. Developmental Science, 22(1), e12724.


  10. Bergeson, T. R., Miller, R. J., & McCune, K. (2006). Mothers' speech to hearing‐impaired infants and children with cochlear implants. Infancy, 10(3), 221–240.

  11. Bland, J. M., & Altman, D. G. (1986). Statistical methods for assessing agreement between two methods of clinical measurement. Lancet, 1(8476), 307–310.


  12. Boersma, P., & Weenink, D. (2017). Praat: Doing phonetics by computer (Version 6.0.29). Retrieved from http://www.praat.org/

  13. Bořil, T., & Skarnitzl, R. (2016). Tools rPraat and mPraat. Paper presented at the International Conference on Text, Speech, and Dialogue.

  14. Breen, M., Dilley, L. C., Kraemer, J., & Gibson, E. (2012). Inter-transcriber reliability for two systems of prosodic annotation: ToBI (Tones and Break Indices) and RaP (Rhythm and Pitch). Corpus Linguistics and Linguistic Theory, 8(2), 277–312. doi:https://doi.org/10.1515/cllt-2012-0011


  15. Burgess, S., Audet, L., & Harjusola-Webb, S. (2013). Quantitative and qualitative characteristics of the school and home language environments of preschool-aged children with ASD. Journal of Communication Disorders, 46(5-6), 428–439. doi:https://doi.org/10.1016/j.jcomdis.2013.09.003


  16. Busch, T., Sangen, A., Vanpoucke, F., & van Wieringen, A. (2017). Correlation and agreement between Language ENvironment Analysis (LENA™) and manual transcription for Dutch natural language recordings. Behavior Research Methods. doi:https://doi.org/10.3758/s13428-017-0960-0

  17. Canault, M., Le Normand, M. T., Foudil, S., Loundon, N., & Thai-Van, H. (2016). Reliability of the Language ENvironment Analysis system (LENA™) in European French. Behavior Research Methods, 48(3), 1109–1124.


  18. Carletta, J. (1996). Assessing agreement on classification tasks: The Kappa statistic. Computational Linguistics, 22(2), 249–254.


  19. Caskey, M., Stephens, B., Tucker, R., & Vohr, B. (2011). Importance of parent talk on the development of preterm infant vocalizations. Pediatrics, 128(5), 910–916. doi:https://doi.org/10.1542/peds.2011-0609


  20. Caskey, M., Stephens, B., Tucker, R., & Vohr, B. (2014). Adult talk in the NICU with preterm infants and developmental outcomes. Pediatrics, 133(3), e578-584. doi:https://doi.org/10.1542/peds.2013-0104


  21. Caskey, M., & Vohr, B. (2013). Assessing language and language environment of high-risk infants and children: A new approach. Acta Paediatrica, 102(5), 451–461. doi:https://doi.org/10.1111/apa.12195


  22. Christakis, D. A., Gilkerson, J., Richards, J. A., Zimmerman, F. J., Garrison, M. M., Xu, D., … Yapanel, U. (2009). Audible television and decreased adult words, infant vocalizations, and conversational turns: A population-based study. Archives of Pediatrics & Adolescent Medicine, 163(6), 554–558. doi:https://doi.org/10.1001/archpediatrics.2009.61


  23. Cristia, A., & Seidl, A. (2013). The hyperarticulation hypothesis of infant-directed speech. Journal of Child Language, 41(4). doi:https://doi.org/10.1017/S0305000912000669

  24. Cristia, A., Lavechin, M., Scaff, C., Soderstrom, M., Rowland, C., Räsänen, O., Bunce, J., & Bergelson, E. (in press). A thorough evaluation of the Language Environment Analysis (LENA) system. Behavior Research Methods.

  25. Deller, J. R., Hansen, J. H. L., & Proakis, J. G. (2000). Discrete-time processing of speech signals.

  26. Dubey, H., Sangwan, A., & Hansen, J. H. (2018a). Leveraging Frequency-Dependent Kernel and DIP-Based Clustering for Robust Speech Activity Detection in Naturalistic Audio Streams. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 26(11), 2056–2071.


  27. Dubey, H., Sangwan, A., & Hansen, J. H. (2018b). Robust Speaker Clustering using Mixtures of von Mises-Fisher Distributions for Naturalistic Audio Streams. arXiv preprint arXiv:1808.06045.

  28. Dykstra, J. R., Sabatos-DeVito, M. G., Irvin, D. W., Boyd, B. A., Hume, K. A., & Odom, S. L. (2013). Using the Language Environment Analysis (LENA) system in preschool classrooms with children with autism spectrum disorders. Autism, 17(5), 582–594.


  29. Ephraim, Y., & Malah, D. (1984). Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator. IEEE Transactions on Acoustics, Speech, and Signal Processing, 32(6), 1109–1121.


  30. Fernald, A. (1989). Intonation and communicative intent in mothers' speech to infants: Is the melody the message? Child Development, 60(6), 1497–1510.


  31. Ford, M., Baer, C. T., Xu, D., Yapanel, U., & Gray, S. (2008). The LENA™ language environment analysis system: Audio specifications of the DLP-0121. LENA Foundation.

  32. Garcia-Sierra, A., Ramírez-Esparza, N., & Kuhl, P. K. (2016). Relationships between quantity of language input and brain responses in bilingual and monolingual infants. International Journal of Psychophysiology, 110, 1–17.


  33. Gilkerson, J., Coulter, K., & Richards, J. A. (2008). Transcriptional analyses of the LENA natural language corpus. LENA Foundation.

  34. Gilkerson, J., & Richards, J. A. (2008). The LENA natural language study. LENA Foundation.

  35. Gilkerson, J., Richards, J. A., & Topping, K. J. (2017a). The impact of book reading in the early years on parent–child language interaction. Journal of Early Childhood Literacy, 17(1), 92–110. doi:https://doi.org/10.1177/1468798415608907


  36. Gilkerson, J., Richards, J. A., Warren, S. F., Montgomery, J. K., Greenwood, C. R., Oller, D. K., … Paul, T. D. (2017b). Mapping the early language environment using all-day recordings and automated analysis. American Journal of Speech-Language Pathology, 26(2), 248–265. doi:https://doi.org/10.1044/2016_AJSLP-15-0169


  37. Gilkerson, J., Richards, J. A., Warren, S. F., Oller, D. K., Russo, R., & Vohr, B. (2018). Language experience in the second year of life and language outcomes in late childhood. Pediatrics, 142(4), e20174276. doi:https://doi.org/10.1542/peds.2017-4276


  38. Gilkerson, J., Zhang, Y., Xu, D., Richards, J. A., Xu, X., Jiang, F., … Topping, K. J. (2015). Evaluating Language Environment Analysis system performance for Chinese: A pilot study in Shanghai. Journal of Speech, Language, and Hearing Research, 58(2), 445–452. doi:https://doi.org/10.1044/2015_JSLHR-L-14-0014


  39. Greenwood, C. R., Carta, J. J., Walker, D., Watson-Thompson, J., Gilkerson, J., Larson, A. L., & Schnitz, A. (2017). Conceptualizing a public health prevention intervention for bridging the 30 million word gap. Clinical Child and Family Psychology Review, 20(1), 3–24.


  40. Greenwood, C. R., Thiemann-Bourque, K., Walker, D., Buzhardt, J., & Gilkerson, J. (2011). Assessing children’s home language environments using automatic speech recognition technology. Communication Disorders Quarterly, 32(2), 83–92. doi:https://doi.org/10.1177/1525740110367826


  41. Gries, S. T. (2016). Quantitative corpus linguistics with R: A practical introduction. Taylor & Francis.

  42. Hansen, J. H., & Hasan, T. (2015). Speaker recognition by machines and humans: A tutorial review. IEEE Signal Processing Magazine, 32(6), 74–99.

  43. Hansen, J. H., Joglekar, A., Shekhar, M. C., Kothapally, V., Yu, C., Kaushik, L., & Sangwan, A. (2019). The 2019 inaugural Fearless Steps Challenge: A giant leap for naturalistic audio. In Proceedings of the 20th Annual Conference of the International Speech Communication Association (Interspeech 2019), 1851–1855.

  44. Hanson, H. M. (1997). Glottal characteristics of female speakers: Acoustic correlates. Journal of the Acoustical Society of America, 101(1), 466–481.


  45. Hanson, H. M., & Chuang, E. S. (1999). Glottal characteristics of male speakers: Acoustic correlates and comparison with female data. Journal of the Acoustical Society of America, 106(2), 1064–1077.


  46. Hart, B., & Risley, T. R. (1995). Meaningful differences in the everyday experience of young American children. Baltimore, MD: Paul H. Brookes.


  47. Hoff, E., & Naigles, L. (2002). How children use input to acquire a lexicon. Child Development, 73(2), 418–433. doi:https://doi.org/10.1111/1467-8624.00415


  48. Huttenlocher, J., Haight, W., Bryk, A., Seltzer, M., & Lyons, T. (1991). Early vocabulary growth: Relation to language input and gender. Developmental Psychology, 27(2), 236–248.


  49. Irvin, D. W., Hume, K., Boyd, B. A., McBee, M. T., & Odom, S. L. (2013). Child and classroom characteristics associated with the adult language provided to preschoolers with autism spectrum disorder. Research in Autism Spectrum Disorders, 7(8), 947–955.


  50. Iseli, M., Shue, Y.-L., & Alwan, A. (2006). Age- and gender-dependent analysis of voice source characteristics. Paper presented at the 2006 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2006).

  51. Jaeger, T. F. (2008). Categorical data analysis: Away from ANOVAs (transformation or not) and towards logit mixed models. Journal of Memory and Language, 59, 434–446. doi:https://doi.org/10.1016/j.jml.2007.11.007


  52. Johnson, K., Caskey, M., Rand, K., Tucker, R., & Vohr, B. (2014). Gender differences in adult-infant communication in the first months of life. Pediatrics, 134(6), e1603–1610. doi:https://doi.org/10.1542/peds.2013-4289


  53. Kaushik, L., Sangwan, A., & Hansen, J. H. (2018). Speech Activity Detection in Naturalistic Audio Environments: Fearless Steps Apollo Corpus. IEEE Signal Processing Letters, 25(9), 1290–1294.


  54. Ko, E.-S., Seidl, A., Cristia, A., Reimchen, M., & Soderstrom, M. (2016). Entrainment of prosody in the interaction of mothers with their young children. Journal of Child Language, 43(2), 1–26. doi:https://doi.org/10.1017/S0305000915000203


  55. Kondaurova, M. V., Bergeson, T. R., & Dilley, L. C. (2012). Effects of deafness on acoustic characteristics of American English tense/lax vowels in maternal speech to infants. Journal of the Acoustical Society of America, 132(2), 1039-1049. doi:https://doi.org/10.1121/1.4728169


  56. Krippendorff, K. (1980). Content analysis: An introduction to its methodology. Sage Publications.

  57. Kuhl, P. K., Andruski, J. E., Chistovich, I. A., Chistovich, L. A., Kozhevnikova, E. V., Ryskina, V. L., … Lacerda, F. (1997). Cross-language analysis of phonetic units in language addressed to infants. Science, 277, 684–686. doi:https://doi.org/10.1126/science.277.5326.684


  58. Lam, C., & Kitamura, C. (2010). Maternal interactions with a hearing and hearing-impaired twin: Similarities and differences in speech input, interaction quality, and word production. Journal of Speech, Language, and Hearing Research, 53, 543–555.

  59. Lam, C., & Kitamura, C. (2012). Mommy, speak clearly: Induced hearing loss shapes vowel hyperarticulation. Developmental Science, 15(2), 212–221.

  60. Landis, J., & Koch, G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33, 159-174. doi:https://doi.org/10.2307/2529310


  61. Ludbrook, J. (1997). Special article: Comparing methods of measurement. Clinical and Experimental Pharmacology and Physiology, 24(2), 193–203.

  62. Marchman, V. A., Martínez, L. Z., Hurtado, N., Grüter, T., & Fernald, A. (2017). Caregiver talk to young Spanish-English bilinguals: Comparing direct observation and parent-report measures of dual-language exposure. Developmental Science, 20(1).

  63. Matuschek, H., Kliegl, R., Vasishth, S., Baayen, H., & Bates, D. (2017). Balancing Type I error and power in linear mixed models. Journal of Memory and Language, 94, 305–315.


  64. McCauley, A., Esposito, M., & Cook, M. (2011). Language environment of preschoolers with autism: Validity and applications. Paper presented at the LENA Users Conference, Denver, CO.

  65. Montag, J. L., Jones, M. N., & Smith, L. B. (2018). Quantity and diversity: Simulating early word learning environments. Cognitive Science, 42, 375–412.


  66. Oetting, J. B., Hartfield, L. R., & Pruitt, J. S. (2009). Exploring LENA as a tool for researchers and clinicians. The ASHA Leader, 14(6), 20–22. doi:https://doi.org/10.1044/leader.ftr3.14062009.20


  67. Oller, D. K., Niyogi, P., Gray, S., Richards, J. A., Gilkerson, J., Xu, D., … Warren, S. F. (2010). Automated vocal analysis of naturalistic recordings from children with autism, language delay, and typical development. Proceedings of the National Academy of Sciences, 107(30), 13354–13359. doi:https://doi.org/10.1073/pnas.1003882107


  68. Ota, C. L., & Austin, A. M. B. (2013). Training and mentoring: Family child care providers’ use of linguistic inputs in conversations with children. Early Childhood Research Quarterly, 28(4), 972–983.


  69. Pae, S., Yoon, H., Seol, A., Gilkerson, J., Richards, J. A., Ma, L., & Topping, K. J. (2016). Effects of feedback on parent–child language with infants and toddlers in Korea. First Language, 36(6), 549–569. doi:https://doi.org/10.1177/0142723716649273


  70. Pisanski, K., Fraccaro, P. J., Tigue, C. C., O'Connor, J. J., Röder, S., Andrews, P. W., … Feinberg, D. R. (2014). Vocal indicators of body size in men and women: A meta-analysis. Animal Behaviour, 95, 89–99.


  71. Pisanski, K., & Rendall, D. (2011). The prioritization of voice fundamental frequency or formants in listeners’ assessments of speaker size, masculinity, and attractiveness. Journal of the Acoustical Society of America, 129(4), 2201–2212. doi:https://doi.org/10.1121/1.3552866


  72. Podesva, R. (2007). Phonation type as a stylistic variable: The use of falsetto in constructing a persona. Journal of Sociolinguistics, 11(4), 478–504. doi:https://doi.org/10.1111/j.1467-9841.2007.00334.x


  73. Porritt, L., Zinser, M., Bachorowski, J.-A., & Kaplan, P. (2014). Depression diagnoses and fundamental frequency-based acoustic cues in maternal infant-directed speech. (2014), 51–67.

  74. Proakis, J., Deller, J., & Hansen, J. (1993). Discrete-time processing of speech signals. New York: Macmillan Pub. Co.

  75. Quené, H., & Van den Bergh, H. (2008). Examples of mixed-effects modeling with crossed random effects and with binomial data. Journal of Memory and Language, 59(4), 413–425.


  76. R Development Core Team. (2015). R: A language and environment for statistical computing.

  77. Rabiner, L. R., & Juang, B.-H. (1993). Fundamentals of speech recognition (Vol. 14): PTR Prentice Hall Englewood Cliffs.

  78. Ramírez-Esparza, N., García-Sierra, A., & Kuhl, P. K. (2017). The Impact of Early Social Interactions on Later Language Development in Spanish-English Bilingual Infants. Child Development, 88(4), 1216–1234. doi:https://doi.org/10.1111/cdev.12648


  79. Ramírez-Esparza, N., García-Sierra, A., & Kuhl, P. K. (2014). Look who's talking: Speech style and social context in language input to infants are linked to concurrent and future speech development. Developmental Science, 17(6), 880–891.


  80. Richards, J. A., Gilkerson, J., Xu, D., & Topping, K. (2017a). How much do parents think they talk to their child? Journal of Early Intervention, 39(3), 163–179.


  81. Richards, J. A., Xu, D., Gilkerson, J., Yapanel, U., Gray, S., & Paul, T. (2017b). Automated assessment of child vocalization development using LENA. Journal of Speech, Language, and Hearing Research, 60(7), 2047–2063.


  82. Rietveld, T., & van Hout, R. (1993). Statistical techniques for the study of language and language behavior. Mouton de Gruyter.

  83. Roberts, M. Y., & Kaiser, A. P. (2011). The effectiveness of parent-implemented language interventions: A meta-analysis. American Journal of Speech-Language Pathology.

  84. Romeo, R. R., Leonard, J. A., Robinson, S. T., West, M. R., Mackey, A. P., Rowe, M. L., & Gabrieli, J. D. E. (2018). Beyond the 30-million-word gap: Children’s conversational exposure is associated with language-related brain function. Psychological Science, 29(5), 700–710. doi:https://doi.org/10.1177/0956797617742725


  85. Rowe, M. L. (2012). Recording, transcribing, and coding interaction. Research methods in child language: A practical guide, 191–207. doi:https://doi.org/10.1002/9781444344035.ch13

  86. Sacks, C., Shay, S., Repplinger, L., Leffel, K. R., Sapolich, S. G., Suskind, E., … Suskind, D. L. (2014). Pilot testing of a parent-directed intervention (Project ASPIRE) for underserved children who are deaf or hard of hearing. Child Language Teaching and Therapy, 30(1), 91–102. doi:https://doi.org/10.1177/0265659013494873


  87. Sangwan, A., Hansen, J. H. L., Irvin, D. W., Crutchfield, S., & Greenwood, C. R. (2015). Studying the relationship between physical and language environments of children: Who's speaking to whom and where? Paper presented at the Signal Processing and Signal Processing Education Workshop (SP/SPE), 2015 IEEE.

  88. Schwarz, I.-C., Botros, N., Lord, A., Marcusson, A., Tidelius, H., & Marklund, E. (2017). The LENA™ system applied to Swedish: Reliability of the Adult Word Count estimate. Paper presented at the Interspeech 2017.

  89. Seidl, A., Cristia, A., Soderstrom, M., Ko, E.-S., Abel, E. A., Kellerman, A., & Schwichtenberg, A. (2018). Infant–mother acoustic–prosodic alignment and developmental risk. Journal of Speech, Language, and Hearing Research, 61(6), 1369–1380.


  90. Sharma, B., Das, R. K., & Li, H. (2019). Multi-level adaptive speech activity detector for speech in naturalistic environments. In Proceedings of the 20th Annual Conference of the International Speech Communication Association (Interspeech 2019), 2015–2019.

  91. Shneidman, L. A., Arroyo, M. E., Levine, S. C., & Goldin-Meadow, S. (2013). What counts as effective input for word learning? Journal of Child Language, 40(3), 672–686.


  92. Sholokhov, A., Sahidullah, M., & Kinnunen, T. (2018). Semi-supervised speech activity detection with an application to automatic speaker verification. Computer Speech and Language, 47, 132–156.


  93. Soderstrom, M., & Wittebolle, K. (2013). When do caregivers talk? The influences of activity and time of day on caregiver speech and child vocalizations in two childcare environments. PLoS One, 8(11), e80646. doi:https://doi.org/10.1371/journal.pone.0080646


  94. Suskind, D. L., Graf, E., Leffel, K. R., Hernandez, M. W., Suskind, E., Webber, R., … Nevins, M. E. (2016a). Project ASPIRE: Spoken language intervention curriculum for parents of low-socioeconomic status and their Deaf and Hard-of-Hearing Children. Otology & Neurotology, 37(2), e110–e117.


  95. Suskind, D. L., Leffel, K. R., Graf, E., Hernandez, M. W., Gunderson, E. A., Sapolich, S. G., … Levine, S. C. (2016b). A parent-directed language intervention for children of low socioeconomic status: A randomized controlled pilot study. Journal of Child Language, 43(2), 366–406. doi:https://doi.org/10.1017/S0305000915000033


  96. Syrdal, A. K., & McGory, J. (2000). Inter-transcriber reliability of ToBI prosodic labeling. Paper presented at the International Conference on Spoken Language Processing, Beijing, China.

  97. Talbot, M. (2015). The talking cure. The New Yorker, 90, 43.


  98. Gries, S. Th. (2015). The most under-used statistical method in corpus linguistics: Multi-level (and mixed-effects) models. Corpora, 10(1), 95–125.


  99. Thiemann-Bourque, K., Warren, S. F., Brady, N., Gilkerson, J., & Richards, J. A. (2014). Vocal interaction between children with Down syndrome and their parents. American Journal of Speech-Language Pathology, 23(3), 474–485. doi:https://doi.org/10.1044/2014_AJSLP-12-0010


  100. VanDam, M., Ambrose, S., & Moeller, M. P. (2012). Quantity of parental language in the home environments of hard-of-hearing 2-year-olds. Journal of Deaf Studies and Deaf Education, 17(4), 402–420. doi:https://doi.org/10.1093/deafed/ens025

  101. VanDam, M., & Silbert, N. H. (2013). Precision and error of automatic speech recognition. In Proceedings of Meetings on Acoustics (ICA 2013).

  102. VanDam, M., & Silbert, N. H. (2016). Fidelity of automatic speech processing for adult and child talker classifications. PLoS One, 11(8), e0160588. doi:https://doi.org/10.1371/journal.pone.0160588

  103. Vigil, D. C., Hodges, J., & Klee, T. (2005). Quantity and quality of parental language input to late-talking toddlers during play. Child Language Teaching and Therapy, 21(2), 107–122.

  104. Wang, Y., Hartman, M., Aziz, N. A. A., Arora, S., Shi, L., & Tunison, E. (2017). A systematic review of the use of LENA technology. American Annals of the Deaf, 162(3), 295–311.

  105. Warlaumont, A. S., Oller, D. K., Dale, R., Richards, J. A., Gilkerson, J., & Xu, D. (2010). Vocal interaction dynamics of children with and without autism. In Proceedings of the Annual Meeting of the Cognitive Science Society.

  106. Warlaumont, A. S., Richards, J. A., Gilkerson, J., & Oller, D. K. (2014). A social feedback loop for speech development and its reduction in autism. Psychological Science, 25(7), 1314–1324. doi:https://doi.org/10.1177/0956797614531023

  107. Warren, S. F., Gilkerson, J., Richards, J. A., Oller, D. K., Xu, D., Yapanel, U., & Gray, S. (2010). What automated vocal analysis reveals about the vocal production and language learning environment of young children with autism. Journal of Autism and Developmental Disorders, 40(5), 555–569. doi:https://doi.org/10.1007/s10803-009-0902-5

  108. Weisleder, A., & Fernald, A. (2013). Talking to children matters: Early language experience strengthens processing and builds vocabulary. Psychological Science, 24(11), 2143–2152. doi:https://doi.org/10.1177/0956797613488145

  109. Weizman, Z. O., & Snow, C. E. (2001). Lexical output as related to children's vocabulary acquisition: Effects of sophisticated exposure and support for meaning. Developmental Psychology, 37(2), 265–279. doi:https://doi.org/10.1037/0012-1649.37.2.265

  110. Wieland, E., Burnham, E., Kondaurova, M. V., Bergeson, T. R., & Dilley, L. C. (2015). Vowel space characteristics of speech directed to children with and without hearing loss. Journal of Speech, Language, and Hearing Research, 58(2), 254–267. doi:https://doi.org/10.1044/2015_JSLHR-S-13-0250

  111. Wong, K., Boben, M., & Thomas, C. (2018). Disrupting the early learning status quo: Providence Talks as an innovative policy in diverse urban communities.

  112. Xu, D., Gilkerson, J., Richards, J., Yapanel, U., & Gray, S. (2009a). Child vocalization composition as discriminant information for automatic autism detection. Paper presented at the 2009 Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC 2009).

  113. Xu, D., Richards, J. A., Gilkerson, J., Yapanel, U., Gray, S., & Hansen, J. (2009b). Automatic childhood autism detection by vocalization decomposition with phone-like units. In Proceedings of the 2nd Workshop on Child, Computer and Interaction.

  114. Xu, D., Yapanel, U., & Gray, S. (2009c). Reliability of the LENA™ Language Environment Analysis System in young children's natural home environment (LENA Technical Report LTR-05-2). Boulder, CO: LENA Foundation. Retrieved from http://lena.org/wp-content/uploads/2016/07/LTR-05-2_Reliability.pdf

  115. Xu, D., Yapanel, U., Gray, S., & Baer, C. T. (2008a). The LENA Language Environment Analysis System: The interpretive time segments (ITS) file (LENA Technical Report LTR-04-2).

  116. Xu, D., Yapanel, U., Gray, S., Gilkerson, J., Richards, J. A., & Hansen, J. H. L. (2008b). Signal processing for young child speech language development. In Proceedings of the First Workshop on Child, Computer and Interaction.

  117. Zhang, Y., Xu, X., Jiang, F., Gilkerson, J., Xu, D., Richards, J. A., … Topping, K. J. (2015). Effects of quantitative linguistic feedback to caregivers of young children: A pilot study in China. Communication Disorders Quarterly, 37(1), 16–24. doi:https://doi.org/10.1177/1525740115575771

  118. Zimmerman, F. J., Gilkerson, J., Richards, J. A., Christakis, D. A., Xu, D., Gray, S., & Yapanel, U. (2009). Teaching by listening: The importance of adult–child conversations to language development. Pediatrics, 124(1), 342–349. doi:https://doi.org/10.1542/peds.2008-2267

Acknowledgements

We gratefully acknowledge the support of NIH grant R01 DC008581 to D. Houston and L. Dilley. We thank Jessica Reed and Yuanyuan Wang for their help with data collection and Somnath Roy for assistance with analyses. We also thank James Chen, Elizabeth Remy, Josh Zhao, Chitra Lakshumanan, Courtney Cameron, Sophia Stevens, Nikaela Losievski, Riley Reed, Kayli Silverstein, and Kelsey Dods for their diligent work coding audio. Thanks to Melanie Soderstrom for sharing previous analyses of LENA reliability with us and for many useful discussions.

Open practices statement

This study was not formally preregistered. The data files and coding manual have been made available on a permanent third-party archive at https://osf.io/2dz4y/.

Author information

Corresponding author

Correspondence to Laura Dilley.

Appendix

Table 11 Demographic characteristics of children in participating families; CI = cochlear implant, HA = hearing aid, and NH = normal hearing. For Family 10, the recording was made prior to the child’s device fitting
Table 12 Conversation block code types and designations selected for the present study

Cite this article

Lehet, M., Arjmandi, M.K., Houston, D. et al. Circumspection in using automated measures: Talker gender and addressee affect error rates for adult speech detection in the Language ENvironment Analysis (LENA) system. Behav Res (2020). https://doi.org/10.3758/s13428-020-01419-y

Keywords

  • LENA
  • Speech
  • Language
  • Automatic processing
  • Validation
  • Error