
An ASR-Based Tutor for Learning to Read: How to Optimize Feedback to First Graders

  • Conference paper
Speech and Computer (SPECOM 2021)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12997)

Abstract

The interest in employing automatic speech recognition (ASR) in applications for reading practice has been growing in recent years. In a previous study, we presented an ASR-based Dutch reading tutor application that was developed to provide instantaneous feedback to first-graders learning to read. We saw that ASR has potential at this stage of the reading process, as the results suggested that pupils made progress in reading accuracy and fluency by using the software. In the current study, we used children’s speech from an existing corpus (JASMIN) to develop two new ASR systems, and compared the results to those of the previous study. We analyze correct/incorrect classification of the ASR systems using human transcripts at word level, by means of evaluation measures such as Cohen’s Kappa, Matthews Correlation Coefficient (MCC), precision, recall and F-measures. We observe improvements for the newly developed ASR systems regarding the agreement with human-based judgment and correct rejection (CR). The accuracy of the ASR systems varies for different reading tasks and word types. Our results suggest that, in the current configuration, it is difficult to classify isolated words. We discuss these results, possible ways to improve our systems and avenues for future research.
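The evaluation measures named in the abstract can all be derived from a 2x2 confusion table of ASR judgments against human judgments at word level. The following is a minimal sketch, not the authors' implementation: the function names and the toy labels are hypothetical, and treating "incorrect" (a misread word) as the positive class is an assumption made here so that a true negative corresponds to an accepted, correctly read word.

```python
import math


def confusion_counts(human, asr, positive="incorrect"):
    """Count TP, FP, FN, TN for paired word-level labels.

    `positive` is the class being detected; choosing "incorrect"
    (a misread word) is an assumption of this sketch.
    """
    tp = fp = fn = tn = 0
    for h, a in zip(human, asr):
        if a == positive and h == positive:
            tp += 1  # ASR and human both flag the word
        elif a == positive:
            fp += 1  # ASR flags a word the human accepted
        elif h == positive:
            fn += 1  # ASR accepts a word the human flagged
        else:
            tn += 1  # both accept the word
    return tp, fp, fn, tn


def agreement_metrics(tp, fp, fn, tn):
    """Precision, recall, F1, Cohen's kappa and MCC from the four counts."""
    n = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_o = (tp + tn) / n
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (p_o - p_e) / (1 - p_e) if p_e != 1 else 0.0

    # Matthews correlation coefficient.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"precision": precision, "recall": recall, "f1": f1,
            "kappa": kappa, "mcc": mcc}


# Made-up labels for eight words, for illustration only.
human = ["correct", "correct", "incorrect", "incorrect",
         "correct", "incorrect", "correct", "correct"]
asr = ["correct", "incorrect", "incorrect", "correct",
       "correct", "incorrect", "correct", "correct"]

counts = confusion_counts(human, asr)
metrics = agreement_metrics(*counts)
print(counts)
print(metrics)
```

Kappa and MCC are useful alongside precision and recall because reading data tend to be imbalanced: most words are read correctly, so raw accuracy can look high even when few misread words are detected.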

Notes

  1. http://hstrik.ruhosting.nl/DART/.

  2. http://www.readingassistant.com/.

  3. https://www.ibm.com/ibm/responsibility/downloads/initiatives/ReadingCompanion.pdf.

  4. http://readingbuddysoftware.com/.

  5. http://kaldi-asr.org/doc/.

Acknowledgements

The current research is carried out within the ‘Dutch ASR-based Reading Tutor’ (DART) project (http://hstrik.ruhosting.nl/DART). This work is part of the Netherlands Initiative for Education Research (NRO) with project number 40.5.18540.121, which is financed by the Dutch Research Council (NWO). We would like to thank the children who used the reading tutor at home during the pandemic, their parents and teachers who gave us informative feedback and advice in questionnaires and interviews, and the schools that participated in the experiments.

Author information

Corresponding author

Correspondence to Yu Bai.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Bai, Y., Tejedor-García, C., Hubers, F., Cucchiarini, C., Strik, H. (2021). An ASR-Based Tutor for Learning to Read: How to Optimize Feedback to First Graders. In: Karpov, A., Potapova, R. (eds) Speech and Computer. SPECOM 2021. Lecture Notes in Computer Science, vol 12997. Springer, Cham. https://doi.org/10.1007/978-3-030-87802-3_6

  • DOI: https://doi.org/10.1007/978-3-030-87802-3_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-87801-6

  • Online ISBN: 978-3-030-87802-3

  • eBook Packages: Computer Science, Computer Science (R0)
