Abstract
Interest in employing automatic speech recognition (ASR) in applications for reading practice has been growing in recent years. In a previous study, we presented an ASR-based Dutch reading tutor application that was developed to provide instantaneous feedback to first-graders learning to read. That study indicated that ASR has potential at this stage of the reading process: the results suggested that pupils made progress in reading accuracy and fluency by using the software. In the current study, we used children’s speech from an existing corpus (JASMIN) to develop two new ASR systems, and compared their results to those of the previous study. We analyzed the word-level correct/incorrect classifications of the ASR systems against human transcripts, using evaluation measures such as Cohen’s Kappa, the Matthews Correlation Coefficient (MCC), precision, recall and F-measures. The newly developed ASR systems show improvements in agreement with human-based judgment and in correct rejection (CR). The accuracy of the ASR systems varies across reading tasks and word types. Our results suggest that, in the current configuration, it is difficult to classify isolated words. We discuss these results, possible ways to improve our systems and avenues for future research.
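The evaluation measures named in the abstract can all be derived from a 2x2 confusion matrix of ASR decisions against human judgments. The following is a minimal sketch, not the authors' evaluation code. It assumes that each word carries a binary label (1 = read correctly, 0 = read incorrectly) and that "read correctly" is the positive class, so a correct rejection corresponds to a true negative. The function names `confusion_counts` and `binary_metrics` are hypothetical.

```python
from math import sqrt


def confusion_counts(human, asr):
    """Count TP, FP, FN, TN for word-level labels (1 = correct, 0 = incorrect).

    Assumption: the human label is the reference and 1 is the positive class.
    """
    tp = sum(1 for h, a in zip(human, asr) if h == 1 and a == 1)
    fp = sum(1 for h, a in zip(human, asr) if h == 0 and a == 1)
    fn = sum(1 for h, a in zip(human, asr) if h == 1 and a == 0)
    tn = sum(1 for h, a in zip(human, asr) if h == 0 and a == 0)
    return tp, fp, fn, tn


def binary_metrics(tp, fp, fn, tn):
    """Return precision, recall, F1, Cohen's Kappa and MCC from raw counts."""
    n = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)

    # Cohen's Kappa: observed agreement corrected for chance agreement.
    p_observed = (tp + tn) / n
    p_chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (p_observed - p_chance) / (1 - p_chance) if p_chance != 1 else 0.0

    # Matthews Correlation Coefficient: uses all four cells of the matrix,
    # which makes it informative when the two classes are imbalanced.
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"precision": precision, "recall": recall, "f1": f1,
            "kappa": kappa, "mcc": mcc}


if __name__ == "__main__":
    # Toy labels for illustration only; these are not data from the study.
    human = [1, 1, 1, 0, 0, 1, 0, 1]
    asr = [1, 1, 0, 0, 1, 1, 0, 1]
    print(binary_metrics(*confusion_counts(human, asr)))
```

Because beginning readers typically read most words correctly, the two classes tend to be imbalanced. Kappa and MCC are then more informative than raw accuracy, since both discount the agreement that would be expected by chance.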
Acknowledgements
The current research is carried out within the ‘Dutch ASR-based Reading Tutor’ (DART) project (http://hstrik.ruhosting.nl/DART). This work is part of the Netherlands Initiative for Education Research (NRO) with project number 40.5.18540.121, which is financed by the Dutch Research Council (NWO). We would like to thank the children who used the reading tutor at home during the pandemic, their parents and teachers, who gave us informative feedback and advice in questionnaires and interviews, and the schools that participated in the experiments.
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Bai, Y., Tejedor-García, C., Hubers, F., Cucchiarini, C., Strik, H. (2021). An ASR-Based Tutor for Learning to Read: How to Optimize Feedback to First Graders. In: Karpov, A., Potapova, R. (eds) Speech and Computer. SPECOM 2021. Lecture Notes in Computer Science, vol 12997. Springer, Cham. https://doi.org/10.1007/978-3-030-87802-3_6
DOI: https://doi.org/10.1007/978-3-030-87802-3_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-87801-6
Online ISBN: 978-3-030-87802-3
eBook Packages: Computer Science (R0)