An ASR-Based Tutor for Learning to Read: How to Optimize Feedback to First Graders

Bai, Yu; Tejedor-García, Cristian; Hubers, Ferdy; Cucchiarini, Catia; Strik, Helmer

doi:10.1007/978-3-030-87802-3_6

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12997)

Included in the following conference series:

International Conference on Speech and Computer

Abstract

The interest in employing automatic speech recognition (ASR) in applications for reading practice has been growing in recent years. In a previous study, we presented an ASR-based Dutch reading tutor application that was developed to provide instantaneous feedback to first-graders learning to read. We saw that ASR has potential at this stage of the reading process, as the results suggested that pupils made progress in reading accuracy and fluency by using the software. In the current study, we used children’s speech from an existing corpus (JASMIN) to develop two new ASR systems, and compared the results to those of the previous study. We analyze correct/incorrect classification of the ASR systems using human transcripts at word level, by means of evaluation measures such as Cohen’s Kappa, Matthews Correlation Coefficient (MCC), precision, recall and F-measures. We observe improvements for the newly developed ASR systems regarding the agreement with human-based judgment and correct rejection (CR). The accuracy of the ASR systems varies for different reading tasks and word types. Our results suggest that, in the current configuration, it is difficult to classify isolated words. We discuss these results, possible ways to improve our systems and avenues for future research.
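The evaluation measures named in the abstract can be illustrated with a minimal sketch. It assumes binary word-level labels (1 = word judged as read correctly, 0 = judged as read incorrectly), treats the human judgment as the reference, and takes "read correctly" as the positive class. The function names and the toy data are hypothetical and are not taken from the paper.

```python
from math import sqrt


def confusion_counts(human, asr):
    """Count the four agreement cells between human and ASR judgments.

    Labels: 1 = word judged correct, 0 = word judged incorrect.
    Treating "correct" as the positive class is an assumption of this sketch.
    """
    pairs = list(zip(human, asr))
    ca = sum(1 for h, a in pairs if h == 1 and a == 1)  # correct accept
    fr = sum(1 for h, a in pairs if h == 1 and a == 0)  # false reject
    fa = sum(1 for h, a in pairs if h == 0 and a == 1)  # false accept
    cr = sum(1 for h, a in pairs if h == 0 and a == 0)  # correct reject (CR)
    return ca, fr, fa, cr


def word_level_metrics(human, asr):
    """Compute precision, recall, F1, Cohen's kappa and MCC for binary labels."""
    ca, fr, fa, cr = confusion_counts(human, asr)
    n = ca + fr + fa + cr
    if n == 0:
        raise ValueError("no labelled words to evaluate")

    precision = ca / (ca + fa) if ca + fa else 0.0
    recall = ca / (ca + fr) if ca + fr else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_o = (ca + cr) / n
    p_e = ((ca + fr) * (ca + fa) + (fa + cr) * (fr + cr)) / n ** 2
    kappa = (p_o - p_e) / (1 - p_e) if p_e != 1 else 0.0

    # Matthews Correlation Coefficient, computed from all four cells.
    denom = sqrt((ca + fa) * (ca + fr) * (cr + fa) * (cr + fr))
    mcc = (ca * cr - fa * fr) / denom if denom else 0.0

    return {"precision": precision, "recall": recall, "f1": f1,
            "kappa": kappa, "mcc": mcc}


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    human = [1, 1, 1, 1, 0, 0, 1, 0, 1, 1]
    asr = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
    for name, value in word_level_metrics(human, asr).items():
        print(f"{name}: {value:.3f}")
```

Kappa and MCC both account for class imbalance, which matters when most words are read correctly; precision and recall depend on which class is chosen as positive, so that choice should be stated alongside any reported figures.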

Acknowledgements

The current research is carried out within the ‘Dutch ASR-based Reading Tutor’ (DART) project (http://hstrik.ruhosting.nl/DART). This work is part of the Netherlands Initiative for Education Research (NRO) with project number 40.5.18540.121, which is financed by the Dutch Research Council (NWO). We would like to thank the children who used the reading tutor at home during the pandemic, their parents and teachers, who gave us informative feedback and advice in questionnaires and interviews, and the schools that participated in the experiments.

Author information

Authors and Affiliations

Centre for Language and Speech Technology (CLST), Radboud University Nijmegen, Nijmegen, The Netherlands
Yu Bai, Cristian Tejedor-García, Ferdy Hubers, Catia Cucchiarini & Helmer Strik
Centre for Language Studies (CLS), Radboud University Nijmegen, Nijmegen, The Netherlands
Ferdy Hubers & Helmer Strik
Donders Institute for Brain, Cognition and Behaviour, Radboud University Nijmegen, Nijmegen, The Netherlands
Helmer Strik
NovoLearning B.V., Nijmegen, The Netherlands
Helmer Strik

Corresponding author

Correspondence to Yu Bai.

Editor information

Editors and Affiliations

St. Petersburg Federal Research Center of the Russian Academy of Sciences, St. Petersburg, Russia
Alexey Karpov
Moscow State Linguistic University, Moscow, Russia
Rodmonga Potapova

About this paper

Cite this paper

Bai, Y., Tejedor-García, C., Hubers, F., Cucchiarini, C., Strik, H. (2021). An ASR-Based Tutor for Learning to Read: How to Optimize Feedback to First Graders. In: Karpov, A., Potapova, R. (eds) Speech and Computer. SPECOM 2021. Lecture Notes in Computer Science, vol 12997. Springer, Cham. https://doi.org/10.1007/978-3-030-87802-3_6

DOI: https://doi.org/10.1007/978-3-030-87802-3_6
Published: 22 September 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-87801-6
Online ISBN: 978-3-030-87802-3
eBook Packages: Computer Science, Computer Science (R0)
