Automatic Scoring of L2 English Speech Based on DNN Acoustic Models with Lattice-Free MMI

Luo, Dean; Guan, Mingxiang; Xia, Linzhong

doi:10.1007/978-3-030-66785-6_13

Part of the book series: Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering (LNICST, volume 342)

Included in the following conference series:

International Conference on Machine Learning and Intelligent Communications

Abstract

This paper proposes improved automatic scoring methods for L2 English speaking tests based on acoustic models trained with lattice-free Maximum Mutual Information (MMI). Deep Neural Network (DNN) acoustic modeling with lattice-free MMI is the state-of-the-art technology in speech recognition because of its effectiveness in sequence-discriminative training. Novel Goodness of Pronunciation (GOP) implementations based on lattice-free MMI are proposed to improve the performance of automatic scoring for L2 English speech tests. Sequential acoustic weights during forced alignment and posteriors computed with the forward-backward algorithm over lattice-free MMI acoustic models are used to improve GOP-based automatic scoring. Experimental results show that the proposed lattice-free MMI based methods outperform conventional DNN-based automatic scoring methods.
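
For readers unfamiliar with GOP, the following is a minimal sketch of one common simplified formulation: for each force-aligned phone segment, the average log posterior of the canonical phone minus the best average log posterior of any phone over the same frames. It is an illustration only and is not the lattice-free MMI variant proposed in the paper. The function name gop_scores, the input layout, and the toy posterior values are assumptions made for this example.

```python
import math

def gop_scores(posteriors, alignment, floor=1e-10):
    """Compute a simplified GOP score per aligned phone segment.

    posteriors: list of per-frame dicts mapping phone -> posterior probability
                (hypothetical layout; a real system would take these from
                the acoustic model).
    alignment:  list of (phone, start_frame, end_frame), end exclusive,
                as produced by forced alignment.
    Returns a list of (phone, gop) pairs; gop <= 0, and 0 means the
    canonical phone was also the best-scoring phone on that segment.
    """
    scores = []
    for phone, start, end in alignment:
        frames = posteriors[start:end]
        if not frames:
            raise ValueError(f"empty segment for phone {phone!r}")
        # Average log posterior of every candidate phone over the segment.
        avg_logp = {
            q: sum(math.log(max(f[q], floor)) for f in frames) / len(frames)
            for q in frames[0]
        }
        # Log ratio between the canonical phone and the best competitor.
        scores.append((phone, avg_logp[phone] - max(avg_logp.values())))
    return scores

# Toy example with two phones and four frames (made-up numbers).
toy_posteriors = [
    {"ae": 0.9, "eh": 0.1},
    {"ae": 0.8, "eh": 0.2},
    {"ae": 0.7, "eh": 0.3},
    {"ae": 0.7, "eh": 0.3},
]
toy_alignment = [("ae", 0, 2), ("eh", 2, 4)]
scores = gop_scores(toy_posteriors, toy_alignment)
```

In this toy case the first segment scores 0 because the canonical phone is also the most likely one, while the second segment receives a negative score because a competing phone has higher posteriors, which is the kind of evidence a GOP-based scorer treats as a likely mispronunciation.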

Author information

Authors and Affiliations

Shenzhen Institute of Information Technology, Shenzhen, China
Dean Luo, Mingxiang Guan & Linzhong Xia

Corresponding author

Correspondence to Dean Luo.

Editor information

Editors and Affiliations

Shenzhen Institute of Information Technology, Shenzhen, China
Mingxiang Guan
Sci & Tech, DianHang Bldg, Rm 321, Dalian Maritime Univ, Sch of Info, Dalian, Liaoning, China
Zhenyu Na

Cite this paper

Luo, D., Guan, M., Xia, L. (2021). Automatic Scoring of L2 English Speech Based on DNN Acoustic Models with Lattice-Free MMI. In: Guan, M., Na, Z. (eds) Machine Learning and Intelligent Communications. MLICOM 2020. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 342. Springer, Cham. https://doi.org/10.1007/978-3-030-66785-6_13

DOI: https://doi.org/10.1007/978-3-030-66785-6_13
Published: 24 January 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-66784-9
Online ISBN: 978-3-030-66785-6
eBook Packages: Computer Science, Computer Science (R0)
