
Animated Pronunciation Generated from Speech for Pronunciation Training

  • Conference paper
Intelligent Interactive Multimedia: Systems and Services

Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 14)


Abstract

Computer-assisted pronunciation training (CAPT) has been introduced into language education in recent years. CAPT scores the quality of a learner's pronunciation and points out mispronounced phonemes using speech recognition technology. However, although the learner can thereby recognize that his or her speech differs from the teacher's, the learner still cannot control the articulatory organs to pronounce correctly, and cannot understand precisely how to correct the wrong articulatory gestures. We make these differences explicit by visualizing both the learner's incorrect pronunciation movements and the correct movements with CG animation. We propose a system that generates animated pronunciation by automatically estimating a learner's pronunciation movements from his or her speech. The proposed system maps speech to the coordinate values needed to generate the animations using multi-layer neural networks (MLN). We use MRI data to generate smooth animated pronunciations. Additionally, we verify through experimental evaluation whether the vocal tract area and articulatory features are suitable characteristics of pronunciation movement.
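
As a rough illustration of the pipeline the abstract describes, the sketch below shows a plain multi-layer network that maps per-frame acoustic features to articulator coordinate values and then smooths them over time before rendering an animation. This is not the authors' implementation: the feature dimension, layer sizes, number of tracked articulator points, and the moving-average smoothing step are all assumptions made for illustration.

```python
# Minimal sketch (not the authors' code): a multi-layer network mapping
# per-frame acoustic features to 2-D articulator coordinates for animation.
# Dimensions and the smoothing scheme are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def init_mln(dims):
    """dims e.g. [39, 64, 64, 16]: 39 acoustic features in,
    8 articulator points (x, y) out. Weights are random placeholders."""
    return [(rng.normal(0, 0.1, (d_in, d_out)), np.zeros(d_out))
            for d_in, d_out in zip(dims[:-1], dims[1:])]

def forward(mln, x):
    """x: (n_frames, n_features) -> (n_frames, n_coords)."""
    h = x
    for i, (W, b) in enumerate(mln):
        h = h @ W + b
        if i < len(mln) - 1:      # hidden layers use a nonlinearity
            h = np.tanh(h)
    return h                      # linear output = coordinate values

def smooth(coords, win=5):
    """Moving average over frames so the animated articulators move
    smoothly between frames (a stand-in for the paper's smoothing)."""
    kernel = np.ones(win) / win
    return np.apply_along_axis(
        lambda c: np.convolve(c, kernel, mode="same"), 0, coords)

# Usage with dummy data: 100 frames of 39-dim features (e.g. MFCCs + deltas).
mln = init_mln([39, 64, 64, 16])
features = rng.normal(size=(100, 39))
coords = smooth(forward(mln, features))   # (100, 16): 8 (x, y) points per frame
print(coords.shape)
```

In the actual system, the network would be trained on speech paired with articulatory coordinates (derived from MRI data) rather than initialized randomly; the sketch only shows the shape of the speech-to-coordinates mapping.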



Author information


Correspondence to Yurie Iribe.


Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Iribe, Y., Manosavan, S., Katsurada, K., Nitta, T. (2012). Animated Pronunciation Generated from Speech for Pronunciation Training. In: Watanabe, T., Watada, J., Takahashi, N., Howlett, R., Jain, L. (eds) Intelligent Interactive Multimedia: Systems and Services. Smart Innovation, Systems and Technologies, vol 14. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29934-6_8

  • DOI: https://doi.org/10.1007/978-3-642-29934-6_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-29933-9

  • Online ISBN: 978-3-642-29934-6

  • eBook Packages: Engineering, Engineering (R0)
