Abstract
Although automatic speech recognition (ASR) is now commercially available to the general public, it still does not perform sufficiently well for people with speech disorders (e.g., dysarthria). Multimodal ASR, which involves multiple sources of signals, has recently shown potential to improve the performance of dysarthric speech recognition. When multiple views (sources) of data (e.g., acoustic and articulatory) are available for training but only one view (e.g., acoustic) is available for testing, a better representation can be learned by simultaneously analyzing the multiple sources of data. Although multi-view representation learning has recently been used in normal speech recognition, it has rarely been studied in dysarthric speech recognition. In this paper, we investigate the effectiveness of multi-view representation learning via canonical correlation analysis (CCA) for dysarthric speech recognition. A representation of acoustic data is learned using CCA from the multi-view data (acoustic and articulatory). The articulatory data were recorded simultaneously with the acoustic data using an electromagnetic articulograph. Experimental evaluation on a database collected from nine patients with dysarthria due to Lou Gehrig's disease demonstrated the effectiveness of multi-view representation learning via CCA in deep neural network-based speech recognition systems.
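To make the training-time setup concrete, the following is a minimal sketch of linear CCA in NumPy, in the spirit of the approach described above: both views (acoustic X and articulatory Y) are used to estimate a projection for the acoustic view, and at test time only the acoustic view is projected. The function name `cca_projection`, the regularization constant, and the dimensionalities are illustrative assumptions, not the authors' implementation (which pairs CCA with DNN acoustic models).

```python
import numpy as np

def cca_projection(X, Y, k, reg=1e-4):
    """Linear CCA: learn a k-dim projection for view X using paired view Y.

    X: (n, dx) acoustic features; Y: (n, dy) articulatory features.
    Returns the acoustic projection matrix A (dx, k) and the top-k
    canonical correlations. `reg` is a small ridge term for stability
    (an illustrative choice, not from the paper).
    """
    # Center both views
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]

    # (Regularized) covariance and cross-covariance matrices
    Sxx = Xc.T @ Xc / n + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / n + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / n

    # Inverse matrix square root of a symmetric positive-definite matrix
    def inv_sqrt(S):
        w, V = np.linalg.eigh(S)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    # SVD of the whitened cross-covariance gives canonical directions
    T = inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy)
    U, s, _ = np.linalg.svd(T)

    # Projection for the acoustic view; singular values are the
    # canonical correlations between the two views
    A = inv_sqrt(Sxx) @ U[:, :k]
    return A, s[:k]
```

At test time, when only acoustic data is available, the learned representation is simply `(X_test - X_train_mean) @ A`; the articulatory view is needed only during training, which is what makes this setup practical for recognition from acoustics alone.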
References
Duffy, J.R.: Motor Speech Disorders-E-Book: Substrates, Differential Diagnosis, and Management. Elsevier, New York City (2013)
Kim, M., Yoo, J., Kim, H.: Dysarthric speech recognition using dysarthria-severity-dependent and speaker-adaptive models. In: Proceeding of Interspeech, pp. 3622–3626 (2013)
Rudzicz, F.: Articulatory knowledge in the recognition of dysarthric speech. IEEE Trans. Audio Speech Lang. Process. 19(4), 947–960 (2011)
Kim, M., Wang, J., Kim, H.: Dysarthric speech recognition using Kullback–Leibler divergence-based hidden Markov model. In: Proceeding of Interspeech, pp. 2671–2675 (2016)
Yilmaz, E., Ganzeboom, M.S., Cucchiarini, C., Strik, H.: Multi-stage DNN training for automatic recognition of dysarthric speech. In: Proceeding of Interspeech, pp. 2685–2689 (2017)
Nakashika, T., Yoshioka, T., Takiguchi, T., Ariki, Y., Duffner, S., Garcia, C.: Dysarthric speech recognition using a convolutive bottleneck network. In: Proceeding of 12th IEEE International Conference on Signal Processing (ICSP), pp. 505–509 (2014)
España-Bonet, C., Fonollosa, J.A.: Automatic speech recognition with deep neural networks for impaired speech. In: Proceeding of Third International Conference on Advances in Speech and Language Technologies for Iberian Languages, pp. 97–107. Springer, Heidelberg (2016)
Kirchhoff, K., Fink, G.A., Sagerer, G.: Combining acoustic and articulatory feature information for robust speech recognition. Speech Commun. 37(3), 303–319 (2002)
Cao, B., Kim, M., Mau, T., Wang, J.: Recognizing whispered speech produced by an individual with surgically reconstructed larynx using articulatory movement data. In: Proceeding of ACL/ISCA Workshop Speech Language Processing Assistive Technology, pp. 80–86 (2016)
Hahm, S., Heitzman, D., Wang, J.: Recognizing dysarthric speech due to amyotrophic lateral sclerosis with across-speaker articulatory normalization. In: Proceeding of the ACL/ISCA Workshop on Speech and Language Processing for Assistive Technologies, pp. 47–54 (2015)
Wang, J., Samal, A., Green, J.: Preliminary test of a real-time, interactive silent speech interface based on electromagnetic articulograph. In: Proceeding of ACL/ISCA Workshop on Speech and Language Processing for Assistive Technologies, Baltimore, USA, pp. 38–45 (2014)
Wang, W., Arora, R., Livescu, K., Bilmes, J.A.: On deep multiview representation learning. In: ICML, pp. 1083–1092 (2015)
Wang, W., Arora, R., Livescu, K., Bilmes, J.A.: Unsupervised learning of acoustic features via deep canonical correlation analysis. In: Proceeding of ICASSP, pp. 4590–4594 (2015)
Borga, M.: Canonical correlation: a tutorial. Online tutorial (2001)
Green, J.R., Yunusova, Y., Kuruvilla, M.S., Wang, J., Pattee, G.L., Synhorst, L., Zinman, L., Berry, J.D.: Bulbar and speech motor assessment in ALS: challenges and future directions. Amyotroph. Lateral Scler. Frontotemporal Degener. 14(7–8), 494–500 (2013)
Kim, M., Cao, B., Mau, T., Wang, J.: Speaker-independent silent speech recognition from flesh-point articulatory movements using an LSTM neural network. IEEE/ACM Trans. Audio Speech Lang. Process. 25(12), 2323–2336 (2017)
Wang, J., Samal, A., Rong, P., Green, J.R.: An optimal set of flesh points on tongue and lips for speech-movement classification. J. Speech Lang. Hear. Res. 59, 15–26 (2016)
Berry, J.J.: Accuracy of the NDI wave speech research system. J. Speech Lang. Hear. Res. 54(5), 295–301 (2011)
Wang, J., Green, J.R., Samal, A., Yunusova, Y.: Articulatory distinctiveness of vowels and consonants: a data-driven approach. J. Speech Lang. Hear. Res. 56(5), 1539–1551 (2013)
Kim, M., Cao, B., Mau, T., Wang, J.: Multiview representation learning via deep CCA for silent speech recognition. In: Proceeding of Interspeech, pp. 2769–2773 (2017)
Deng, L., Yu, D.: Deep learning: methods and applications. Found. Trends Signal Process. 3–4, 197–387 (2014)
Rabiner, L., Juang, B.H.: An introduction to hidden Markov models. IEEE ASSP Mag. 3(1), 4–16 (1986)
Hinton, G., Deng, L., Yu, D., Dahl, G.E., Mohamed, A.R., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T.N., et al.: Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Process. Mag. 29(6), 82–97 (2012)
Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
Sak, H., Senior, A., Beaufays, F.: Long short-term memory recurrent neural network architectures for large scale acoustic modeling. In: Fifteenth Annual Conference of the International Speech Communication Association, pp. 338–342 (2014)
Federico, M., Bertoldi, N., Cettolo, M.: IRSTLM: an open source toolkit for handling large scale language models. In: Proceeding of Interspeech, pp. 1618–1621 (2008)
Povey, D., Ghoshal, A., Boulianne, G., Burget, L., Glembek, O., Goel, N., Hannemann, M., Motlicek, P., Qian, Y., Schwarz, P., Silovsky, J.: The Kaldi speech recognition toolkit. In: Proceeding of IEEE Workshop on Automatic Speech Recognition and Understanding, Waikoloa, USA, pp. 1–4 (2011)
Christensen, H., Aniol, M.B., Bell, P., Green, P.D., Hain, T., King, S., Swietojanski, P.: Combining in-domain and out-of-domain speech data for automatic recognition of disordered speech. In: Proceeding of Interspeech, pp. 3642–3645 (2013)
Kim, M., Kim, Y., Yoo, J., Wang, J., Kim, H.: Regularized speaker adaptation of KL-HMM for dysarthric speech recognition. IEEE Trans. Neural Syst. Rehabil. Eng. 25(9), 1581–1591 (2017)
Acknowledgments
This work was supported by the National Institutes of Health of the United States through grants R03DC013990 and R01DC013547 and by the American Speech-Language-Hearing Foundation through a New Century Scholar Research Grant. We would like to thank Dr. Jordan R. Green, Dr. Thomas F. Campbell, Dr. Yana Yunusova, Jennifer McGlothlin, Kristin Teplansky, and the volunteering participants.
Copyright information
© 2019 Springer Nature Switzerland AG
Cite this paper
Kim, M., Cao, B., Wang, J. (2019). Multi-view Representation Learning via Canonical Correlation Analysis for Dysarthric Speech Recognition. In: Deng, K., Yu, Z., Patnaik, S., Wang, J. (eds) Recent Developments in Mechatronics and Intelligent Robotics. ICMIR 2018. Advances in Intelligent Systems and Computing, vol 856. Springer, Cham. https://doi.org/10.1007/978-3-030-00214-5_133
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00213-8
Online ISBN: 978-3-030-00214-5
eBook Packages: Intelligent Technologies and Robotics