Abstract
Although automatic speech recognition (ASR) is now commercially available to the general public, it still does not perform sufficiently well for people with speech disorders (e.g., dysarthria). Multimodal ASR, which involves multiple sources of signals, has recently shown potential to improve the performance of dysarthric speech recognition. When multiple views (sources) of data (e.g., acoustic and articulatory) are available for training but only one view (e.g., acoustic) is available for testing, a better representation can be learned by simultaneously analyzing the multiple sources of data. Although multi-view representation learning has recently been used in normal speech recognition, it has rarely been studied in dysarthric speech recognition. In this paper, we investigate the effectiveness of multi-view representation learning via canonical correlation analysis (CCA) for dysarthric speech recognition. A representation of acoustic data is learned using CCA from the multi-view data (acoustic and articulatory). The articulatory data were recorded simultaneously with the acoustic data using an electromagnetic articulograph. Experimental evaluation on a database collected from nine patients with dysarthria due to Lou Gehrig's disease demonstrated the effectiveness of multi-view representation learning via CCA in deep neural network-based speech recognition systems.
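To make the training-time setup concrete, the following is a minimal sketch of linear CCA in NumPy, in the spirit of the approach described above: both views (acoustic X and articulatory Y) are used to estimate a projection for the acoustic view, and at test time only the acoustic view is projected. The function name `cca_projection`, the regularization constant, and the dimensionalities are illustrative assumptions, not the authors' implementation (which pairs CCA with DNN acoustic models).

```python
import numpy as np

def cca_projection(X, Y, k, reg=1e-4):
    """Linear CCA: learn a k-dim projection for view X using paired view Y.

    X: (n, dx) acoustic features; Y: (n, dy) articulatory features.
    Returns the acoustic projection matrix A (dx, k) and the top-k
    canonical correlations. `reg` is a small ridge term for stability
    (an illustrative choice, not from the paper).
    """
    # Center both views
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]

    # (Regularized) covariance and cross-covariance matrices
    Sxx = Xc.T @ Xc / n + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / n + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / n

    # Inverse matrix square root of a symmetric positive-definite matrix
    def inv_sqrt(S):
        w, V = np.linalg.eigh(S)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    # SVD of the whitened cross-covariance gives canonical directions
    T = inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy)
    U, s, _ = np.linalg.svd(T)

    # Projection for the acoustic view; singular values are the
    # canonical correlations between the two views
    A = inv_sqrt(Sxx) @ U[:, :k]
    return A, s[:k]
```

At test time, when only acoustic data is available, the learned representation is simply `(X_test - X_train_mean) @ A`; the articulatory view is needed only during training, which is what makes this setup practical for recognition from acoustics alone.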
References
Duffy, J.R.: Motor Speech Disorders-E-Book: Substrates, Differential Diagnosis, and Management. Elsevier, New York City (2013)
Kim, M., Yoo, J., Kim, H.: Dysarthric speech recognition using dysarthria-severity-dependent and speaker-adaptive models. In: Proceeding of Interspeech, pp. 3622–3626 (2013)
Rudzicz, F.: Articulatory knowledge in the recognition of dysarthric speech. IEEE Trans. Audio Speech Lang. Process. 19(4), 947–960 (2011)
Kim, M., Wang, J., Kim, H.: Dysarthric speech recognition using Kullback–Leibler divergence-based hidden Markov model. In: Proceeding of Interspeech, pp. 2671–2675 (2016)
Yilmaz, E., Ganzeboom, M.S., Cucchiarini, C., Strik, H.: Multi-stage DNN training for automatic recognition of dysarthric speech. In: Proceeding of Interspeech, pp. 2685–2689 (2017)
Nakashika, T., Yoshioka, T., Takiguchi, T., Ariki, Y., Duffner, S., Garcia, C.: Dysarthric speech recognition using a convolutive bottleneck network. In: Proceeding of 12th IEEE International Conference on Signal Processing (ICSP), pp. 505–509 (2014)
España-Bonet, C., Fonollosa, J.A.: Automatic speech recognition with deep neural networks for impaired speech. In: Proceeding of Third International Conference on Advances in Speech and Language Technologies for Iberian Languages, pp. 97–107. Springer, Heidelberg (2016)
Kirchhoff, K., Fink, G.A., Sagerer, G.: Combining acoustic and articulatory feature information for robust speech recognition. Speech Commun. 37(3), 303–319 (2002)
Cao, B., Kim, M., Mau, T., Wang, J.: Recognizing whispered speech produced by an individual with surgically reconstructed larynx using articulatory movement data. In: Proceeding of ACL/ISCA Workshop Speech Language Processing Assistive Technology, pp. 80–86 (2016)
Hahm, S., Heitzman, D., Wang, J.: Recognizing dysarthric speech due to amyotrophic lateral sclerosis with across-speaker articulatory normalization. In: Proceeding of the ACL/ISCA Workshop on Speech and Language Processing for Assistive Technologies, pp. 47–54 (2015)
Wang, J., Samal, A., Green, J.: Preliminary test of a real-time, interactive silent speech interface based on electromagnetic articulograph. In: Proceeding of ACL/ISCA Workshop on Speech and Language Processing for Assistive Technologies, Baltimore, USA, pp. 38–45 (2014)
Wang, W., Arora, R., Livescu, K., Bilmes, J.A.: On deep multiview representation learning. In: ICML, pp. 1083–1092 (2015)
Wang, W., Arora, R., Livescu, K., Bilmes, J.A.: Unsupervised learning of acoustic features via deep canonical correlation analysis. In: Proceeding of ICASSP, pp. 4590–4594 (2015)
Borga, M.: Canonical correlation: a tutorial. Online tutorial (2001)
Green, J.R., Yunusova, Y., Kuruvilla, M.S., Wang, J., Pattee, G.L., Synhorst, L., Zinman, L., Berry, J.D.: Bulbar and speech motor assessment in ALS: challenges and future directions. Amyotroph. Lateral Scler. Frontotemporal Degener. 14(7–8), 494–500 (2013)
Kim, M., Cao, B., Mau, T., Wang, J.: Speaker-independent silent speech recognition from flesh-point articulatory movements using an LSTM neural network. IEEE/ACM Trans. Audio Speech Lang. Process. 25(12), 2323–2336 (2017)
Wang, J., Samal, A., Rong, P., Green, J.R.: An optimal set of flesh points on tongue and lips for speech-movement classification. J. Speech Lang. Hear. Res. 59, 15–26 (2016)
Berry, J.J.: Accuracy of the NDI wave speech research system. J. Speech Lang. Hear. Res. 54(5), 295–301 (2011)
Wang, J., Green, J.R., Samal, A., Yunusova, Y.: Articulatory distinctiveness of vowels and consonants: a data-driven approach. J. Speech Lang. Hear. Res. 56(5), 1539–1551 (2013)
Kim, M., Cao, B., Mau, T., Wang, J.: Multiview representation learning via deep CCA for silent speech recognition. In: Proceeding of Interspeech, pp. 2769–2773 (2017)
Deng, L., Yu, D.: Deep learning: methods and applications. Found. Trends Signal Process. 3–4, 197–387 (2014)
Rabiner, L., Juang, B.H.: An introduction to hidden Markov models. IEEE ASSP Mag. 3(1), 4–16 (1986)
Hinton, G., Deng, L., Yu, D., Dahl, G.E., Mohamed, A.R., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T.N., et al.: Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Process. Mag. 29(6), 82–97 (2012)
Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
Sak, H., Senior, A., Beaufays, F.: Long short-term memory recurrent neural network architectures for large scale acoustic modeling. In: Fifteenth Annual Conference of the International Speech Communication Association, pp. 338–342 (2014)
Federico, M., Bertoldi, N., Cettolo, M.: IRSTLM: an open source toolkit for handling large scale language models. In: Proceeding of Interspeech, pp. 1618–1621 (2008)
Povey, D., Ghoshal, A., Boulianne, G., Burget, L., Glembek, O., Goel, N., Hannemann, M., Motlicek, P., Qian, Y., Schwarz, P., Silovsky, J.: The Kaldi speech recognition toolkit. In: Proceeding of IEEE Workshop on Automatic Speech Recognition and Understanding, Waikoloa, USA, pp. 1–4 (2011)
Christensen, H., Aniol, M.B., Bell, P., Green, P.D., Hain, T., King, S., Swietojanski, P.: Combining in-domain and out-of-domain speech data for automatic recognition of disordered speech. In: Proceeding of Interspeech, pp. 3642–3645 (2013)
Kim, M., Kim, Y., Yoo, J., Wang, J., Kim, H.: Regularized speaker adaptation of KL-HMM for dysarthric speech recognition. IEEE Trans. Neural Syst. Rehabil. Eng. 25(9), 1581–1591 (2017)
Acknowledgments
This work was supported by the National Institutes of Health of the United States through grants R03DC013990 and R01DC013547 and by the American Speech-Language-Hearing Foundation through a New Century Scholar Research Grant. We would like to thank Dr. Jordan R. Green, Dr. Thomas F. Campbell, Dr. Yana Yunusova, Jennifer McGlothlin, Kristin Teplansky, and the volunteering participants.
Copyright information
© 2019 Springer Nature Switzerland AG
Cite this paper
Kim, M., Cao, B., Wang, J. (2019). Multi-view Representation Learning via Canonical Correlation Analysis for Dysarthric Speech Recognition. In: Deng, K., Yu, Z., Patnaik, S., Wang, J. (eds) Recent Developments in Mechatronics and Intelligent Robotics. ICMIR 2018. Advances in Intelligent Systems and Computing, vol 856. Springer, Cham. https://doi.org/10.1007/978-3-030-00214-5_133
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00213-8
Online ISBN: 978-3-030-00214-5
eBook Packages: Intelligent Technologies and Robotics