Multi-view Representation Learning via Canonical Correlation Analysis for Dysarthric Speech Recognition

  • Conference paper
In: Recent Developments in Mechatronics and Intelligent Robotics (ICMIR 2018)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 856)

Abstract

Although automatic speech recognition (ASR) is now in widespread commercial use by the general public, it still does not perform sufficiently well for people with speech disorders (e.g., dysarthria). Multimodal ASR, which involves multiple sources of signals, has recently shown potential to improve the performance of dysarthric speech recognition. When multiple views (sources) of data (e.g., acoustic and articulatory) are available for training but only one view (e.g., acoustic) is available for testing, a better representation can be learned by analyzing the multiple sources of data jointly. Although multi-view representation learning has recently been used in normal speech recognition, it has rarely been studied in dysarthric speech recognition. In this paper, we investigate the effectiveness of multi-view representation learning via canonical correlation analysis (CCA) for dysarthric speech recognition. A representation of the acoustic data is learned using CCA from the multi-view data (acoustic and articulatory); the articulatory data were recorded simultaneously with the acoustic data using an electromagnetic articulograph. Experimental evaluation on a database collected from nine patients with dysarthria due to Lou Gehrig's disease (amyotrophic lateral sclerosis) demonstrated the effectiveness of multi-view representation learning via CCA for deep neural network-based speech recognition systems.
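
The core recipe in the abstract — learn per-view projections that maximize the correlation between projected acoustic and articulatory training frames, then apply only the acoustic-side projection at test time — can be sketched in a few lines. The following is a minimal illustration using scikit-learn's CCA on random placeholder data; the feature dimensionalities, number of CCA components, variable names, and the concatenation of projected and raw features are assumptions for illustration, not the paper's exact configuration, and the DNN recognizer itself is omitted.

```python
# A minimal sketch of multi-view training with single-view testing via CCA.
# All data here are random placeholders; acoustic_train, articulatory_train,
# and acoustic_test are hypothetical names, not the paper's variables.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)

# Paired training views, recorded simultaneously (e.g., acoustic frames
# and electromagnetic articulograph trajectories).
n_frames = 1000
acoustic_train = rng.standard_normal((n_frames, 39))      # acoustic view
articulatory_train = rng.standard_normal((n_frames, 18))  # articulatory view

# Fit CCA: find per-view projections whose outputs are maximally correlated
# across the two views on the training data.
cca = CCA(n_components=10)
cca.fit(acoustic_train, articulatory_train)

# At test time only the acoustic view exists, so apply just the learned
# acoustic-side projection; the projected features (often concatenated
# with the raw acoustic features) then feed the DNN-based recognizer.
acoustic_test = rng.standard_normal((200, 39))
acoustic_projected = cca.transform(acoustic_test)   # shape: (200, 10)
features_for_asr = np.hstack([acoustic_test, acoustic_projected])
```

The key design point this setup exploits is that the articulatory view is needed only to fit the CCA transform: once trained, recognition requires nothing beyond the acoustic signal, which is why only the single-view `cca.transform(acoustic_test)` call appears at test time.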



Acknowledgments

This work was supported by the National Institutes of Health of the United States through grants R03DC013990 and R01DC013547 and by the American Speech-Language-Hearing Foundation through a New Century Scholar Research Grant. We would like to thank Dr. Jordan R. Green, Dr. Thomas F. Campbell, Dr. Yana Yunusova, Jennifer McGlothlin, Kristin Teplansky, and the volunteering participants.

Author information

Correspondence to Jun Wang.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Kim, M., Cao, B., Wang, J. (2019). Multi-view Representation Learning via Canonical Correlation Analysis for Dysarthric Speech Recognition. In: Deng, K., Yu, Z., Patnaik, S., Wang, J. (eds) Recent Developments in Mechatronics and Intelligent Robotics. ICMIR 2018. Advances in Intelligent Systems and Computing, vol 856. Springer, Cham. https://doi.org/10.1007/978-3-030-00214-5_133
