A Study of Speech Phase in Dysarthria Voice Conversion System

  • Conference paper
Future Trends in Biomedical and Health Informatics and Cybersecurity in Medical Devices (ICBHI 2019)

Part of the book series: IFMBE Proceedings (IFMBE, volume 74)

Abstract

Dysarthria is a communication disorder common in people whose neuromuscular apparatus has been damaged by events such as stroke. Voice conversion (VC) is a well-known approach to improving the speech intelligibility of dysarthric speakers. Most well-known VC methods convert amplitude features only and ignore phase information, although previous studies indicate that phase is an important factor in the speech signal. We therefore investigate adding correct phase information to VC for dysarthric speech. Automatic speech recognition results and spectral analysis show that intelligibility improves when the dysarthric phase is replaced with the normal phase during the synthesis step. This implies that correct phase information must be considered in a dysarthria VC system.
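The phase replacement described in the abstract can be illustrated with a minimal sketch for a single analysis frame. It is not the authors' implementation: the preview does not show their pipeline, so the sketch assumes that the converted frame and the normal-speech frame are already time-aligned and of equal length, and it uses a naive DFT from the standard library instead of a windowed short-time Fourier transform. The names `dft`, `idft`, and `synthesize_with_phase` are hypothetical.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform of one real-valued frame."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse DFT; returns the real part of each reconstructed sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def synthesize_with_phase(magnitude_frame, phase_frame):
    """Keep the magnitude spectrum of one frame and take the phase from another.

    Assumption: both frames have the same length and are time-aligned.
    """
    magnitudes = [abs(c) for c in dft(magnitude_frame)]
    phases = [cmath.phase(c) for c in dft(phase_frame)]
    hybrid_spectrum = [cmath.rect(m, p) for m, p in zip(magnitudes, phases)]
    return idft(hybrid_spectrum)


# Toy frames (illustrative only): same magnitude spectrum, different phase.
n = 64
converted = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]  # stands in for a VC output frame
reference = [math.cos(2 * math.pi * 4 * t / n) for t in range(n)]  # stands in for a normal-speech frame

hybrid = synthesize_with_phase(converted, reference)
```

Because the two toy frames differ only in phase, the hybrid frame follows the reference waveform. In a full system the same combination would be applied frame by frame before overlap-add synthesis.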

Acknowledgment

This work was supported by the Ministry of Science and Technology, Taiwan, under Grant MOST 107-2218-E-010-006.

Author information

Corresponding author

Correspondence to Ying-Hui Lai.

Ethics declarations

The authors declare that they have no conflict of interest.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Chen, KC., Han, JY., Jhang, SH., Lai, YH. (2020). A Study of Speech Phase in Dysarthria Voice Conversion System. In: Lin, KP., Magjarevic, R., de Carvalho, P. (eds) Future Trends in Biomedical and Health Informatics and Cybersecurity in Medical Devices. ICBHI 2019. IFMBE Proceedings, vol 74. Springer, Cham. https://doi.org/10.1007/978-3-030-30636-6_31

  • DOI: https://doi.org/10.1007/978-3-030-30636-6_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-30635-9

  • Online ISBN: 978-3-030-30636-6

  • eBook Packages: Engineering, Engineering (R0)
