A Study of Speech Phase in Dysarthria Voice Conversion System

Chen, Ko-Chiang; Han, Ji-Yan; Jhang, Sin-Hua; Lai, Ying-Hui

doi:10.1007/978-3-030-30636-6_31

Part of the book series: IFMBE Proceedings (IFMBE, volume 74)

Included in the following conference series:

International Conference on Biomedical and Health Informatics

Abstract

Dysarthria is a communication disorder common in people whose neuromuscular speech apparatus has been damaged by events such as stroke. Voice conversion (VC) is a well-known approach to improving the speech intelligibility of dysarthric speakers. Most well-known VC methods convert only amplitude features and ignore phase information, although previous studies have indicated that phase is an important factor in the speech signal. We therefore investigate adding the correct phase information to VC for dysarthric speech. Automatic speech recognition results and spectral analysis show that intelligibility improves when the dysarthric phase is replaced with the normal phase during the synthesis step. This implies that correct phase information must be considered in a dysarthria VC system.
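
The phase replacement described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes two time-domain frames that are already time-aligned and of equal length, and it works on a single frame with a naive DFT from the Python standard library. The function names `dft`, `idft`, and `swap_phase` are hypothetical and chosen only for this example. A practical system would more likely use a windowed short-time Fourier transform with overlap-add.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse DFT; keeps only the real part of each output sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
             for k in range(n)) / n).real
        for t in range(n)
    ]


def swap_phase(magnitude_frame, phase_frame):
    """Resynthesize one frame from the spectral magnitude of
    `magnitude_frame` and the spectral phase of `phase_frame`.

    Both arguments are time-domain frames of equal length
    (an assumption of this sketch, not a statement about the paper).
    """
    if len(magnitude_frame) != len(phase_frame):
        raise ValueError("frames must have the same length")
    magnitudes = [abs(c) for c in dft(magnitude_frame)]
    phases = [cmath.phase(c) for c in dft(phase_frame)]
    # Polar form: magnitude from one signal, phase from the other.
    hybrid = [cmath.rect(m, p) for m, p in zip(magnitudes, phases)]
    return idft(hybrid)
```

In the setting of the abstract, `magnitude_frame` would stand for a frame of converted speech and `phase_frame` for the corresponding frame of normal speech. Passing the same frame for both arguments should return that frame up to rounding error, which is a quick sanity check on the sketch.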

Acknowledgment

This work was supported by the Ministry of Science and Technology, Taiwan, under Grant MOST 107-2218-E-010-006.

Author information

Authors and Affiliations

Department of Biomedical Engineering, National Yang-Ming University, Taipei, Taiwan
Ko-Chiang Chen, Ji-Yan Han, Sin-Hua Jhang & Ying-Hui Lai

Corresponding author

Correspondence to Ying-Hui Lai.

Editor information

Editors and Affiliations

Department of Electrical Engineering, Chung Yuan Christian University, Taoyuan, Taiwan
Kang-Ping Lin
Faculty of Electrical Engineering and Computing, University of Zagreb, Zagreb, Croatia
Ratko Magjarevic
Department of Informatics Engineering, University of Coimbra, Coimbra, Portugal
Paulo de Carvalho

Ethics declarations

The authors declare that they have no conflict of interest.

About this paper

Cite this paper

Chen, KC., Han, JY., Jhang, SH., Lai, YH. (2020). A Study of Speech Phase in Dysarthria Voice Conversion System. In: Lin, KP., Magjarevic, R., de Carvalho, P. (eds) Future Trends in Biomedical and Health Informatics and Cybersecurity in Medical Devices. ICBHI 2019. IFMBE Proceedings, vol 74. Springer, Cham. https://doi.org/10.1007/978-3-030-30636-6_31

DOI: https://doi.org/10.1007/978-3-030-30636-6_31
Published: 28 September 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30635-9
Online ISBN: 978-3-030-30636-6
eBook Packages: Engineering, Engineering (R0)
