Abstract
Dysarthria is a communication disorder common in people whose neuromuscular apparatus has been damaged by events such as stroke. Voice conversion (VC) is a well-known approach to improving speech intelligibility for dysarthric speakers. Most VC methods convert amplitude (magnitude) features only and discard phase information, although previous studies have indicated that phase is an important factor in the speech signal. We therefore investigate adding correct phase information to VC for dysarthric speech. Results from automatic speech recognition and spectrum analysis show that replacing the dysarthric phase with the normal phase during the synthesis step improves intelligibility. This implies that correct phase information must be considered in dysarthric VC systems.
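The key operation in the abstract, resynthesising speech from a (converted) magnitude spectrum combined with the phase of normal speech, can be sketched as below. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the STFT settings (512-point periodic Hann window, hop of 128 samples), the helper names `stft`, `istft` and `synthesize_with_phase`, and the random stand-in signals are all hypothetical. The sketch also assumes the magnitude and phase sources are already frame-aligned (for example by dynamic time warping); it does not perform that alignment.

```python
import numpy as np


def _hann(n_fft):
    # Periodic Hann window: drop the last point of a symmetric window of length n_fft + 1.
    return np.hanning(n_fft + 1)[:-1]


def stft(x, n_fft=512, hop=128):
    """Complex STFT of a 1-D signal, shape (frames, n_fft // 2 + 1)."""
    # Zero-pad both ends so the first and last samples fall inside full frames.
    x = np.pad(np.asarray(x, dtype=float), n_fft // 2)
    window = _hann(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(spec, length, n_fft=512, hop=128):
    """Weighted overlap-add inverse of `stft`, trimmed to `length` samples."""
    window = _hann(n_fft)
    frames = np.fft.irfft(spec, n=n_fft, axis=1) * window  # apply synthesis window
    total = n_fft + hop * (len(frames) - 1)
    y = np.zeros(total)
    norm = np.zeros(total)
    for i, frame in enumerate(frames):
        y[i * hop:i * hop + n_fft] += frame
        norm[i * hop:i * hop + n_fft] += window ** 2  # accumulated window energy
    nonzero = norm > 1e-10
    y[nonzero] /= norm[nonzero]
    start = n_fft // 2  # undo the padding added in `stft`
    return y[start:start + length]


def synthesize_with_phase(magnitude, phase_source, n_fft=512, hop=128):
    """Combine a magnitude spectrogram with the STFT phase of `phase_source`.

    Assumes `magnitude` is frame-aligned with `phase_source` (same STFT
    settings and frame count); no time alignment is performed here.
    """
    phase = np.angle(stft(phase_source, n_fft, hop))
    if magnitude.shape != phase.shape:
        raise ValueError("magnitude and phase spectrograms must be frame-aligned")
    return istft(magnitude * np.exp(1j * phase), len(phase_source), n_fft, hop)


# Usage with random stand-in signals (real use would load parallel utterances).
rng = np.random.default_rng(0)
dysarthric = rng.standard_normal(16000)   # stand-in for a dysarthric utterance
normal = rng.standard_normal(16000)       # stand-in for an aligned normal utterance
converted_mag = np.abs(stft(dysarthric))  # placeholder for a VC model's magnitude output
with_dysarthric_phase = synthesize_with_phase(converted_mag, dysarthric)
with_normal_phase = synthesize_with_phase(converted_mag, normal)
```

Passing the dysarthric waveform itself as the phase source reproduces it, which is a convenient sanity check; using the normal utterance as the phase source corresponds to the "normal phase" condition described above. Because a magnitude paired with a foreign phase is generally not a consistent STFT, overlap-add only approximates it; Griffin–Lim iteration is a common alternative when no reference phase is available.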
Acknowledgment
This work was supported by the Ministry of Science and Technology, Taiwan, under Grant MOST 107-2218-E-010-006.
Ethics declarations
The authors declare that they have no conflict of interest.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Chen, KC., Han, JY., Jhang, SH., Lai, YH. (2020). A Study of Speech Phase in Dysarthria Voice Conversion System. In: Lin, KP., Magjarevic, R., de Carvalho, P. (eds) Future Trends in Biomedical and Health Informatics and Cybersecurity in Medical Devices. ICBHI 2019. IFMBE Proceedings, vol 74. Springer, Cham. https://doi.org/10.1007/978-3-030-30636-6_31
DOI: https://doi.org/10.1007/978-3-030-30636-6_31
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30635-9
Online ISBN: 978-3-030-30636-6
eBook Packages: Engineering, Engineering (R0)