Abstract
We propose a voice conversion method based on sparse convolutive non-negative matrix factorization. The method uses the Itakura–Saito distance as the objective cost function. Because this cost function is scale invariant, matrix elements with small values are reconstructed with correspondingly small errors. The time–frequency bases of the source and target speech are extracted during the training phase, and speech is converted through time–frequency basis substitution. A whisper-to-normal speech conversion experiment is also conducted. Experimental results show that the proposed method outperforms both the method based on conventional convolutive non-negative matrix factorization and the method based on the Kullback–Leibler (K-L) cost function in terms of speech intelligibility.
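The scale-invariance property mentioned in the abstract can be illustrated numerically. The snippet below is a minimal sketch, not the authors' implementation: it uses the standard textbook definitions of the Itakura–Saito divergence and the generalized Kullback–Leibler divergence, and the values in `v`, `v_hat`, and `scale` are made-up numbers chosen only for illustration. It shows that scaling both the data and its reconstruction by the same factor leaves the Itakura–Saito cost unchanged, whereas the K-L cost shrinks with the scale, so low-energy elements contribute less to a K-L objective.

```python
import math

def itakura_saito(v, v_hat):
    """Itakura-Saito divergence between two sequences of positive values."""
    return sum(x / y - math.log(x / y) - 1.0 for x, y in zip(v, v_hat))

def generalized_kl(v, v_hat):
    """Generalized Kullback-Leibler divergence between positive sequences."""
    return sum(x * math.log(x / y) - x + y for x, y in zip(v, v_hat))

# Hypothetical spectral values and an imperfect reconstruction (illustrative only).
v = [4.0, 1.0, 0.25]
v_hat = [3.0, 1.5, 0.2]

# Scale both by the same small factor to mimic low-energy elements.
scale = 0.01
v_small = [scale * x for x in v]
v_hat_small = [scale * y for y in v_hat]

is_full = itakura_saito(v, v_hat)
is_small = itakura_saito(v_small, v_hat_small)
kl_full = generalized_kl(v, v_hat)
kl_small = generalized_kl(v_small, v_hat_small)

print("IS divergence, original vs scaled:", is_full, is_small)
print("K-L divergence, original vs scaled:", kl_full, kl_small)
```

Under these assumptions the two Itakura–Saito values agree up to floating-point rounding, while the scaled K-L value equals `scale` times the original one.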
Acknowledgments
This work is supported by the Natural Science Foundation of China (No. 61372137, No. 61301295) and the Anhui Natural Science Foundation (No. 1308085QF100).
Copyright information
© 2016 Springer-Verlag Berlin Heidelberg
Cite this paper
Zhang, Q., Tao, L., Zhou, J., Wang, H. (2016). The Voice Conversion Method Based on Sparse Convolutive Non-negative Matrix Factorization. In: Qin, Y., Jia, L., Feng, J., An, M., Diao, L. (eds) Proceedings of the 2015 International Conference on Electrical and Information Technologies for Rail Transportation. Lecture Notes in Electrical Engineering, vol 378. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-49370-0_27
DOI: https://doi.org/10.1007/978-3-662-49370-0_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-49368-7
Online ISBN: 978-3-662-49370-0
eBook Packages: Energy, Energy (R0)