The Voice Conversion Method Based on Sparse Convolutive Non-negative Matrix Factorization

Zhang, Qianmin; Tao, Liang; Zhou, Jian; Wang, Huabin

doi:10.1007/978-3-662-49370-0_27

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 378)

Abstract

We propose a voice conversion method based on sparse convolutive non-negative matrix factorization. The method uses the Itakura–Saito distance as the objective cost function; because this cost function is scale-invariant, smaller matrix elements are reconstructed with correspondingly smaller errors. The time–frequency bases of the source and target speech are extracted during the training phase, and speech is converted through time–frequency basis substitution. A whisper-to-normal speech conversion experiment is also conducted. Experimental results show that the proposed method outperforms both the method based on conventional convolutive non-negative matrix factorization and the method based on the Kullback–Leibler (K-L) cost function in terms of speech intelligibility.
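
The scale-invariance property mentioned in the abstract can be illustrated with a small numerical sketch. This is not the chapter's implementation: the function names `is_divergence` and `kl_divergence`, the toy magnitude values, and the scale factor are assumptions chosen for illustration, and the formulas are the standard element-wise Itakura–Saito and generalized Kullback–Leibler divergences. Scaling both the data and its reconstruction by the same factor should leave the Itakura–Saito cost unchanged, while the K-L cost grows with the scale, which is why low-energy elements contribute little to a K-L objective.

```python
import math

def is_divergence(v, v_hat):
    """Itakura-Saito divergence summed over two sequences of positive values."""
    return sum(x / y - math.log(x / y) - 1.0 for x, y in zip(v, v_hat))

def kl_divergence(v, v_hat):
    """Generalized Kullback-Leibler divergence summed over two positive sequences."""
    return sum(x * math.log(x / y) - x + y for x, y in zip(v, v_hat))

# Hypothetical spectrogram magnitudes and their reconstruction (toy values).
v = [4.0, 0.5, 0.02]
v_hat = [3.0, 0.6, 0.03]

# Scale both the data and the reconstruction by the same factor.
scale = 100.0
v_s = [scale * x for x in v]
v_hat_s = [scale * y for y in v_hat]

is_base = is_divergence(v, v_hat)
is_scaled = is_divergence(v_s, v_hat_s)
kl_base = kl_divergence(v, v_hat)
kl_scaled = kl_divergence(v_s, v_hat_s)

print("IS:", is_base, "->", is_scaled)   # equal up to rounding
print("KL:", kl_base, "->", kl_scaled)   # grows linearly with the scale
```

Under the Itakura–Saito cost each element is penalized by its relative error rather than its absolute error, so quiet and loud time–frequency bins are treated alike in this sketch.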

Acknowledgments

This work is supported by the Natural Science Foundation of China (No. 61372137, No. 61301295) and the Anhui Natural Science Foundation (No. 1308085QF100).

Author information

Authors and Affiliations

School of Computer Science and Technology, Anhui University, No. 111, Jiulong Road, Shushan District, Hefei City, Anhui Province, China
Qianmin Zhang, Liang Tao, Jian Zhou & Huabin Wang

Corresponding author

Correspondence to Qianmin Zhang.

Editor information

Editors and Affiliations

Beijing Jiaotong University, Beijing, China
Yong Qin
Beijing Jiaotong University, Beijing, China
Limin Jia
CSR Zhuzhou Institute CO., LTD., Zhuzhou, China
Jianghua Feng
University of Birmingham, Birmingham, United Kingdom
Min An
Beijing Jiaotong University, Beijing, China
Lijun Diao

About this paper

Cite this paper

Zhang, Q., Tao, L., Zhou, J., Wang, H. (2016). The Voice Conversion Method Based on Sparse Convolutive Non-negative Matrix Factorization. In: Qin, Y., Jia, L., Feng, J., An, M., Diao, L. (eds) Proceedings of the 2015 International Conference on Electrical and Information Technologies for Rail Transportation. Lecture Notes in Electrical Engineering, vol 378. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-49370-0_27

DOI: https://doi.org/10.1007/978-3-662-49370-0_27
Published: 11 March 2016
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-49368-7
Online ISBN: 978-3-662-49370-0
eBook Packages: Energy, Energy (R0)