Part of the book series: Lecture Notes in Electrical Engineering ((LNEE,volume 378))

Abstract

We propose a voice conversion method based on sparse convolutive non-negative matrix factorization. The method uses the Itakura–Saito distance as its objective cost function. Because this cost function is scale-invariant, small matrix elements are reconstructed with correspondingly small errors. The time–frequency bases of the source and target speech are extracted during the training phase, and speech is converted by time–frequency basis substitution. A whisper-to-normal speech conversion experiment is also conducted. Experimental results show that the proposed method outperforms both the method based on conventional convolutive non-negative matrix factorization and the method based on the Kullback–Leibler (K-L) cost function in terms of speech intelligibility.
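The scale-invariance property and the basis-substitution step mentioned in the abstract can be illustrated with a small numerical sketch. This is not the authors' implementation: the function names (`is_divergence`, `kl_divergence`, `shift_right`, `cnmf_reconstruct`), the array shapes, and the random data are all assumptions made for illustration, and the sparsity penalty and the factor update rules are omitted. The sketch assumes strictly positive inputs and the usual convolutive NMF model, in which a spectrogram is approximated by a sum over time lags of basis slices multiplied by shifted activations.

```python
import numpy as np


def is_divergence(v, v_hat):
    """Itakura-Saito divergence summed over all elements (inputs must be > 0)."""
    r = v / v_hat
    return float(np.sum(r - np.log(r) - 1.0))


def kl_divergence(v, v_hat):
    """Generalized Kullback-Leibler divergence summed over all elements."""
    return float(np.sum(v * np.log(v / v_hat) - v + v_hat))


def shift_right(h, t):
    """Shift the columns of h right by t frames, zero-filling on the left."""
    if t == 0:
        return h.copy()
    out = np.zeros_like(h)
    out[:, t:] = h[:, :-t]
    return out


def cnmf_reconstruct(w, h):
    """Convolutive NMF model: sum over lags t of w[t] @ shift_right(h, t).

    w has shape (T, F, R): T time lags, F frequency bins, R bases.
    h has shape (R, N): R activation rows over N frames.
    """
    return sum(w[t] @ shift_right(h, t) for t in range(w.shape[0]))


rng = np.random.default_rng(0)

# Hypothetical positive "spectrogram" and an imperfect approximation of it.
v = rng.random((4, 6)) + 0.1
v_hat = rng.random((4, 6)) + 0.1
scale = 1e-3

# Scaling both arguments leaves the IS divergence unchanged,
# while the K-L divergence shrinks in proportion to the scale.
is_full = is_divergence(v, v_hat)
is_small = is_divergence(scale * v, scale * v_hat)
kl_full = kl_divergence(v, v_hat)
kl_small = kl_divergence(scale * v, scale * v_hat)

# Basis substitution (illustrative): activations estimated with the source
# bases are recombined with the target bases to form the converted spectrogram.
w_source = rng.random((2, 4, 3))
w_target = rng.random((2, 4, 3))
h_source = rng.random((3, 6))
converted = cnmf_reconstruct(w_target, h_source)
```

Under the K-L cost, a mismatch in a low-energy time–frequency cell contributes little to the total, so such cells are fitted loosely. Under the Itakura–Saito cost, the same relative mismatch costs the same at any energy level, which is the behavior the abstract attributes to scale invariance.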


Acknowledgments

This work is supported by the Natural Science Foundation of China (No. 61372137, No. 61301295) and the Anhui Natural Science Foundation (No. 1308085QF100).

Author information

Correspondence to Qianmin Zhang.

Copyright information

© 2016 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhang, Q., Tao, L., Zhou, J., Wang, H. (2016). The Voice Conversion Method Based on Sparse Convolutive Non-negative Matrix Factorization. In: Qin, Y., Jia, L., Feng, J., An, M., Diao, L. (eds) Proceedings of the 2015 International Conference on Electrical and Information Technologies for Rail Transportation. Lecture Notes in Electrical Engineering, vol 378. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-49370-0_27

  • DOI: https://doi.org/10.1007/978-3-662-49370-0_27

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-49368-7

  • Online ISBN: 978-3-662-49370-0

  • eBook Packages: Energy, Energy (R0)
