A Voice Conversion Method Based on the Separation of Speaker-Specific Characteristics

Conference paper
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 236)


This paper studies the independent and complete characterization of speaker-specific voice characteristics. To this end, the authors propose a method that separates voice characteristics from linguistic content in speech, and carry out voice conversion from the viewpoint of information separation. The method builds on the K-means singular value decomposition (K-SVD) algorithm, which can train a dictionary that captures the personal characteristics and inter-frame correlation of the voice. Using this property, a dictionary containing the speaker's personal characteristics is learned from training data with K-SVD; the trained dictionary is then combined with the remaining content information to reconstruct the target speech. Compared with traditional methods, the proposed approach better preserves personal characteristics by exploiting the sparse nature of speech, avoids the problems encountered in feature-mapping methods, and promises improved conversion performance. Objective evaluations show that the proposed method outperforms Gaussian mixture model (GMM) and artificial neural network (ANN) based methods in both speech quality and conversion similarity to the target speaker.


Keywords: Voice conversion · Speaker-specific characteristics · Information separation · K-SVD
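The paper itself includes no code; as a rough illustration of the K-SVD dictionary-learning step the abstract relies on, the sketch below alternates sparse coding (via orthogonal matching pursuit) with rank-1 atom updates to learn an overcomplete dictionary from feature frames. All dimensions, function names, and the synthetic data are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def omp(D, y, k):
    """Orthogonal matching pursuit: code y with at most k atoms of D."""
    residual, idx = y.copy(), []
    for _ in range(k):
        # Pick the atom most correlated with the current residual.
        idx.append(int(np.argmax(np.abs(D.T @ residual))))
        # Re-fit coefficients over all selected atoms (least squares).
        coef, *_ = np.linalg.lstsq(D[:, idx], y, rcond=None)
        residual = y - D[:, idx] @ coef
    x = np.zeros(D.shape[1])
    x[idx] = coef
    return x

def ksvd(Y, n_atoms, sparsity, n_iter=10, seed=0):
    """Minimal K-SVD: learn D (and sparse codes X) so that Y ~ D @ X."""
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((Y.shape[0], n_atoms))
    D /= np.linalg.norm(D, axis=0)          # unit-norm atoms
    X = np.zeros((n_atoms, Y.shape[1]))
    for _ in range(n_iter):
        # Sparse-coding stage: code every frame over the current dictionary.
        X = np.column_stack([omp(D, y, sparsity) for y in Y.T])
        # Dictionary-update stage: refine one atom at a time via rank-1 SVD.
        for j in range(n_atoms):
            users = np.nonzero(X[j])[0]     # frames that use atom j
            if users.size == 0:
                continue
            X[j, users] = 0.0
            E = Y[:, users] - D @ X[:, users]   # error without atom j
            U, S, Vt = np.linalg.svd(E, full_matrices=False)
            D[:, j] = U[:, 0]
            X[j, users] = S[0] * Vt[0]
    return D, X

# Demo on synthetic 20-dim "spectral frames" (purely illustrative).
rng = np.random.default_rng(1)
frames = rng.standard_normal((20, 100))
D, X = ksvd(frames, n_atoms=30, sparsity=3)
print(D.shape)  # (20, 30)
```

In the spirit of the paper, the learned dictionary would play the role of the speaker-specific component, while the sparse codes carry the content; swapping in a target speaker's dictionary and reconstructing with the source codes would then perform the conversion. This sketch only shows the generic K-SVD machinery, not the authors' specific separation procedure.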


  1. Stylianou, Y.: Voice transformation: a survey. IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 3585–3588 (2009)
  2. Abe, M., Nakamura, S., Shikano, K., Kuwabara, H.: Voice conversion through vector quantization. IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 655–658 (1988)
  3. Stylianou, Y., Cappe, O., Moulines, E.: Continuous probabilistic transform for voice conversion. IEEE Trans. Speech Audio Process. 6(2), 131–142 (1998)
  4. Yamagishi, J., Kobayashi, T., Nakano, Y., et al.: Analysis of speaker adaptation algorithms for HMM-based speech synthesis and a constrained SMAPLR adaptation algorithm. IEEE Trans. Audio Speech Lang. Process. 17(1), 66–83 (2009)
  5. Erro, D., Moreno, A., Bonafonte, A.: Voice conversion based on weighted frequency warping. IEEE Trans. Audio Speech Lang. Process. 18(5), 922–931 (2010)
  6. Desai, S., Black, A.W., Yegnanarayana, B., et al.: Spectral mapping using artificial neural networks for voice conversion. IEEE Trans. Audio Speech Lang. Process. 18(5), 954–964 (2010)
  7. Popa, V., Nurminen, J., Gabbouj, M.: A novel technique for voice conversion based on style and content decomposition with bilinear models. Proceedings of the 10th Annual Conference of the International Speech Communication Association (Interspeech 2009), pp. 6–10. Brighton, U.K. (2009)
  8. Aharon, M., Elad, M., Bruckstein, A.: K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation. IEEE Trans. Signal Process. 54(11), 4311–4322 (2006)
  9. Xu, N., Yang, Z., Zhang, L.H., et al.: Voice conversion based on state-space model for modelling spectral trajectory. Electron. Lett. 45(14), 763–764 (2009)
  10. Jian, S., Xiongwei, Z., Tieyong, C., et al.: Voice conversion based on convolutive non-negative matrix factorization. Data Collect. Process. 28(3), 285–390 (2012)

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  1. Institute of Communication Engineering, PLA University of Science and Technology, Nanjing, China
