A Voice Conversion Method Based on the Separation of Speaker-Specific Characteristics
This paper studies how to characterize speaker-specific voice characteristics independently and completely. To that end, the authors propose a method that separates voice characteristics from linguistic content in speech, and they perform voice conversion from the perspective of information separation. The method relies on the K-means singular value decomposition (K-SVD) algorithm, which can train a dictionary that captures both the personal characteristics and the inter-frame correlation of a voice. A dictionary containing the personal characteristics is extracted from the training data with K-SVD, and the authors then use this trained dictionary together with the remaining content information to reconstruct the target speech. Compared with traditional methods, the proposed method exploits the sparse nature of speech to better preserve personal characteristics and readily addresses the problems encountered in feature mapping methods, so improvements in voice conversion are expected. Objective evaluations show that the proposed method outperforms Gaussian Mixture Model and Artificial Neural Network based methods in both speech quality and conversion similarity to the target.
Keywords: Voice conversion, Speaker-specific characteristics, Information separation, K-SVD
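The dictionary-training step mentioned in the abstract can be illustrated with a minimal K-SVD sketch. This is not the authors' implementation: the function names `omp` and `ksvd`, the use of orthogonal matching pursuit for the sparse-coding stage, the synthetic 16-dimensional "frames", and all parameter values are assumptions made for illustration. The abstract does not specify which spectral features are used or how the content information is paired with the trained dictionary, so the sketch stops at learning a dictionary and reconstructing the training frames from sparse codes.

```python
import numpy as np

def omp(D, y, k):
    """Orthogonal matching pursuit: approximate y with at most k atoms of D."""
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(k):
        # Pick the atom most correlated with the current residual.
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j in support:
            break
        support.append(j)
        # Re-fit all selected atoms jointly by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x

def ksvd(Y, n_atoms, k, n_iter=10, seed=0):
    """Learn D (dim x n_atoms) so that Y is close to D @ X with k-sparse columns of X."""
    rng = np.random.default_rng(seed)
    # Initialise the dictionary with randomly chosen, normalised training frames.
    D = Y[:, rng.choice(Y.shape[1], n_atoms, replace=False)].copy()
    D /= np.linalg.norm(D, axis=0, keepdims=True) + 1e-12
    for _ in range(n_iter):
        # Sparse-coding stage: code every frame against the current dictionary.
        X = np.column_stack([omp(D, y, k) for y in Y.T])
        # Dictionary-update stage: refine one atom at a time.
        for j in range(n_atoms):
            users = np.nonzero(X[j])[0]  # frames that use atom j
            if users.size == 0:
                continue
            # Error on those frames with atom j's contribution removed.
            E = Y[:, users] - D @ X[:, users] + np.outer(D[:, j], X[j, users])
            # Best rank-1 fit gives the new atom and its coefficients.
            U, s, Vt = np.linalg.svd(E, full_matrices=False)
            D[:, j] = U[:, 0]
            X[j, users] = s[0] * Vt[0]
    return D, X

# Synthetic stand-in for spectral frames (assumption: 16-dim, 3-sparse).
rng = np.random.default_rng(0)
true_D = rng.standard_normal((16, 24))
true_D /= np.linalg.norm(true_D, axis=0)
codes = np.zeros((24, 200))
for col in codes.T:
    col[rng.choice(24, 3, replace=False)] = rng.standard_normal(3)
Y = true_D @ codes

D, X = ksvd(Y, n_atoms=24, k=3, n_iter=5)
rel_err = np.linalg.norm(Y - D @ X) / np.linalg.norm(Y)
print(f"relative reconstruction error: {rel_err:.3f}")
```

In this reading, the columns of `D` play the role of the speaker-specific dictionary and the sparse matrix `X` plays the role of the remaining information. Reconstruction is simply `D @ X`, which is the operation the abstract refers to when it combines the trained dictionary with content information.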