Voice Conversion from Arbitrary Speakers Based on Deep Neural Networks with Adversarial Learning
In this study, we propose a deep neural network (DNN)-based technique for voice conversion from arbitrary speakers that introduces adversarial learning into a conventional voice conversion framework. In addition to a generative model, adversarial learning uses a discriminative model that classifies input speech as either natural or converted, which is expected to yield more natural converted speech. Experiments showed that the proposed method effectively enhanced the global variance (GV) of the mel-cepstrum, but the naturalness of the converted speech was slightly lower than that of speech converted with the conventional variance compensation technique.
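As an illustration of the GV measure mentioned above, the following minimal sketch computes a per-dimension variance over all frames of one utterance. It assumes the commonly used definition of GV (variance over time, normalized by the number of frames); the exact definition used in the experiments is not stated here. The function name `global_variance` and the toy feature values are hypothetical and are not taken from the study.

```python
from typing import List, Sequence


def global_variance(frames: Sequence[Sequence[float]]) -> List[float]:
    """Per-dimension variance of a feature sequence over all frames of one utterance.

    Assumption: GV is the population variance (divided by the number of frames).
    """
    if not frames:
        raise ValueError("need at least one frame")
    n_frames = len(frames)
    n_dims = len(frames[0])
    # Mean of each feature dimension over time.
    means = [sum(f[d] for f in frames) / n_frames for d in range(n_dims)]
    # Mean squared deviation from that mean, per dimension.
    return [
        sum((f[d] - means[d]) ** 2 for f in frames) / n_frames
        for d in range(n_dims)
    ]


# Toy 2-dimensional "mel-cepstrum" sequences (made-up numbers, not real data).
natural = [[0.0, 1.0], [2.0, -1.0], [4.0, 1.0], [6.0, -1.0]]
# An over-smoothed version: same mean, half the excursion around it.
smoothed = [[1.5, 0.5], [2.5, -0.5], [3.5, 0.5], [4.5, -0.5]]

gv_natural = global_variance(natural)
gv_smoothed = global_variance(smoothed)
print(gv_natural, gv_smoothed)
```

In this toy case the over-smoothed sequence has a smaller GV in every dimension than the natural one, which is the kind of shrinkage that GV enhancement is meant to counteract.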
Keywords: DNN-based voice conversion, Adversarial learning, Spectral differential filter, Model training
Part of this work was supported by JSPS KAKENHI Grant Numbers JP26280055 and JP15H02720.