Voice Conversion from Arbitrary Speakers Based on Deep Neural Networks with Adversarial Learning

  • Sou Miyamoto
  • Takashi Nose
  • Suzunosuke Ito
  • Harunori Koike
  • Yuya Chiba
  • Akinori Ito
  • Takahiro Shinozaki
Conference paper
Part of the Smart Innovation, Systems and Technologies book series (SIST, volume 82)


In this study, we propose a voice conversion technique for arbitrary source speakers based on deep neural networks, realized by introducing adversarial learning into conventional voice conversion. In addition to the generative model, adversarial learning uses a discriminative model that classifies input speech as either natural or converted, which is expected to yield more natural converted speech. Experiments showed that the proposed method was effective in enhancing the global variance (GV) of the mel-cepstrum, but the naturalness of the converted speech was slightly lower than that of speech produced with the conventional variance compensation technique.
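The two quantities named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names, the 25-dimensional feature size, the synthetic data, and the specific loss (mean squared error plus a weighted negative log-probability term) are all assumptions made for illustration, since the abstract does not specify them. GV is computed here in the usual sense: the variance of each mel-cepstral coefficient over the frames of one utterance.

```python
import numpy as np

def global_variance(mcep):
    """Per-dimension variance over the frames of one utterance.

    mcep: array of shape (num_frames, order).
    """
    return np.var(mcep, axis=0)

def generator_loss(converted, target, d_prob_natural, adv_weight=1.0):
    """Hypothetical generator objective: MSE plus an adversarial term.

    d_prob_natural: the discriminator's per-frame probability that a
    converted frame is natural speech (values in (0, 1]).
    """
    mse = np.mean((converted - target) ** 2)
    # The term grows when the discriminator rejects the converted frames.
    adv = -np.mean(np.log(d_prob_natural))
    return mse + adv_weight * adv

# Synthetic stand-in for a natural mel-cepstrum sequence (assumed shape).
rng = np.random.default_rng(0)
natural = rng.normal(size=(200, 25))

# Over-smoothed trajectory: pull every frame halfway toward the mean.
mean = natural.mean(axis=0)
smoothed = mean + 0.5 * (natural - mean)

gv_natural = global_variance(natural)
gv_smoothed = global_variance(smoothed)  # a quarter of gv_natural

# The loss is higher when the discriminator is confident the frames are converted.
loss_rejected = generator_loss(smoothed, natural, np.full(200, 0.1))
loss_accepted = generator_loss(smoothed, natural, np.full(200, 0.9))
```

Halving the deviation from the mean reduces the variance by a factor of four, which is the kind of GV loss that over-smoothed conversion output shows and that variance compensation or an adversarial term aims to counteract.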


DNN-based voice conversion, Adversarial learning, Spectral differential filter, Model training



Part of this work was supported by JSPS KAKENHI Grant Number JP26280055 and JP15H02720.



Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  • Sou Miyamoto (1)
  • Takashi Nose (1)
  • Suzunosuke Ito (1, 2)
  • Harunori Koike (1, 2)
  • Yuya Chiba (1)
  • Akinori Ito (1)
  • Takahiro Shinozaki (2)
  1. Graduate School of Engineering, Tohoku University, Sendai-shi, Japan
  2. Department of Information and Communication Engineering, School of Engineering, Tokyo Institute of Technology, Yokohama-shi, Japan
