An Improved Speech Synthesis Algorithm with Post filter Parameters Based on Deep Neural Network

  • Shunjie Dong
  • Chunyang Li
  • Hong Zhang
Conference paper
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 517)


Statistical parametric speech synthesis typically relies on context-dependent Hidden Markov Models (HMMs) clustered with decision trees. However, decision tree clustering rigidly partitions the model space by context feature, and this limitation leads to over-smoothed speech parameters generated from the HMMs. In this paper, a Deep Neural Network (DNN) is used to replace the clustering decision tree, and we propose an improved speech synthesis algorithm based on post filter parameters. The method enhances the formant regions of the synthesized speech spectrum by selecting the most suitable filter parameter according to the flatness of the spectrum. The experimental results show that the DNN effectively mitigates the over-smoothing of the generated parameters. Furthermore, the improved post filter algorithm increases the naturalness of the synthesized speech.
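As a rough illustration of choosing a filter parameter from spectral flatness, the sketch below computes the standard spectral flatness measure (geometric mean over arithmetic mean of the power spectrum) and picks an enhancement strength from a small candidate set. It is not the algorithm of this paper: the log-spectrum scaling in `enhance`, the candidate values, and the target-flatness selection rule in `select_beta` are all assumptions made only for the example.

```python
import math


def spectral_flatness(power):
    """Geometric mean divided by arithmetic mean of a positive power spectrum."""
    log_mean = sum(math.log(p) for p in power) / len(power)
    return math.exp(log_mean) / (sum(power) / len(power))


def enhance(power, beta):
    """Hypothetical enhancement: scale the log spectrum about its mean by (1 + beta).

    Peaks rise and valleys deepen, so flatness drops as beta grows.
    """
    logs = [math.log(p) for p in power]
    mean = sum(logs) / len(logs)
    return [math.exp(mean + (1.0 + beta) * (value - mean)) for value in logs]


def select_beta(power, target_flatness, candidates=(0.0, 0.1, 0.2, 0.3, 0.4, 0.5)):
    """Assumed rule: pick the candidate whose output flatness is closest to the target."""
    return min(
        candidates,
        key=lambda b: abs(spectral_flatness(enhance(power, b)) - target_flatness),
    )


if __name__ == "__main__":
    # Toy 8-bin power spectrum with two formant-like peaks (made-up values).
    peaky = [1.0, 1.0, 8.0, 1.0, 1.0, 4.0, 1.0, 1.0]
    print("flatness before:", spectral_flatness(peaky))
    beta = select_beta(peaky, target_flatness=0.4)
    print("selected beta:", beta)
    print("flatness after:", spectral_flatness(enhance(peaky, beta)))
```

A flatter (more over-smoothed) spectrum has a flatness value closer to 1, so under this assumed rule it receives a larger `beta` before it reaches the same target.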


HTS, DNN, Post filter, HMM, Naturalness



This paper is supported by the URTP project of School of SME, the demonstration course project of Xidian University, and the Ministry of Education cooperation collaborative education project.



Copyright information

© Springer Nature Singapore Pte Ltd. 2020

Authors and Affiliations

  1. School of Microelectronics, Xidian University, Xi’an, China
  2. School of Computer Science and Technology, Xidian University, Xi’an, China
