An Improved Speech Synthesis Algorithm with Post filter Parameters Based on Deep Neural Network

Dong, Shunjie; Li, Chunyang; Zhang, Hong

doi:10.1007/978-981-13-6508-9_30

Conference paper
First Online: 14 June 2019

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 517)

Abstract

Statistical parametric speech synthesis typically relies on context-dependent Hidden Markov Models (HMMs) with decision-tree clustering. However, decision-tree clustering partitions the model space rigidly by feature, which over-smooths the speech parameters generated from the HMM. In this paper, a Deep Neural Network (DNN) is used to replace the clustering decision tree, and we propose an improved speech synthesis algorithm based on post-filter parameters. The method enhances the formant regions of the synthesized speech spectrum by selecting the optimal filter parameter according to the flatness of the spectrum. The experimental results show that the DNN effectively mitigates the over-smoothing of the generated parameters. Furthermore, the improved post-filter algorithm increases the naturalness of the synthesized speech.
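The idea of choosing a post-filter strength from spectral flatness can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not give the selection rule or the filter form, so the linear mapping in `select_beta`, the range `beta_min`/`beta_max`, and the log-spectrum expansion in `enhance_log_spectrum` are all assumptions made for illustration. Spectral flatness itself is the standard ratio of the geometric mean to the arithmetic mean of the power spectrum.

```python
import numpy as np

EPS = 1e-12  # guards against log(0)


def spectral_flatness(power_spectrum):
    """Geometric mean / arithmetic mean of a power spectrum (1.0 = perfectly flat)."""
    p = np.asarray(power_spectrum, dtype=float) + EPS
    return float(np.exp(np.mean(np.log(p))) / np.mean(p))


def select_beta(flatness, beta_min=0.1, beta_max=0.5):
    """Hypothetical rule: the flatter (more over-smoothed) the frame,
    the stronger the enhancement. The paper's actual rule is not given here."""
    f = float(np.clip(flatness, 0.0, 1.0))
    return beta_min + (beta_max - beta_min) * f


def enhance_log_spectrum(power_spectrum, beta):
    """Assumed post-filter: expand the log spectrum around its mean by (1 + beta),
    which raises peaks (formants) relative to valleys."""
    log_p = np.log(np.asarray(power_spectrum, dtype=float) + EPS)
    mean_log = np.mean(log_p)
    return np.exp(mean_log + (1.0 + beta) * (log_p - mean_log))


# Toy over-smoothed frame: a flat floor with one weak formant-like bump.
freqs = np.linspace(0.0, 1.0, 257)
smooth = 1.0 + 0.5 * np.exp(-((freqs - 0.3) ** 2) / 0.002)

flat_before = spectral_flatness(smooth)
beta = select_beta(flat_before)
enhanced = enhance_log_spectrum(smooth, beta)
flat_after = spectral_flatness(enhanced)
```

In this toy frame the enhanced spectrum has a larger peak-to-floor ratio and a lower flatness than the input, which is the qualitative effect the abstract attributes to formant enhancement.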

Acknowledgements

This paper is supported by the URTP project of School of SME, the demonstration course project of Xidian University, and the Ministry of Education cooperation collaborative education project.

Author information

Authors and Affiliations

School of Microelectronics, Xidian University, Xi’an, China
Shunjie Dong & Hong Zhang
School of Computer Science and Technology, Xidian University, Xi’an, China
Chunyang Li

Corresponding author

Correspondence to Hong Zhang.

Editor information

Editors and Affiliations

Department of Electrical Engineering, University of Texas at Arlington, Arlington, TX, USA
Qilian Liang
School of Information and Communication Engineering, Dalian University of Technology, Dalian, China
Xin Liu
School of Information Science and Technology, Dalian Maritime University, Dalian, China
Zhenyu Na
College of Electronic and Communication Engineering, Tianjin Normal University, Tianjin, China
Wei Wang
College of Electronic and Communication Engineering, Tianjin Normal University, Tianjin, China
Jiasong Mu
College of Electronic and Communication Engineering, Tianjin Normal University, Tianjin, China
Baoju Zhang

About this paper

Cite this paper

Dong, S., Li, C., Zhang, H. (2020). An Improved Speech Synthesis Algorithm with Post filter Parameters Based on Deep Neural Network. In: Liang, Q., Liu, X., Na, Z., Wang, W., Mu, J., Zhang, B. (eds) Communications, Signal Processing, and Systems. CSPS 2018. Lecture Notes in Electrical Engineering, vol 517. Springer, Singapore. https://doi.org/10.1007/978-981-13-6508-9_30

DOI: https://doi.org/10.1007/978-981-13-6508-9_30
Published: 14 June 2019
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-6507-2
Online ISBN: 978-981-13-6508-9
eBook Packages: Engineering, Engineering (R0)
