Abstract
This chapter proposes two hybrid source modeling methods for improving the quality of HMM-based speech synthesis. In the first method, the optimal pitch-synchronous residual frames that represent the excitation signals of phones are used to model the source. The second method is a hybrid source model capable of generating an excitation signal specific to every phone. An analysis of the phone-dependent characteristics of the excitation signal is performed first. In the proposed source model, the pitch-synchronous residual frames of a phone are then modeled as the sum of a deterministic component and a noise component.
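The decomposition of residual frames into a deterministic part plus a noise part can be illustrated with a minimal sketch. The abstract does not say how either component is estimated, so everything below is an assumption made for illustration: the frames are taken to be already length-normalized, the deterministic component is taken to be the sample-wise mean of the frames, and the names `split_frames` and `reconstruct` are hypothetical.

```python
from statistics import fmean


def split_frames(frames):
    """Split equal-length residual frames into a shared deterministic
    component and one noise component per frame.

    Assumption (not from the chapter): the deterministic component is
    the sample-wise mean across frames, and the noise component is
    whatever remains after subtracting it.
    """
    if not frames:
        raise ValueError("need at least one frame")
    length = len(frames[0])
    if any(len(frame) != length for frame in frames):
        raise ValueError("frames must be length-normalized")

    # Sample-wise mean over all frames of the phone.
    deterministic = [fmean(frame[n] for frame in frames) for n in range(length)]
    # Per-frame remainder after removing the shared part.
    noise = [
        [frame[n] - deterministic[n] for n in range(length)]
        for frame in frames
    ]
    return deterministic, noise


def reconstruct(deterministic, noise_frame):
    """Rebuild one frame as deterministic + noise."""
    return [d + e for d, e in zip(deterministic, noise_frame)]


# Two toy three-sample "frames" standing in for real residual frames.
frames = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
deterministic, noise = split_frames(frames)
rebuilt = [reconstruct(deterministic, e) for e in noise]
```

By construction, adding each noise frame back to the deterministic component recovers the original frame, which is the "sum of deterministic and noise components" property the abstract describes. A real system would work on pitch-synchronous frames extracted from speech rather than toy lists.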
Copyright information
© 2019 The Author(s), under exclusive licence to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Rao, K.S., Narendra, N.P. (2019). Hybrid Approach of Modeling the Source Signal. In: Source Modeling Techniques for Quality Enhancement in Statistical Parametric Speech Synthesis. SpringerBriefs in Speech Technology. Springer, Cham. https://doi.org/10.1007/978-3-030-02759-9_5
DOI: https://doi.org/10.1007/978-3-030-02759-9_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-02758-2
Online ISBN: 978-3-030-02759-9
eBook Packages: Engineering, Engineering (R0)