Abstract
This chapter describes the development of an HMM-based speech synthesis system capable of generating creaky voice in addition to modal voice. To identify creaky regions in a speech utterance, an automatic creaky voice detection method is proposed, based on analysing how epoch parameters vary across different voicing regions. A neural network classifier is trained on the variances of the epoch parameters to detect creaky regions. To model the creaky excitation signal, a hybrid source model is proposed that extends the time-domain deterministic plus noise source model discussed in the previous chapter.
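The feature-extraction idea in the abstract (variances of epoch parameters as classifier input) can be illustrated with a minimal sketch. The abstract does not list which epoch parameters are used, so the two chosen here (epoch interval and epoch strength), the sliding window length, and the function name `epoch_variance_features` are assumptions made for illustration only. Epoch extraction and the neural network classifier are left out.

```python
from statistics import pvariance


def epoch_variance_features(epoch_times, epoch_strengths, window=5):
    """Return one (interval variance, strength variance) pair per window.

    Hypothetical helper: the parameter choice and window length are
    assumptions, not the chapter's actual configuration.
    """
    if len(epoch_times) != len(epoch_strengths):
        raise ValueError("epoch_times and epoch_strengths must have equal length")

    # Epoch interval: time between successive epochs (glottal closure instants).
    intervals = [t1 - t0 for t0, t1 in zip(epoch_times, epoch_times[1:])]
    # Align each strength with the interval that ends at that epoch.
    strengths = list(epoch_strengths[1:])

    features = []
    for start in range(len(intervals) - window + 1):
        iv = intervals[start:start + window]
        st = strengths[start:start + window]
        features.append((pvariance(iv), pvariance(st)))
    return features


# Synthetic data: regular epochs (modal-like) versus irregular epochs (creak-like).
modal_times = [i * 0.008 for i in range(8)]
modal_strengths = [1.0] * 8
creaky_times = [0.0, 0.020, 0.032, 0.065, 0.080, 0.120, 0.131, 0.170]
creaky_strengths = [1.0, 0.3, 0.9, 0.2, 1.0, 0.4, 0.8, 0.1]

modal_feats = epoch_variance_features(modal_times, modal_strengths)
creaky_feats = epoch_variance_features(creaky_times, creaky_strengths)
```

On this synthetic input the irregular sequence yields much larger variances than the regular one, which is the kind of separation a classifier trained on such features would exploit.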
Copyright information
© 2019 The Author(s), under exclusive licence to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Rao, K.S., Narendra, N.P. (2019). Generation of Creaky Voice. In: Source Modeling Techniques for Quality Enhancement in Statistical Parametric Speech Synthesis. SpringerBriefs in Speech Technology. Springer, Cham. https://doi.org/10.1007/978-3-030-02759-9_6
DOI: https://doi.org/10.1007/978-3-030-02759-9_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-02758-2
Online ISBN: 978-3-030-02759-9
eBook Packages: Engineering, Engineering (R0)