Abstract
This paper describes a high-quality Spanish HMM-based speech synthesis system for emotional speaking styles. The quality of the HMM-based synthesis is enhanced with the most recent features introduced for the Blizzard system (i.e., STRAIGHT spectrum extraction and mixed excitation). Two techniques are evaluated: first, a method that simultaneously models all emotions within a single acoustic model; second, an adaptation technique that converts a neutral emotional style to a target emotion. We consider three kinds of emotional expression: neutral, happy and sad. A subjective evaluation shows the quality of the system and the intensity of the produced emotion, while an objective evaluation based on voice quality parameters assesses the effectiveness of the approaches.
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gonzalvo, X., Taylor, P., Monzo, C., Iriondo, I., Socoró, J.C. (2010). High Quality Emotional HMM-Based Synthesis in Spanish. In: Solé-Casals, J., Zaiats, V. (eds) Advances in Nonlinear Speech Processing. NOLISP 2009. Lecture Notes in Computer Science, vol 5933. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11509-7_4
DOI: https://doi.org/10.1007/978-3-642-11509-7_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-11508-0
Online ISBN: 978-3-642-11509-7
eBook Packages: Computer Science, Computer Science (R0)