High Quality Emotional HMM-Based Synthesis in Spanish

Gonzalvo, Xavi; Taylor, Paul; Monzo, Carlos; Iriondo, Ignasi; Socoró, Joan Claudi

doi:10.1007/978-3-642-11509-7_4

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5933)

Included in the following conference series:

International Conference on Nonlinear Speech Processing

Abstract

This paper describes high-quality Spanish HMM-based speech synthesis of emotional speaking styles. The quality of the HMM-based synthesis is enhanced by using the most recent features presented for the Blizzard system (i.e. STRAIGHT spectrum extraction and mixed excitation). Two techniques are evaluated: first, a method that simultaneously models all emotions within a single acoustic model; second, an adaptation technique that converts a neutral emotional style to a target emotion. We consider three kinds of emotional expression: neutral, happy and sad. A subjective evaluation shows the quality of the system and the intensity of the produced emotion, while an objective evaluation based on voice quality parameters assesses the effectiveness of the approaches.

Author information

Authors and Affiliations

Phonetic-Arts Ltd. St. John’s Innovation Center, Cambridge, UK
Xavi Gonzalvo & Paul Taylor
Enginyeria i Arquitectura La Salle, Universitat Ramon Llull Grup de Recerca en Processament,
Xavi Gonzalvo, Carlos Monzo, Ignasi Iriondo & Joan Claudi Socoró

Editor information

Editors and Affiliations

Department of Computer Science, Escola Politecnica Superior, Universitat de Vic, c/. Sagrada Familia, 7, 08500, Vic (Barcelona), Spain
Jordi Solé-Casals
Department of Computer Science, Escola Politecnica Superior, Universitat de Vic, c/. Sagrada Familia, 7, 08500, Vic (Barcelona), Spain
Vladimir Zaiats

Cite this paper

Gonzalvo, X., Taylor, P., Monzo, C., Iriondo, I., Socoró, J.C. (2010). High Quality Emotional HMM-Based Synthesis in Spanish. In: Solé-Casals, J., Zaiats, V. (eds) Advances in Nonlinear Speech Processing. NOLISP 2009. Lecture Notes in Computer Science, vol 5933. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11509-7_4

DOI: https://doi.org/10.1007/978-3-642-11509-7_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-11508-0
Online ISBN: 978-3-642-11509-7
eBook Packages: Computer Science, Computer Science (R0)
