High Quality Emotional HMM-Based Synthesis in Spanish

  • Conference paper
Advances in Nonlinear Speech Processing (NOLISP 2009)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5933)

Abstract

This paper describes high-quality Spanish HMM-based speech synthesis of emotional speaking styles. The quality of the HMM-based speech synthesis is enhanced by using the most recent features presented for the Blizzard system (i.e. STRAIGHT spectrum extraction and mixed excitation). Two techniques are evaluated: first, a method that simultaneously models all emotions within a single acoustic model; second, an adaptation technique that converts a neutral emotional style to a target emotion. We consider three kinds of emotional expression: neutral, happy and sad. A subjective evaluation shows the quality of the system and the intensity of the produced emotion, while an objective evaluation based on voice quality parameters assesses the effectiveness of the approaches.

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gonzalvo, X., Taylor, P., Monzo, C., Iriondo, I., Socoró, J.C. (2010). High Quality Emotional HMM-Based Synthesis in Spanish. In: Solé-Casals, J., Zaiats, V. (eds) Advances in Nonlinear Speech Processing. NOLISP 2009. Lecture Notes in Computer Science, vol 5933. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11509-7_4

  • DOI: https://doi.org/10.1007/978-3-642-11509-7_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-11508-0

  • Online ISBN: 978-3-642-11509-7

  • eBook Packages: Computer Science, Computer Science (R0)
