Mixing HMM-Based Spanish Speech Synthesis with a CBR for Prosody Estimation

  • Conference paper
Advances in Nonlinear Speech Processing (NOLISP 2007)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4885)

Abstract

Hidden Markov Model based text-to-speech (HMM-TTS) synthesis is a technique for generating speech from trained statistical models in which the spectrum, pitch and durations of basic speech units are modelled together. This work describes a Spanish HMM-TTS system that uses an external machine learning technique, case-based reasoning (CBR), to help improve expressiveness. System performance is analysed objectively and subjectively. The experiments were conducted on a reliably labelled speech corpus whose units were clustered using contextual factors based on the Spanish language. The results show that CBR-based F0 estimation can improve on the HMM-based baseline when synthesizing short non-declarative sentences, while duration accuracy is similar for the CBR and HMM systems.

Thanks to Prof. Dr. Eric Keller, University of Lausanne, for kindly taking the time to verify this paper. This work has been partially supported by the European Commission, project SALERO FP6 IST-4-027122-IP.

Editor information

Mohamed Chetouani, Amir Hussain, Bruno Gas, Maurice Milgram, Jean-Luc Zarader

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gonzalvo, X., Iriondo, I., Socoró, J.C., Alías, F., Monzo, C. (2007). Mixing HMM-Based Spanish Speech Synthesis with a CBR for Prosody Estimation. In: Chetouani, M., Hussain, A., Gas, B., Milgram, M., Zarader, J.-L. (eds) Advances in Nonlinear Speech Processing. NOLISP 2007. Lecture Notes in Computer Science, vol 4885. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77347-4_4

  • DOI: https://doi.org/10.1007/978-3-540-77347-4_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-77346-7

  • Online ISBN: 978-3-540-77347-4

  • eBook Packages: Computer Science, Computer Science (R0)
