Mixing HMM-Based Spanish Speech Synthesis with a CBR for Prosody Estimation

Gonzalvo, Xavi; Iriondo, Ignasi; Socoró, Joan Claudi; Alías, Francesc; Monzo, Carlos

doi:10.1007/978-3-540-77347-4_4

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4885)

Included in the following conference series:

International Conference on Nonlinear Speech Processing

Abstract

Hidden Markov model-based text-to-speech (HMM-TTS) synthesis is a technique for generating speech from trained statistical models in which the spectrum, pitch and durations of basic speech units are modelled simultaneously. This work describes a Spanish HMM-TTS system that uses an external machine learning technique, case-based reasoning (CBR), to help improve expressiveness. System performance is analysed both objectively and subjectively. The experiments were conducted on a reliably labelled speech corpus whose units were clustered using contextual factors specific to the Spanish language. The results show that CBR-based F0 estimation can improve on the HMM-based baseline when synthesizing short non-declarative sentences, while duration accuracy is similar for the CBR and HMM systems.

Thanks to Prof. Dr. Eric Keller, University of Lausanne, for kindly taking the time to review this paper. This work has been partially supported by the European Commission, project SALERO FP6 IST-4-027122-IP.

Author information

Authors and Affiliations

GPMM - Grup de Recerca en Processament Multimodal, Enginyeria i Arquitectura La Salle, Universitat Ramon Llull, Quatre Camins 2, 08022 Barcelona, Spain
Xavi Gonzalvo, Ignasi Iriondo, Joan Claudi Socoró, Francesc Alías & Carlos Monzo

Editor information

Mohamed Chetouani, Amir Hussain, Bruno Gas, Maurice Milgram, Jean-Luc Zarader

About this paper

Cite this paper

Gonzalvo, X., Iriondo, I., Socoró, J.C., Alías, F., Monzo, C. (2007). Mixing HMM-Based Spanish Speech Synthesis with a CBR for Prosody Estimation. In: Chetouani, M., Hussain, A., Gas, B., Milgram, M., Zarader, J.-L. (eds) Advances in Nonlinear Speech Processing. NOLISP 2007. Lecture Notes in Computer Science, vol 4885. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77347-4_4

DOI: https://doi.org/10.1007/978-3-540-77347-4_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-77346-7
Online ISBN: 978-3-540-77347-4
eBook Packages: Computer Science, Computer Science (R0)
