Abstract
Hidden Markov model-based text-to-speech (HMM-TTS) synthesis is a technique for generating speech from trained statistical models in which the spectrum, pitch and durations of basic speech units are modelled simultaneously. The aim of this work is to describe a Spanish HMM-TTS system that uses an external machine learning technique, case-based reasoning (CBR), to help improve expressiveness. System performance is analysed objectively and subjectively. The experiments were conducted on a reliably labelled speech corpus whose units were clustered using contextual factors based on the Spanish language. The results show that CBR-based F0 estimation can improve on the HMM-based baseline when synthesizing short non-declarative sentences, while duration accuracy is similar for the CBR and HMM systems.
Thanks to Prof. Dr. Eric Keller, University of Lausanne, for kindly taking the time to check this paper. This work has been partially supported by the European Commission, project SALERO FP6 IST-4-027122-IP.
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gonzalvo, X., Iriondo, I., Socoró, J.C., Alías, F., Monzo, C. (2007). Mixing HMM-Based Spanish Speech Synthesis with a CBR for Prosody Estimation. In: Chetouani, M., Hussain, A., Gas, B., Milgram, M., Zarader, JL. (eds) Advances in Nonlinear Speech Processing. NOLISP 2007. Lecture Notes in Computer Science, vol 4885. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77347-4_4
DOI: https://doi.org/10.1007/978-3-540-77347-4_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-77346-7
Online ISBN: 978-3-540-77347-4
eBook Packages: Computer Science (R0)