Fine Vocoder Tuning for HMM-Based Speech Synthesis: Effect of the Analysis Window Length

Alonso, Agustin; Erro, Daniel; Navas, Eva; Hernaez, Inma

doi:10.1007/978-3-319-13623-3_3


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8854)


Abstract

This paper studies how the length of the window used during spectral envelope estimation influences the perceptual quality of HMM-based speech synthesis. We show that the acoustic differences due to variations in the window length are audible. The experiments reveal an overall preference towards short analysis windows, although longer windows seem to alleviate some artifacts related to training data scarcity.
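The trade-off behind the window-length question can be illustrated with a generic sketch. This is not the authors' vocoder or their envelope estimator. It assumes a Hann window, a 16 kHz sampling rate, and a hypothetical helper named `hann_mainlobe_width_hz`, and it only shows that shorter analysis windows give a wider spectral main lobe (coarser frequency resolution) in exchange for better time localization.

```python
import numpy as np


def hann_mainlobe_width_hz(window_ms, fs=16000, nfft=1 << 16):
    """Approximate null-to-null main-lobe width (Hz) of a Hann window.

    window_ms, fs and nfft are illustrative assumptions, not values
    taken from the paper.
    """
    # Window length in samples for the requested duration.
    n = int(round(window_ms * fs / 1000))
    window = np.hanning(n)

    # Heavily zero-padded FFT to sample the window's spectrum finely.
    magnitude = np.abs(np.fft.rfft(window, nfft))

    # Walk down the main lobe from DC until the magnitude stops decreasing;
    # that bin approximates the first spectral null.
    k = 1
    while k + 1 < len(magnitude) and magnitude[k + 1] < magnitude[k]:
        k += 1

    # The main lobe is symmetric around DC, so its width is twice the null frequency.
    return 2.0 * k * fs / nfft


if __name__ == "__main__":
    for ms in (5, 10, 20, 40):
        width = hann_mainlobe_width_hz(ms)
        print(f"{ms:>3} ms window -> main lobe of roughly {width:.0f} Hz")
```

Under these assumptions the main-lobe width is close to 4·fs/(N−1) for a window of N samples, so halving the window length roughly doubles the spectral smearing. How that smearing interacts with HMM training and perceived quality is what the paper evaluates experimentally.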



Author information

Authors and Affiliations

AHOLAB, University of the Basque Country (UPV/EHU), Bilbao, Spain
Agustin Alonso, Daniel Erro, Eva Navas & Inma Hernaez
IKERBASQUE, Basque Foundation for Science, Bilbao, Spain
Daniel Erro


Editor information

Editors and Affiliations

ETSIT, Las Palmas de Gran Canaria, Spain
Juan Luis Navarro Mesa, Eduardo Hernández Pérez, Pedro Quintana Morales, Antonio Ravelo García & Iván Guerra Moreno
University of Zaragoza, Spain
Alfonso Ortega
Dep. of Electronics, Telecommunications and Informatics Engineering, University of Aveiro, Portugal
António Teixeira
ATVS Biometric Recognition Group, Universidad Autónoma de Madrid, Spain
Doroteo T. Toledano


About this paper

Cite this paper

Alonso, A., Erro, D., Navas, E., Hernaez, I. (2014). Fine Vocoder Tuning for HMM-Based Speech Synthesis: Effect of the Analysis Window Length. In: Navarro Mesa, J.L., et al. Advances in Speech and Language Technologies for Iberian Languages. Lecture Notes in Computer Science, vol 8854. Springer, Cham. https://doi.org/10.1007/978-3-319-13623-3_3


DOI: https://doi.org/10.1007/978-3-319-13623-3_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-13622-6
Online ISBN: 978-3-319-13623-3
eBook Packages: Computer Science, Computer Science (R0)
