Perceptual Effects of the Degree of Articulation in HMM-Based Speech Synthesis

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7015)

Abstract

This paper investigates which factors lead to high-quality HMM-based speech synthesis with various degrees of articulation. A neutral speech synthesizer is first adapted to generate hypo- and hyperarticulated speech. A subjective evaluation then studies the impact of cepstral adaptation, prosody, phonetic transcription, and the adaptation technique on the perceived degree of articulation. The results show that high-quality hypo- and hyperarticulated speech synthesis requires an efficient adaptation technique such as CMLLR. Moreover, in addition to prosody adaptation, the evaluation assesses the importance of cepstrum adaptation and of a Natural Language Processor able to generate realistic hypo- and hyperarticulated phonetic transcriptions.
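
The abstract names CMLLR (constrained maximum likelihood linear regression) as an efficient adaptation technique. As a rough illustration only, and not the authors' implementation, the sketch below shows the core idea of a constrained transform: a single matrix A is shared by the mean and the covariance of a Gaussian. The function name `cmllr_adapt`, the sign convention for the bias `b`, and the toy values are assumptions made for this example; estimating A and b from adaptation data is not shown.

```python
import numpy as np


def cmllr_adapt(mean, cov, A, b):
    """Apply a constrained affine transform to one Gaussian.

    Hypothetical helper for illustration: the same matrix A transforms
    both the mean and the covariance, which is the "constrained" part.
    The bias sign (+ b) is an assumed convention.
    """
    adapted_mean = A @ mean + b
    adapted_cov = A @ cov @ A.T
    return adapted_mean, adapted_cov


# Toy 2-dimensional Gaussian standing in for one HMM state of the neutral model.
mean = np.array([1.0, 2.0])
cov = np.diag([0.5, 2.0])

# Toy transform; in practice A and b would be estimated from adaptation data.
A = np.array([[2.0, 0.0],
              [0.0, 0.5]])
b = np.array([0.1, -0.1])

new_mean, new_cov = cmllr_adapt(mean, cov, A, b)
print(new_mean)
print(new_cov)
```

Sharing A between mean and covariance is what distinguishes this constrained form from adapting the means alone; it also means the same transform can be read as an affine change of the feature space.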

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Picart, B., Drugman, T., Dutoit, T. (2011). Perceptual Effects of the Degree of Articulation in HMM-Based Speech Synthesis. In: Travieso-González, C.M., Alonso-Hernández, J.B. (eds) Advances in Nonlinear Speech Processing. NOLISP 2011. Lecture Notes in Computer Science, vol 7015. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25020-0_23

  • DOI: https://doi.org/10.1007/978-3-642-25020-0_23

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-25019-4

  • Online ISBN: 978-3-642-25020-0

  • eBook Packages: Computer Science, Computer Science (R0)
