Hybrid Approach of Modeling the Source Signal


Part of the book series: SpringerBriefs in Speech Technology (BRIEFSSPEECHTECH)

Abstract

In this chapter, two hybrid source modeling methods are proposed for improving the quality of HMM-based speech synthesis. In the first method, the optimal pitch-synchronous residual frames which represent the excitation signals of phones are used for modeling the source. In the second method, a hybrid source model which is capable of generating the excitation signal specific to every phone is proposed. Initially, an analysis of phone-dependent characteristics of the excitation signal is performed. In the proposed source model, the pitch-synchronous residual frames of a phone are modeled as a sum of deterministic and noise components.
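The decomposition named in the last sentence can be illustrated with a minimal sketch. The abstract does not say how the deterministic and noise components are estimated, so the choices below are assumptions made only for illustration: each pitch-synchronous residual frame is length-normalized by linear interpolation, the deterministic component is taken as the sample-wise mean of the normalized frames of a phone, and the noise component is whatever remains in each frame. The function names `resample_frame` and `decompose_frames` are hypothetical and do not come from the chapter.

```python
import math


def resample_frame(frame, length):
    """Linearly interpolate one residual frame to a fixed number of samples."""
    if length < 2 or len(frame) < 2:
        raise ValueError("frame and target length need at least 2 samples")
    out = []
    for i in range(length):
        # Map output index i onto a fractional position in the input frame.
        pos = i * (len(frame) - 1) / (length - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, len(frame) - 1)
        frac = pos - lo
        out.append((1.0 - frac) * frame[lo] + frac * frame[hi])
    return out


def decompose_frames(frames, length=64):
    """Split the frames of one phone into deterministic and noise parts.

    Assumption (not from the chapter): the deterministic component is the
    sample-wise mean of the length-normalized frames, and the noise
    component of each frame is its deviation from that mean.
    """
    normalized = [resample_frame(f, length) for f in frames]
    count = len(normalized)
    deterministic = [
        sum(f[n] for f in normalized) / count for n in range(length)
    ]
    noise = [
        [f[n] - deterministic[n] for n in range(length)] for f in normalized
    ]
    return deterministic, noise


if __name__ == "__main__":
    # Toy frames of different pitch periods; real frames would come from
    # pitch-synchronous segmentation of the residual signal.
    toy_frames = [[0.0, 1.0, 0.0], [0.0, 2.0, 0.0, 0.0]]
    det, noise = decompose_frames(toy_frames, length=5)
    print("deterministic:", det)
    print("noise of first frame:", noise[0])
```

Under these assumptions, adding the deterministic component back to the noise component of a frame recovers the normalized frame exactly, which is the sense in which the frame is a sum of the two parts.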

Copyright information

© 2019 The Author(s), under exclusive licence to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Rao, K.S., Narendra, N.P. (2019). Hybrid Approach of Modeling the Source Signal. In: Source Modeling Techniques for Quality Enhancement in Statistical Parametric Speech Synthesis. SpringerBriefs in Speech Technology. Springer, Cham. https://doi.org/10.1007/978-3-030-02759-9_5

  • DOI: https://doi.org/10.1007/978-3-030-02759-9_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-02758-2

  • Online ISBN: 978-3-030-02759-9

  • eBook Packages: Engineering, Engineering (R0)
