Speech Modeling Using the Complex Cepstrum

  • Martin Vondra
  • Robert Vích
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6456)

Abstract

Conventional cepstral speech modeling is based on a minimum-phase parametric speech production model with an infinite impulse response. That approach approximates only the logarithmic magnitude frequency response of the corresponding speech frame. This contribution describes the principle of cepstral speech modeling using the complex cepstrum. The resulting mixed-phase vocal tract model with a finite impulse response also contains information about the phase properties of the modeled speech frame. This model approximates the speech signal with higher accuracy than the model based on the real cepstrum, but its numerical complexity and memory requirements are at least twice as high.
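
The difference between the two cepstra can be illustrated with a minimal NumPy sketch. It is not the authors' implementation; the function names, the FFT length, and the three-sample test signal are assumptions chosen for illustration. The sketch computes the complex cepstrum of a short mixed-phase sequence (log magnitude plus unwrapped phase, with the linear-phase term removed) and compares the signal rebuilt from it with the minimum-phase signal rebuilt from the real cepstrum alone.

```python
import numpy as np

def complex_cepstrum(x, n_fft):
    """Complex cepstrum of a real sequence (sketch; assumes a positive DC value
    and no spectral zeros on the unit circle)."""
    X = np.fft.fft(x, n_fft)
    log_mag = np.log(np.abs(X))
    phase = np.unwrap(np.angle(X))
    half = n_fft // 2
    # Integer delay r is read from the unwrapped phase at omega = pi.
    r = int(np.round(phase[half] / np.pi))
    # Remove the linear-phase term so the phase is periodic and odd.
    phase = phase - np.pi * r * np.arange(n_fft) / half
    return np.fft.ifft(log_mag + 1j * phase).real, r

def min_phase_cepstrum(x, n_fft):
    """Fold the real cepstrum (log magnitude only) into a causal cepstrum."""
    c = np.fft.ifft(np.log(np.abs(np.fft.fft(x, n_fft)))).real
    half = n_fft // 2
    folded = np.zeros(n_fft)
    folded[0] = c[0]
    folded[1:half] = 2.0 * c[1:half]
    folded[half] = c[half]
    return folded

def from_cepstrum(c, delay=0):
    """Invert a cepstrum and restore the removed delay by a circular shift."""
    x = np.fft.ifft(np.exp(np.fft.fft(c))).real
    return np.roll(x, -delay)

# Hypothetical test frame: one zero inside the unit circle (z = 0.5)
# and one outside it (z = -2.5), so the sequence is mixed-phase.
n_fft = 512
x = np.convolve([1.0, -0.5], [0.4, 1.0])      # [0.4, 0.8, -0.5]

c_complex, r = complex_cepstrum(x, n_fft)
x_mixed = from_cepstrum(c_complex, r)          # keeps the phase of x
x_min = from_cepstrum(min_phase_cepstrum(x, n_fft))  # same magnitude, minimum phase
```

In this sketch the complex cepstrum has non-zero values at both positive and negative quefrencies, whereas the folded real cepstrum is purely causal. Storing and processing both halves is consistent with the roughly doubled cost mentioned in the abstract.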

Keywords

Impulse Response, Speech Signal, Finite Impulse Response, Vocal Tract, Infinite Impulse Response
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Martin Vondra (1)
  • Robert Vích (1)
  1. Institute of Photonics and Electronics, Academy of Sciences of the Czech Republic, Prague 8, Czech Republic