
From Volatility Modeling of Financial Time-Series to Stochastic Modeling and Enhancement of Speech Signals

  • Chapter
Book: Speech Enhancement

Part of the book series: Signals and Communication Technology (SCT)

Abstract

Modeling speech signals in the short-time Fourier transform (STFT) domain is a fundamental problem in the design of speech enhancement systems. This chapter introduces a novel modeling approach based on generalized autoregressive conditional heteroscedasticity (GARCH). GARCH is widely used for volatility modeling of financial time-series such as exchange rates and stock returns, because it accounts for two characteristics of such series: heavy-tailed distributions and volatility clustering. Spectral analysis shows that speech signals in the STFT domain are also characterized by heavy-tailed distributions and volatility clustering. We demonstrate the application of GARCH modeling to speech enhancement and show its advantage over the conventional decision-directed method.
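The two properties named in the abstract, heavy tails and volatility clustering, can be illustrated with the textbook GARCH(1,1) model from the financial literature. The following is a minimal Python sketch under stated assumptions: it simulates a real-valued GARCH(1,1) process with Gaussian innovations, which is not the chapter's speech model (the chapter applies GARCH to STFT expansion coefficients, and its exact formulation is not reproduced here). The function names and the parameter values `omega`, `alpha`, `beta` are hypothetical, chosen only for illustration. `garch11_kurtosis` uses the standard closed-form kurtosis for Gaussian-innovation GARCH(1,1); a value above 3 indicates tails heavier than Gaussian.

```python
import math
import random


def simulate_garch11(n, omega, alpha, beta, seed=0):
    """Simulate x[t] = sqrt(s2[t]) * z[t] with z[t] ~ N(0, 1) i.i.d. and
    s2[t+1] = omega + alpha * x[t]**2 + beta * s2[t]   (GARCH(1,1)).
    Illustrative sketch; parameters are hypothetical, not taken from the chapter."""
    if omega <= 0 or alpha < 0 or beta < 0 or alpha + beta >= 1:
        raise ValueError("need omega > 0, alpha >= 0, beta >= 0, alpha + beta < 1")
    rng = random.Random(seed)
    s2 = omega / (1.0 - alpha - beta)  # start at the unconditional variance
    xs, variances = [], []
    for _ in range(n):
        x = math.sqrt(s2) * rng.gauss(0.0, 1.0)
        xs.append(x)
        variances.append(s2)
        # A large |x| raises the next conditional variance: volatility clustering.
        s2 = omega + alpha * x * x + beta * s2
    return xs, variances


def garch11_kurtosis(alpha, beta):
    """Unconditional kurtosis of a Gaussian-innovation GARCH(1,1); 3 means Gaussian."""
    p = (alpha + beta) ** 2
    denom = 1.0 - p - 2.0 * alpha ** 2
    if denom <= 0:
        return math.inf  # the fourth moment does not exist
    return 3.0 * (1.0 - p) / denom


def autocorr(xs, lag):
    """Sample autocorrelation of a sequence at a positive lag."""
    n = len(xs)
    mean = sum(xs) / n
    den = sum((v - mean) ** 2 for v in xs)
    num = sum((xs[t] - mean) * (xs[t + lag] - mean) for t in range(n - lag))
    return num / den


if __name__ == "__main__":
    xs, _ = simulate_garch11(50_000, omega=0.05, alpha=0.1, beta=0.85)
    print("theoretical kurtosis:  ", round(garch11_kurtosis(0.1, 0.85), 3))
    print("lag-1 autocorr of x:   ", round(autocorr(xs, 1), 3))
    print("lag-1 autocorr of x^2: ", round(autocorr([v * v for v in xs], 1), 3))
```

In a run like the one above, the samples themselves are nearly uncorrelated while their squares are positively correlated: large values tend to follow large values, which is the volatility clustering the abstract describes.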



Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Cohen, I. (2005). From Volatility Modeling of Financial Time-Series to Stochastic Modeling and Enhancement of Speech Signals. In: Speech Enhancement. Signals and Communication Technology. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-27489-8_5


  • DOI: https://doi.org/10.1007/3-540-27489-8_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-24039-6

  • Online ISBN: 978-3-540-27489-6

  • eBook Packages: Engineering, Engineering (R0)
