Abstract
Modeling speech signals in the short-time Fourier transform (STFT) domain is a fundamental problem in designing speech enhancement systems. This chapter introduces a novel modeling approach based on generalized autoregressive conditional heteroscedasticity (GARCH). GARCH is widely used for volatility modeling of financial time-series such as exchange rates and stock returns. GARCH models take into account two characteristics of financial time-series: heavy-tailed distributions and volatility clustering. Spectral analysis shows that speech signals in the STFT domain share both characteristics. We demonstrate the application of GARCH modeling to speech enhancement, and show its advantage over the conventional decision-directed method.
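The abstract names two properties that GARCH captures: heavy tails and volatility clustering. Both can be seen in a short simulation. The sketch below uses the standard GARCH(1,1) recursion from the financial econometrics literature, with Gaussian innovations. It is not the chapter's STFT-domain speech model. The function names `simulate_garch11`, `excess_kurtosis` and `lag1_autocorr` are hypothetical, and so are the parameter values `omega=0.1`, `alpha=0.15` and `beta=0.8`. They were chosen only so that the process is stationary and has a finite fourth moment.

```python
import numpy as np


def simulate_garch11(n, omega=0.1, alpha=0.15, beta=0.8, seed=0):
    """Simulate a GARCH(1,1) process eps_t = sigma_t * z_t with z_t ~ N(0, 1).

    Conditional variance recursion:
        sigma_t^2 = omega + alpha * eps_{t-1}^2 + beta * sigma_{t-1}^2
    Parameter values are illustrative assumptions, not taken from the chapter.
    """
    # Positivity and covariance-stationarity constraints of GARCH(1,1).
    if not (omega > 0 and alpha >= 0 and beta >= 0 and alpha + beta < 1):
        raise ValueError("parameters violate GARCH(1,1) constraints")
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)
    eps = np.empty(n)
    sigma2 = np.empty(n)
    # Initialize with the unconditional variance omega / (1 - alpha - beta).
    sigma2[0] = omega / (1.0 - alpha - beta)
    eps[0] = np.sqrt(sigma2[0]) * z[0]
    for t in range(1, n):
        # A large |eps_{t-1}| raises sigma_t^2, so large values tend to cluster.
        sigma2[t] = omega + alpha * eps[t - 1] ** 2 + beta * sigma2[t - 1]
        eps[t] = np.sqrt(sigma2[t]) * z[t]
    return eps, sigma2


def excess_kurtosis(x):
    """Sample excess kurtosis; approximately 0 for Gaussian data, > 0 for heavy tails."""
    x = x - x.mean()
    return np.mean(x ** 4) / np.mean(x ** 2) ** 2 - 3.0


def lag1_autocorr(x):
    """Sample lag-1 autocorrelation coefficient."""
    x = x - x.mean()
    return np.dot(x[:-1], x[1:]) / np.dot(x, x)


if __name__ == "__main__":
    eps, _ = simulate_garch11(100_000)
    # Heavy tails: positive excess kurtosis even though z_t is Gaussian.
    print("excess kurtosis:", excess_kurtosis(eps))
    # Volatility clustering: squared values are correlated ...
    print("lag-1 autocorr of eps^2:", lag1_autocorr(eps ** 2))
    # ... while the values themselves are nearly uncorrelated.
    print("lag-1 autocorr of eps:", lag1_autocorr(eps))
```

The innovations z_t are Gaussian, so any heavy tails and correlation in the squared samples come from the variance recursion alone. The near-zero autocorrelation of eps and the clearly positive autocorrelation of eps^2 are the signature of volatility clustering. The abstract states that STFT coefficients of speech display the same pattern.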
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Cohen, I. (2005). From Volatility Modeling of Financial Time-Series to Stochastic Modeling and Enhancement of Speech Signals. In: Speech Enhancement. Signals and Communication Technology. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-27489-8_5
DOI: https://doi.org/10.1007/3-540-27489-8_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24039-6
Online ISBN: 978-3-540-27489-6
eBook Packages: Engineering (R0)