An Evaluation Study on Speech Feature Densities for Bayesian Estimation in Robust ASR

  • Simone Cifani
  • Emanuele Principi
  • Rudy Rotili
  • Stefano Squartini
  • Francesco Piazza
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6456)


Bayesian estimators, especially the Minimum Mean Square Error (MMSE) and the Maximum A Posteriori (MAP) estimators, are widely used to estimate clean speech STFT coefficients. Recently, the same approach has been successfully applied to speech feature enhancement for robust Automatic Speech/Speaker Recognition (ASR) in the Mel, log-Mel, or cepstral domain. The quality of the estimate depends directly on the assumptions made about the densities of the noise and speech coefficients. While the density of the speech coefficients has been studied exhaustively for STFT coefficients, the case of speech features has received far less attention. In this paper, we study the distribution of Mel, log-Mel, and MFCC coefficients obtained from speech segments. The histograms of the speech features are first fitted to several pdf models and assessed with the Chi-Square Goodness-of-Fit test; they are then modeled using a Gaussian Mixture Model (GMM). Computer simulations show that, from this perspective, log-Mel and MFCC coefficients are a more convenient choice than Mel coefficients.
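The histogram-fitting step can be illustrated with a minimal sketch. It is not the authors' code: the synthetic Laplacian samples merely stand in for one speech feature coefficient, the two candidate pdfs (Gaussian and Laplacian) and the moment-based parameter fits are assumptions chosen for brevity, and all function names are hypothetical. The sketch computes the Pearson chi-square statistic between an equal-width histogram and each candidate; a smaller statistic indicates a closer fit.

```python
import math
import random
import statistics


def gaussian_cdf(x, mu, sigma):
    """CDF of a Gaussian with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def laplace_cdf(x, mu, b):
    """CDF of a Laplacian with location mu and scale b."""
    if x < mu:
        return 0.5 * math.exp((x - mu) / b)
    return 1.0 - 0.5 * math.exp(-(x - mu) / b)


def chi_square_statistic(samples, cdf, n_bins=20):
    """Pearson chi-square statistic between a histogram and a candidate cdf."""
    lo, hi = min(samples), max(samples)
    width = (hi - lo) / n_bins
    # Observed counts in equal-width bins spanning the sample range.
    observed = [0] * n_bins
    for x in samples:
        k = min(int((x - lo) / width), n_bins - 1)
        observed[k] += 1
    n = len(samples)
    stat = 0.0
    for k in range(n_bins):
        left = lo + k * width
        right = left + width
        # The outer bins absorb the tails so the expected counts sum to n.
        p_left = 0.0 if k == 0 else cdf(left)
        p_right = 1.0 if k == n_bins - 1 else cdf(right)
        expected = n * (p_right - p_left)
        # Simplification: bins with zero expected count are skipped; a careful
        # test would merge sparse bins instead.
        if expected > 0.0:
            stat += (observed[k] - expected) ** 2 / expected
    return stat


# Synthetic stand-in data: the difference of two exponential variates
# is Laplacian-distributed with location 0 and scale 1.
rng = random.Random(0)
samples = [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(5000)]

# Simple parameter fits (assumed here, not taken from the paper).
mu_g = statistics.fmean(samples)
sigma_g = statistics.pstdev(samples)
mu_l = statistics.median(samples)
b_l = statistics.fmean(abs(x - mu_l) for x in samples)

stat_gauss = chi_square_statistic(samples, lambda x: gaussian_cdf(x, mu_g, sigma_g))
stat_laplace = chi_square_statistic(samples, lambda x: laplace_cdf(x, mu_l, b_l))

print("chi-square vs. Gaussian :", stat_gauss)
print("chi-square vs. Laplacian:", stat_laplace)
```

On real data, the samples would be replaced by the values of a single Mel, log-Mel, or MFCC coefficient collected over speech frames, and the candidate set would be extended to whichever pdf models are under study.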


Speech Feature Densities Estimation · Speech Enhancement · Automatic Speech Recognition





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Simone Cifani (1)
  • Emanuele Principi (1)
  • Rudy Rotili (1)
  • Stefano Squartini (1)
  • Francesco Piazza (1)

  1. 3MediaLabs, DIBET, Università Politecnica delle Marche, Ancona, Italy
