An Evaluation Study on Speech Feature Densities for Bayesian Estimation in Robust ASR
Bayesian estimators, especially the Minimum Mean Square Error (MMSE) and the Maximum A Posteriori (MAP) estimators, are very popular for estimating clean speech STFT coefficients. Recently, the same approach has been applied successfully to speech feature enhancement for robust Automatic Speech/Speaker Recognition (ASR) applications, in the Mel, log-Mel, or cepstral domain. The quality of the estimate depends directly on the assumptions made about the densities of the noise and speech coefficients. However, while these densities have been studied exhaustively for STFT coefficients, speech features have not received comparable attention. In this paper, we study the distribution of Mel, log-Mel, and MFCC coefficients obtained from speech segments. The histograms of the speech features are first fitted to several pdf models by means of the Chi-Square Goodness-of-Fit test, and then modeled using a Gaussian Mixture Model (GMM). Computer simulations show that, from this perspective, log-Mel and MFCC coefficients are a more convenient choice than Mel coefficients.
Keywords: Speech Feature Densities Estimation Speech Enhancement Automatic Speech Recognition
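
The histogram-fitting step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the data are synthetic Gaussian samples standing in for one speech feature coefficient, the candidate models (Gaussian and Laplacian), the moment-matched parameters, the bin count, and all function names are assumptions chosen for illustration, and only the Pearson chi-square statistic is compared, without computing a p-value.

```python
import math
import random


def gaussian_cdf(x, mu, sigma):
    """CDF of a Gaussian with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def laplacian_cdf(x, mu, b):
    """CDF of a Laplacian with location mu and scale b."""
    if x < mu:
        return 0.5 * math.exp((x - mu) / b)
    return 1.0 - 0.5 * math.exp(-(x - mu) / b)


def chi_square_statistic(samples, cdf, n_bins=20):
    """Pearson chi-square statistic between a histogram and a model CDF."""
    n = len(samples)
    lo, hi = min(samples), max(samples)
    width = (hi - lo) / n_bins
    edges = [lo + i * width for i in range(n_bins + 1)]

    # Observed counts per bin (the maximum sample goes into the last bin).
    observed = [0] * n_bins
    for x in samples:
        idx = min(int((x - lo) / width), n_bins - 1)
        observed[idx] += 1

    stat = 0.0
    for i in range(n_bins):
        # The outer bins absorb the model tails so expected counts sum to n.
        p_lo = 0.0 if i == 0 else cdf(edges[i])
        p_hi = 1.0 if i == n_bins - 1 else cdf(edges[i + 1])
        expected = n * (p_hi - p_lo)
        if expected > 0:
            stat += (observed[i] - expected) ** 2 / expected
    return stat


# Synthetic stand-in for one feature coefficient (assumption: Gaussian data).
rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(5000)]

# Moment-matched parameters for both candidate models (an assumption;
# other estimators, such as maximum likelihood, could be used instead).
mean = sum(samples) / len(samples)
std = math.sqrt(sum((x - mean) ** 2 for x in samples) / len(samples))

gauss_stat = chi_square_statistic(samples, lambda x: gaussian_cdf(x, mean, std))
laplace_stat = chi_square_statistic(
    samples, lambda x: laplacian_cdf(x, mean, std / math.sqrt(2.0))
)

# The candidate with the smaller statistic fits the histogram better.
best = "Gaussian" if gauss_stat < laplace_stat else "Laplacian"
print(f"Gaussian: {gauss_stat:.1f}  Laplacian: {laplace_stat:.1f}  best: {best}")
```

In a real experiment the `samples` list would hold the values of a single Mel, log-Mel, or MFCC coefficient collected over many speech frames, and the statistic would be compared against a chi-square critical value for the appropriate degrees of freedom.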