Robust Arabic Multi-stream Speech Recognition System in Noisy Environment

  • Anissa Imen Amrous
  • Mohamed Debyeche
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7340)

Abstract

In this paper, the framework of multi-stream combination is explored to improve the noise robustness of automatic speech recognition (ASR) systems. The two central issues in multi-stream systems are which feature representations to combine and what importance (weights) to give to each. Two feature streams are investigated: MFCC features and a set of complementary features consisting of the pitch frequency, the energy, and the first three formants. Empirically optimized weights are fixed for each stream. The multi-stream vectors are modeled by Hidden Markov Models (HMMs) with Gaussian Mixture Model (GMM) state distributions. Our ASR system is implemented using the HTK toolkit and the ARADIGIT corpus, a database of spoken Arabic words. The obtained results show that, for highly noisy speech, the proposed multi-stream vectors lead to a significant improvement in recognition accuracy.
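
To make the combination concrete: in the standard multi-stream HMM formulation (as used by HTK), the state emission score is a weighted sum of per-stream log-likelihoods, log b_j(o_t) = w_1 log b_j,1(o_t,1) + w_2 log b_j,2(o_t,2), with the weights fixed empirically. The sketch below shows one plausible way to build the two streams named in the abstract; the library (librosa), the frame settings, and the weight values are illustrative assumptions rather than the authors' exact front end.

    import numpy as np
    import librosa

    FRAME_LEN, HOP = 400, 160  # 25 ms frames, 10 ms shift at 16 kHz (assumed)

    def formants(frame, sr, n_formants=3, lpc_order=12):
        """Estimate formants from the angles of the LPC polynomial roots."""
        a = librosa.lpc(frame * np.hamming(len(frame)), order=lpc_order)
        roots = [r for r in np.roots(a) if np.imag(r) > 0]
        freqs = sorted(np.angle(roots) * sr / (2.0 * np.pi))
        freqs = [f for f in freqs if f > 90.0]   # discard near-DC roots
        freqs += [0.0] * n_formants              # pad when too few are found
        return freqs[:n_formants]

    def two_stream_features(wav_path):
        """Return (stream1, stream2): MFCCs and pitch/energy/formant features."""
        y, sr = librosa.load(wav_path, sr=16000)
        # Stream 1: 13 MFCCs per frame (delta coefficients omitted for brevity).
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,
                                    n_fft=FRAME_LEN, hop_length=HOP).T
        # Stream 2: pitch (YIN), log energy, and the first three formants.
        f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr,
                         frame_length=1024, hop_length=HOP)
        rms = librosa.feature.rms(y=y, frame_length=FRAME_LEN, hop_length=HOP)[0]
        frames = librosa.util.frame(y, frame_length=FRAME_LEN, hop_length=HOP).T
        fmts = np.array([formants(fr, sr) for fr in frames])
        # Truncate all tracks to a common frame count before stacking.
        n = min(len(mfcc), len(f0), len(rms), len(fmts))
        aux = np.column_stack([f0[:n], np.log(rms[:n] + 1e-8), fmts[:n]])
        return mfcc[:n], aux

    # Placeholder stream weights (the paper fixes its values empirically);
    # a recognizer would score each frame as
    #   w1 * log p(mfcc_t | state) + w2 * log p(aux_t | state).
    W_MFCC, W_AUX = 1.0, 0.6

In HTK itself, the same weighting is expressed inside the HMM definition via the <SWeights> field, which is where empirically fixed stream weights such as the above would be plugged in.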

Keywords

Multi-stream speech recognition · HMM · Noisy environments

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Anissa Imen Amrous (1)
  • Mohamed Debyeche (1)

  1. Speech Communication and Signal Processing Laboratory (LPCTS), Faculty of Electronics and Computer Sciences, USTHB, Bab Ezzouar, Algeria
