Bayesian on-line spectral change point detection: a soft computing approach for on-line ASR

  • M. F. R. Chowdhury
  • S.-A. Selouani
  • D. O’Shaughnessy


Current automatic speech recognition (ASR) systems work in off-line mode and need prior knowledge of stationary or quasi-stationary test conditions to reach the expected word recognition accuracy. These requirements limit the use of ASR in real-world applications, where test conditions are highly non-stationary and not known a priori. This paper presents a frame-dynamic rapid adaptation and noise compensation technique for tracking highly non-stationary noises, and applies it to on-line ASR. The proposed algorithm is based on a soft computing model that uses Bayesian on-line inference for spectral change point detection (BOSCPD) in unknown non-stationary noises. BOSCPD is tested with the minima controlled recursive averaging (MCRA) noise tracking technique for rapid on-line learning of environmental changes in different non-stationary noise scenarios. The test results show that the proposed BOSCPD technique significantly reduces the delay in spectral change point detection compared to the baseline MCRA and its derivatives. The BOSCPD model is also tested for on-line ASR based on joint additive and channel distortion compensation (JAC) in unknown test conditions, using non-stationary noisy speech samples from the Aurora 2 speech database. The simulation results for on-line ASR show a significant improvement in recognition accuracy over the baseline Aurora 2 distributed speech recognition (DSR) in batch mode.
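The abstract does not give the BOSCPD equations, so the following is only a minimal sketch of the general Bayesian on-line change point recursion (in the style of Adams and MacKay, 2007) that such a detector could build on. It is not the authors' algorithm: it runs on a scalar sequence rather than on spectral frames, and the Gaussian model with known variance, the constant hazard rate, the prior values, the function names and the synthetic data are all assumptions made for illustration.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def map_run_lengths(data, hazard=0.01, mu0=0.0, var0=100.0, obs_var=1.0):
    """Return the most probable run length after each sample.

    Assumed model (hypothetical, for illustration only): each segment has an
    unknown mean with prior N(mu0, var0), samples have known variance obs_var,
    and a change point occurs at each step with constant probability `hazard`.
    """
    run_probs = [1.0]            # P(run length = 0) before any data
    means, variances = [mu0], [var0]
    result = []
    for x in data:
        # Predictive probability of x under every current run length.
        pred = [gaussian_pdf(x, m, v + obs_var) for m, v in zip(means, variances)]
        # Either the run grows by one sample, or a change point resets it to 0.
        growth = [p * l * (1.0 - hazard) for p, l in zip(run_probs, pred)]
        change = sum(p * l * hazard for p, l in zip(run_probs, pred))
        joint = [change] + growth
        total = sum(joint)
        run_probs = [j / total for j in joint]
        # Conjugate update of the segment mean for each grown run.
        new_means, new_vars = [mu0], [var0]
        for m, v in zip(means, variances):
            post_var = 1.0 / (1.0 / v + 1.0 / obs_var)
            new_vars.append(post_var)
            new_means.append(post_var * (m / v + x / obs_var))
        means, variances = new_means, new_vars
        result.append(max(range(len(run_probs)), key=run_probs.__getitem__))
    return result


def change_indices(runs):
    """Sample indices at which the most probable run length drops."""
    return [i for i in range(1, len(runs)) if runs[i] < runs[i - 1]]


# Synthetic sequence (an assumption): the level jumps from about 0 to about 8
# at sample index 30, loosely mimicking an abrupt change in noise power.
data = [0.3 * math.sin(i) for i in range(30)]
data += [8.0 + 0.3 * math.sin(i) for i in range(30, 60)]
runs = map_run_lengths(data)
changes = change_indices(runs)
```

With a constant hazard rate, the normalised probability of run length zero always equals the hazard, so the sketch flags a change when the most probable run length drops rather than by thresholding that probability. Because the run-length posterior is updated after every sample, a change can be flagged on the sample where it occurs, without waiting for a fixed search window to elapse.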


Keywords

  • On-line environment learning
  • Bayesian on-line inference for spectral change point detection
  • MCRA
  • On-line ASR
  • JAC compensation
  • Non-stationary noise tracking and estimate
  • Minimum search window
  • Frame dynamic
  • DSR
  • Highly non-stationary unknown test conditions
  • Real-world application
  • Smart phones and mobile hand-held devices
  • BOSCPD





Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  • M. F. R. Chowdhury (1)
  • S.-A. Selouani (2)
  • D. O’Shaughnessy (1)

  1. INRS-EMT, Université du Québec, Montréal, Canada
  2. Université de Moncton, Moncton, Canada
