Bayesian on-line spectral change point detection: a soft computing approach for on-line ASR
Current automatic speech recognition (ASR) systems work in off-line mode and need prior knowledge of stationary or quasi-stationary test conditions to achieve the expected word recognition accuracy. These requirements limit the use of ASR in real-world applications, where test conditions are highly non-stationary and not known a priori. This paper presents an innovative frame-dynamic rapid adaptation and noise compensation technique for tracking highly non-stationary noises, and applies it to on-line ASR. The proposed algorithm is based on a soft computing model that uses Bayesian on-line inference for spectral change point detection (BOSCPD) in unknown non-stationary noises. BOSCPD is tested with the minima controlled recursive averaging (MCRA) noise tracking technique for rapid on-line learning of environmental changes in different non-stationary noise scenarios. The test results show that the proposed BOSCPD technique significantly reduces the delay in spectral change point detection compared to the baseline MCRA and its derivatives. The BOSCPD soft computing model is also tested for on-line ASR based on joint additive and channel distortion compensation (JAC) in unknown test conditions, using non-stationary noisy speech samples from the Aurora 2 speech database. The simulation results for on-line ASR show a significant improvement in recognition accuracy compared to the baseline Aurora 2 distributed speech recognition (DSR) in batch mode.
Keywords: On-line environment learning; Bayesian on-line inference for spectral change point detection; MCRA; On-line ASR; JAC compensation; Non-stationary noise tracking and estimate; Minimum search window; Frame dynamic DSR; Highly non-stationary unknown test conditions; Real-world application; Smart phones and mobile hand-held devices; BOSCPD
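The abstract does not give the BOSCPD equations, so the following is only a minimal sketch of the generic run-length recursion used in Bayesian on-line change point detection. It is not the authors' algorithm. It assumes a scalar sequence (for example, the log noise power in one frequency bin), a Gaussian observation model with known variance, a Gaussian prior on the mean, and a constant hazard rate. All function names, parameter values, and the synthetic signal are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def online_changepoints(samples, hazard=0.01, prior_mean=0.0,
                        prior_var=10.0, obs_var=1.0):
    """Return the most probable run length after each sample.

    Assumed model (illustrative only): Gaussian observations with known
    variance obs_var, Gaussian prior on the mean, constant hazard rate.
    """
    run_probs = [1.0]          # posterior over run lengths 0..t
    means = [prior_mean]       # posterior mean of the level, per run length
    variances = [prior_var]    # posterior variance of the level, per run length
    map_run = []
    for x in samples:
        # Predictive probability of x under each run-length hypothesis.
        pred = [gaussian_pdf(x, m, v + obs_var) for m, v in zip(means, variances)]
        joint = [p * q for p, q in zip(run_probs, pred)]
        # Either the current run grows by one sample, or a change occurs.
        growth = [j * (1.0 - hazard) for j in joint]
        change = hazard * sum(joint)
        run_probs = [change] + growth
        total = sum(run_probs)
        run_probs = [p / total for p in run_probs]
        # Conjugate update of the level estimate for every surviving run.
        new_vars = [1.0 / (1.0 / v + 1.0 / obs_var) for v in variances]
        new_means = [nv * (m / v + x / obs_var)
                     for nv, m, v in zip(new_vars, means, variances)]
        means = [prior_mean] + new_means
        variances = [prior_var] + new_vars
        map_run.append(max(range(len(run_probs)), key=run_probs.__getitem__))
    return map_run


def detect_drops(map_run):
    """Flag sample indices where the most probable run length decreases."""
    return [t for t in range(1, len(map_run)) if map_run[t] < map_run[t - 1]]


# Synthetic level shift: 50 samples near 0, then 50 samples near 5.
signal = ([0.1 * (-1) ** i for i in range(50)]
          + [5.0 + 0.1 * (-1) ** i for i in range(50)])
map_run = online_changepoints(signal)
changes = detect_drops(map_run)
```

On this synthetic signal the most probable run length should collapse shortly after the level shift at sample 50, which is the kind of low-delay detection the abstract refers to. A real noise tracker would run one such detector per frequency bin, or on a spectral feature vector, and would need an observation model fitted to actual noise statistics.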