Summary
This paper reviews the main technologies recently developed to make speech recognition systems more robust to acoustic variation. These technologies are examined from the viewpoint of a stochastic pattern-matching paradigm for speech recognition. Improved robustness yields better recognition across a wide range of unexpected and adverse conditions by reducing the mismatch between training and testing utterances.
Copyright information
© 1999 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Furui, S. (1999). Robust Speech Recognition. In: Ponting, K. (ed.) Computational Models of Speech Pattern Processing. NATO ASI Series, vol 169. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-60087-6_11
DOI: https://doi.org/10.1007/978-3-642-60087-6_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-64250-0
Online ISBN: 978-3-642-60087-6
eBook Packages: Springer Book Archive