Part of the book series: NATO ASI Series ((NATO ASI F,volume 169))

Summary

This paper surveys the main technologies recently developed to make speech recognition systems more robust against acoustic variations. These technologies are reviewed from the viewpoint of a stochastic pattern matching paradigm for speech recognition. Improved robustness enables better speech recognition over a wide range of unexpected and adverse conditions by reducing mismatches between training and testing speech utterances.

Copyright information

© 1999 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Furui, S. (1999). Robust Speech Recognition. In: Ponting, K. (eds) Computational Models of Speech Pattern Processing. NATO ASI Series, vol 169. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-60087-6_11

  • DOI: https://doi.org/10.1007/978-3-642-60087-6_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-64250-0

  • Online ISBN: 978-3-642-60087-6

  • eBook Packages: Springer Book Archive
