Spectral Dynamics for Speech Recognition Under Adverse Conditions

  • Chapter

Part of the book series: The Kluwer International Series in Engineering and Computer Science (SECS, volume 355)

Abstract

Significant improvements in automatic speech recognition performance have been obtained through front-end feature representations that exploit the time-varying properties of speech spectra. Various techniques have been developed to incorporate “spectral dynamics” into the speech representation, including temporal derivative features, spectral mean normalization and, more generally, spectral parameter filtering. This chapter describes the implementation and interrelationships of these techniques and illustrates their use in automatic speech recognition under different types of adverse conditions.
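
The chapter body is not reproduced here, so the following is only a minimal sketch of two of the techniques the abstract names, written in their common textbook forms rather than as the authors' implementation. It assumes each utterance is a list of frames holding spectral or cepstral parameters. The function names, the `half_window` parameter, and the edge-padding rule are assumptions made for illustration.

```python
from typing import List

# frames[t][i] is the i-th spectral (or cepstral) parameter at frame t.
Frames = List[List[float]]


def mean_normalize(frames: Frames) -> Frames:
    """Subtract each parameter's per-utterance mean from every frame."""
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[i] for f in frames) / n for i in range(dim)]
    return [[f[i] - means[i] for i in range(dim)] for f in frames]


def delta_features(frames: Frames, half_window: int = 2) -> Frames:
    """Estimate each parameter's time derivative as a regression slope.

    The slope is fitted over 2 * half_window + 1 frames centred on t.
    Frames past the utterance edges are replaced by the nearest edge
    frame (an assumption of this sketch, not taken from the chapter).
    """
    n = len(frames)
    dim = len(frames[0])
    norm = 2.0 * sum(k * k for k in range(1, half_window + 1))
    deltas: Frames = []
    for t in range(n):
        row = []
        for i in range(dim):
            acc = 0.0
            for k in range(1, half_window + 1):
                ahead = frames[min(t + k, n - 1)][i]
                behind = frames[max(t - k, 0)][i]
                acc += k * (ahead - behind)
            row.append(acc / norm)
        deltas.append(row)
    return deltas
```

In this sketch, a parameter that rises by one unit per frame gets a derivative estimate of 1.0 at interior frames, and adding a constant offset to every frame leaves the output of `mean_normalize` unchanged. Both functions apply a fixed linear operation to the time trajectory of each parameter, which is one way to read the abstract's remark that spectral parameter filtering is the more general case.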

Copyright information

© 1996 Kluwer Academic Publishers

About this chapter

Cite this chapter

Hanson, B.A., Applebaum, T.H., Junqua, JC. (1996). Spectral Dynamics for Speech Recognition Under Adverse Conditions. In: Lee, CH., Soong, F.K., Paliwal, K.K. (eds) Automatic Speech and Speaker Recognition. The Kluwer International Series in Engineering and Computer Science, vol 355. Springer, Boston, MA. https://doi.org/10.1007/978-1-4613-1367-0_14

  • DOI: https://doi.org/10.1007/978-1-4613-1367-0_14

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4612-8590-8

  • Online ISBN: 978-1-4613-1367-0

  • eBook Packages: Springer Book Archive
