Towards Robust and Adaptive Speech Recognition Models

  • Conference paper

Part of the book series: The IMA Volumes in Mathematics and its Applications ((IMA,volume 138))

Abstract

In this paper, we discuss a family of new Automatic Speech Recognition (ASR) approaches that depart from the standard ASR framework but have recently been shown to be more robust to nonstationary noise, without requiring specific adaptation or “multi-style” training. More specifically, we motivate and briefly describe new approaches based on multi-stream and subband ASR. These approaches extend the standard hidden Markov model (HMM) based approach by assuming that the different (frequency) streams representing the speech signal are processed by different (independent) “experts”, each focusing on a different characteristic of the signal, and that the stream likelihoods (or posteriors) are combined at some (temporal) stage to yield a global recognition output. As a further extension to multi-stream ASR, we finally introduce a new approach, referred to as HMM2, in which the HMM emission probabilities are estimated by state-specific, feature-based HMMs responsible for merging the stream information and modeling the possible correlation between streams.
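To make the idea of combining per-stream expert outputs concrete, the following is a minimal sketch of one common combination rule: a weighted log-linear (geometric) combination of class posteriors for a single frame. It is an illustrative assumption, not the combination rule used in the paper, which this abstract does not specify. The function name `combine_stream_posteriors`, the stream weights, and the example posteriors are all hypothetical.

```python
import math


def combine_stream_posteriors(stream_posteriors, weights):
    """Combine per-stream class posteriors for one frame.

    Assumed rule (not taken from the paper): weighted log-linear
    combination, P(k) proportional to prod_s P_s(k) ** w_s,
    renormalized so the result sums to 1.
    """
    if len(stream_posteriors) != len(weights):
        raise ValueError("need exactly one weight per stream")
    n_classes = len(stream_posteriors[0])
    floor = 1e-12  # avoid log(0) for classes a stream rules out entirely

    # Weighted sum of log posteriors for each class.
    log_scores = [
        sum(w * math.log(max(p[k], floor))
            for p, w in zip(stream_posteriors, weights))
        for k in range(n_classes)
    ]

    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(log_scores)
    unnormalized = [math.exp(s - peak) for s in log_scores]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Hypothetical example: one subband is clean and confident, the other is
# corrupted by noise and nearly uninformative, so it gets a lower weight.
clean_band = [0.7, 0.2, 0.1]
noisy_band = [0.34, 0.33, 0.33]
combined = combine_stream_posteriors([clean_band, noisy_band], [0.8, 0.2])
```

In this sketch, down-weighting the noisy stream lets the clean stream dominate the combined decision, which is the intuition behind the robustness to band-limited noise that the abstract describes. How the weights are chosen, and at which temporal level the streams are recombined, are design choices left open here.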

Copyright information

© 2004 Springer Science+Business Media New York

About this paper

Cite this paper

Bourlard, H., Bengio, S., Weber, K. (2004). Towards Robust and Adaptive Speech Recognition Models. In: Johnson, M., Khudanpur, S.P., Ostendorf, M., Rosenfeld, R. (eds) Mathematical Foundations of Speech and Language Processing. The IMA Volumes in Mathematics and its Applications, vol 138. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-9017-4_9

  • DOI: https://doi.org/10.1007/978-1-4419-9017-4_9

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4612-6484-2

  • Online ISBN: 978-1-4419-9017-4

  • eBook Packages: Springer Book Archive
