Towards Robust and Adaptive Speech Recognition Models

Bourlard, Herve; Bengio, Samy; Weber, Katrin

doi:10.1007/978-1-4419-9017-4_9

Conference paper

Part of the book series: The IMA Volumes in Mathematics and its Applications (IMA, volume 138)

Abstract

In this paper, we discuss a family of new Automatic Speech Recognition (ASR) approaches that deviate somewhat from standard ASR practice but have recently been shown to be more robust to nonstationary noise, without requiring specific adaptation or “multi-style” training. More specifically, we motivate and briefly describe new approaches based on multi-stream and subband ASR. These approaches extend the standard hidden Markov model (HMM) based approach by assuming that the different (frequency) streams representing the speech signal are processed by different (independent) “experts”, each focusing on a different characteristic of the signal, and that the stream likelihoods (or posteriors) are combined at some (temporal) stage to yield a global recognition output. As a further extension to multi-stream ASR, we finally introduce a new approach, referred to as HMM2, in which the HMM emission probabilities are estimated via state-specific, feature-based HMMs responsible for merging the stream information and modeling their possible correlation.
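
To make the idea of combining stream posteriors concrete, the following is a minimal sketch of one generic combination rule: a weighted log-linear (geometric-mean) merge of per-stream class posteriors at a single time frame. It is an illustration only and is not taken from the paper; the function name `combine_stream_posteriors`, the example posterior values, and the stream weights are all hypothetical, and the paper's own combination schemes may differ.

```python
import math


def combine_stream_posteriors(stream_posteriors, weights=None):
    """Merge per-stream class posteriors with a weighted log-linear rule.

    stream_posteriors: list of lists, one posterior vector per stream
                       (hypothetical "expert" outputs for one frame).
    weights:           one reliability weight per stream (assumed given);
                       defaults to equal weights.
    Returns a normalized posterior vector over the classes.
    """
    n_streams = len(stream_posteriors)
    n_classes = len(stream_posteriors[0])
    if weights is None:
        weights = [1.0 / n_streams] * n_streams

    floor = 1e-12  # avoids log(0) when a stream assigns zero probability
    log_scores = []
    for k in range(n_classes):
        score = sum(
            w * math.log(max(p[k], floor))
            for w, p in zip(weights, stream_posteriors)
        )
        log_scores.append(score)

    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_scores)
    unnormalized = [math.exp(s - top) for s in log_scores]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Three hypothetical subband experts scoring three classes for one frame.
# The third stream is assumed to be noisy, so it receives a small weight.
subband_posteriors = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.3, 0.4, 0.3],
]
combined = combine_stream_posteriors(
    subband_posteriors, weights=[0.45, 0.45, 0.10]
)
best_class = max(range(len(combined)), key=combined.__getitem__)
```

In this sketch, down-weighting a stream reduces its influence on the merged decision, which is one simple way to picture why independent subband processing can help when noise affects only part of the spectrum.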

Author information

Authors and Affiliations

Dalle Molle Institute for Perceptual Artificial Intelligence (IDIAP), Swiss Federal Institute of Technology at Lausanne (EPFL), 4, Rue du Simplon, CH-1920, Martigny, Switzerland
Herve Bourlard
Dalle Molle Institute for Perceptual Artificial Intelligence (IDIAP), 4, Rue du Simplon, CH-1920, Martigny, Switzerland
Samy Bengio

Editor information

Editors and Affiliations

Dept. of Cognitive and Linguistic Studies, Brown University, Providence, RI, 02912, USA
Mark Johnson
Dept. of ECE and Dept. of Computer Science, Johns Hopkins University, Baltimore, MD, 21218, USA
Sanjeev P. Khudanpur
Dept. of Electrical Engineering, University of Washington, Seattle, WA, 98195, USA
Mari Ostendorf
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, 15213, USA
Roni Rosenfeld

Cite this paper

Bourlard, H., Bengio, S., Weber, K. (2004). Towards Robust and Adaptive Speech Recognition Models. In: Johnson, M., Khudanpur, S.P., Ostendorf, M., Rosenfeld, R. (eds) Mathematical Foundations of Speech and Language Processing. The IMA Volumes in Mathematics and its Applications, vol 138. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-9017-4_9

DOI: https://doi.org/10.1007/978-1-4419-9017-4_9
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4612-6484-2
Online ISBN: 978-1-4419-9017-4
eBook Packages: Springer Book Archive
