Macroscopic and Microscopic Analysis of Speech Recognition in Noise: What Can Be Understood at Which Level?

Brand, Thomas; Jürgens, Tim; Beutelmann, Rainer; Meyer, Ralf M.; Kollmeier, Birger

doi:10.1007/978-1-4419-5686-6_39

Abstract

A series of experiments and model developments were performed to quantitatively describe and predict speech recognition in listeners with normal and impaired hearing in quiet as well as in realistic, fluctuating, and spatially distributed noise environments. On a macroscopic level, classical speech-information-based models such as the Speech Intelligibility Index (SII) yield accurate predictions only for average intelligibility scores and for a limited set of acoustical situations. A binaural extension using a binaural preprocessing model provides surprisingly accurate predictions for a wide range of acoustically complex, spatial situations.

On a microscopic (i.e., phoneme-to-phoneme) scale, the combination of a psychoacoustically and physiologically motivated preprocessing model with a pattern recognition algorithm adopted from automatic speech recognition (ASR) technology allows for a detailed analysis of phoneme confusions and explains the "man-machine gap" of approximately 12 dB in signal-to-noise ratio. This finding highlights the superiority of human world-knowledge-driven (top-down) speech pattern recognition over training-data-driven (bottom-up) machine learning approaches.
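
The macroscopic, SII-type prediction mentioned above can be illustrated with a minimal sketch. It assumes the commonly described simplification in which each frequency band contributes an audibility value that grows linearly from 0 at -15 dB SNR to 1 at +15 dB SNR, weighted by a band-importance value. The function names and the example weights are hypothetical, and the sketch omits the level distortion, masking spread, and hearing-threshold terms of the full standard, so it is not the procedure used in the chapter.

```python
from typing import Sequence


def band_audibility(snr_db: float) -> float:
    """Map a band SNR in dB to an audibility value in [0, 1].

    Simplified linear mapping: -15 dB -> 0, +15 dB -> 1 (assumption).
    """
    return min(1.0, max(0.0, (snr_db + 15.0) / 30.0))


def sii_like_index(
    snr_db_per_band: Sequence[float],
    band_importance: Sequence[float],
) -> float:
    """Importance-weighted mean of band audibilities (illustrative only)."""
    if len(snr_db_per_band) != len(band_importance):
        raise ValueError("one importance weight is needed per band")
    total = sum(band_importance)
    if total <= 0:
        raise ValueError("importance weights must sum to a positive value")
    # Normalise the weights so the index stays within [0, 1].
    return sum(
        (weight / total) * band_audibility(snr)
        for snr, weight in zip(snr_db_per_band, band_importance)
    )


if __name__ == "__main__":
    # Hypothetical three-band example with made-up importance weights.
    print(sii_like_index([10.0, 0.0, -5.0], [0.3, 0.5, 0.2]))
```

Because the index collapses the signal into long-term band SNRs, it can only describe average intelligibility, which is consistent with the limitation stated in the abstract for fluctuating and spatially distributed noise.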

Acknowledgment

This work was supported by DFG SFB TRR 39 "The active auditory system", the CEC project Hearcom, and the Audiologie-Initiative Niedersachsen.

Author information

Authors and Affiliations

Medical Physics, University of Oldenburg, Oldenburg, Germany
Thomas Brand

Corresponding author

Correspondence to Thomas Brand.

Editor information

Editors and Affiliations

Inst. Neurociencias de Castilla y León, Universidad de Salamanca, Av. Alfonso X El Sabio s/n, Salamanca, 37007, Spain
Enrique A. Lopez-Poveda
MRC Inst.of Hearing Research, University Park, Nottingham, NG7 2RD, United Kingdom
Alan R. Palmer
University of Essex, Wivenhoe Park, Colchester, Essex, CO4 3SQ, United Kingdom
Ray Meddis

Cite this paper

Brand, T., Jürgens, T., Beutelmann, R., Meyer, R.M., Kollmeier, B. (2010). Macroscopic and Microscopic Analysis of Speech Recognition in Noise: What Can Be Understood at Which Level? In: Lopez-Poveda, E., Palmer, A., Meddis, R. (eds) The Neurophysiological Bases of Auditory Perception. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-5686-6_39

DOI: https://doi.org/10.1007/978-1-4419-5686-6_39
Published: 16 February 2010
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4419-5685-9
Online ISBN: 978-1-4419-5686-6
eBook Packages: Biomedical and Life Sciences (R0)