Abstract
A series of experiments and model developments were performed to quantitatively describe and predict speech recognition in listeners with normal and impaired hearing in quiet as well as in realistic, fluctuating, and spatially distributed noise environments. On a macroscopic level, classical speech-information-based models such as the Speech Intelligibility Index (SII) yield accurate predictions only for average intelligibility scores and for a limited set of acoustical situations. A binaural extension using a binaural preprocessing model provides surprisingly accurate predictions for a wide range of acoustically complex, spatial situations.
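As an illustration of the macroscopic approach, here is a minimal sketch of the core Speech Intelligibility Index computation. It assumes the simplified form of ANSI S3.5-1997 in which each band's signal-to-noise ratio is clipped to ±15 dB, mapped linearly to an audibility between 0 and 1, and weighted by a band-importance function. The standard's hearing-threshold, spread-of-masking and level-distortion terms are ignored, and the equal importance weights in the example are placeholders, not the standard's tables. `short_time_sii` is a hypothetical helper that averages the index over short time frames, one common way of extending the SII to fluctuating noise. It is not presented as the chapter's own model.

```python
import numpy as np


def band_audibility(snr_db):
    """Map band SNRs (dB) to audibility in [0, 1] via a linear 30-dB window (-15..+15 dB)."""
    snr_db = np.asarray(snr_db, dtype=float)
    return np.clip((snr_db + 15.0) / 30.0, 0.0, 1.0)


def simplified_sii(speech_db, noise_db, importance):
    """Importance-weighted sum of band audibilities (simplified SII, assumption).

    speech_db, noise_db: per-band levels in dB (same length)
    importance: band-importance weights; normalised here so they sum to 1
    """
    importance = np.asarray(importance, dtype=float)
    importance = importance / importance.sum()
    snr = np.asarray(speech_db, dtype=float) - np.asarray(noise_db, dtype=float)
    return float(np.sum(importance * band_audibility(snr)))


def short_time_sii(speech_frames_db, noise_frames_db, importance):
    """Hypothetical helper: average the simplified SII over time frames (one frame per row)."""
    values = [
        simplified_sii(speech, noise, importance)
        for speech, noise in zip(speech_frames_db, noise_frames_db)
    ]
    return float(np.mean(values))
```

For example, `simplified_sii([60] * 4, [45, 60, 75, 60], [1, 1, 1, 1])` gives band SNRs of 15, 0, -15 and 0 dB, audibilities of 1, 0.5, 0 and 0.5, and an index of 0.5. With noise alternating between 30 and 90 dB against 60 dB speech, the frame average is 0.5, whereas the index computed from the long-term (power-averaged, about 87 dB) noise level is 0. This gap is one reason a single long-term index describes fluctuating noise poorly.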
On a microscopic (i.e., phoneme-to-phoneme) scale, combining a psychoacoustically and physiologically motivated preprocessing model with a pattern recognition algorithm adopted from automatic speech recognition (ASR) technology allows a detailed analysis of phoneme confusions and explains the "man-machine gap" of approximately 12 dB in signal-to-noise ratio. This finding highlights the superiority of human, world-knowledge-driven (top-down) speech pattern recognition over training-data-driven (bottom-up) machine learning approaches.
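The abstract does not name the pattern recognizer, so the following is only a sketch of the general idea, assuming a dynamic-time-warping (DTW) template matcher, a classic ASR technique. Each phoneme token is a sequence of feature frames (toy numbers here, standing in for the output of an auditory preprocessing model); a test token is assigned the label of the template with the smallest warped distance, and the presented/recognized pairs are tallied into a confusion matrix. The names `dtw_distance`, `recognize` and `confusion_matrix` are hypothetical.

```python
import numpy as np


def dtw_distance(x, y):
    """DTW distance between feature sequences x (n, d) and y (m, d).

    Local cost: Euclidean distance between frames.
    Allowed steps: (1, 0), (0, 1) and (1, 1), all with unit weight.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    if y.ndim == 1:
        y = y[:, None]
    n, m = len(x), len(y)
    cost = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)  # (n, m) frame distances
    acc = np.full((n + 1, m + 1), np.inf)  # accumulated cost with an infinite border
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return float(acc[n, m])


def recognize(token, templates):
    """Return the label whose template has the smallest DTW distance to the token."""
    return min(templates, key=lambda label: dtw_distance(token, templates[label]))


def confusion_matrix(presented, recognized, labels):
    """Count how often each presented label (row) was recognized as each label (column)."""
    index = {label: k for k, label in enumerate(labels)}
    matrix = np.zeros((len(labels), len(labels)), dtype=int)
    for p, r in zip(presented, recognized):
        matrix[index[p], index[r]] += 1
    return matrix
```

With templates `{"aba": [[0], [1], [0]], "ada": [[0], [2], [2], [0]]}`, the test token `[[0], [0], [1], [1], [0]]` is a time-stretched copy of "aba", so DTW aligns it at zero cost, while the best alignment with "ada" costs 2. A confusion matrix accumulated over many noisy tokens is the kind of output a microscopic model can compare with listeners' phoneme confusions.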
Acknowledgment
This work was supported by the DFG SFB TRR 39 "The Active Auditory System", the CEC project HearCom, and the Audiologie-Initiative Niedersachsen.
Copyright information
© 2010 Springer Science+Business Media, LLC
About this paper
Cite this paper
Brand, T., Jürgens, T., Beutelmann, R., Meyer, R.M., Kollmeier, B. (2010). Macroscopic and Microscopic Analysis of Speech Recognition in Noise: What Can Be Understood at Which Level? In: Lopez-Poveda, E., Palmer, A., Meddis, R. (eds) The Neurophysiological Bases of Auditory Perception. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-5686-6_39
DOI: https://doi.org/10.1007/978-1-4419-5686-6_39
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4419-5685-9
Online ISBN: 978-1-4419-5686-6
eBook Packages: Biomedical and Life Sciences (R0)