Macroscopic and Microscopic Analysis of Speech Recognition in Noise: What Can Be Understood at Which Level?

  • Conference paper
The Neurophysiological Bases of Auditory Perception

Abstract

A series of experiments and model developments were performed to quantitatively describe and predict speech recognition in listeners with normal and impaired hearing in quiet as well as in realistic, fluctuating, and spatially distributed noise environments. On a macroscopic level, classical speech-information-based models such as the Speech Intelligibility Index (SII) yield accurate predictions only for average intelligibility scores and for a limited set of acoustical situations. A binaural extension using a binaural preprocessing model provides surprisingly accurate predictions for a wide range of acoustically complex, spatial situations.

On a microscopic (i.e. phoneme-to-phoneme) scale, the combination of a psychoacoustically and physiologically motivated preprocessing model with a pattern recognition algorithm adopted from automatic speech recognition (ASR) technology allows for a detailed analysis of phoneme confusions and explains the “man-machine-gap” of approximately 12 dB in signal-to-noise ratio. This finding highlights the superiority of human world-knowledge-driven (top-down) speech pattern recognition in comparison to the training-data-driven (bottom-up) machine learning approaches.
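As a rough illustration of the macroscopic, speech-information-based approach named in the abstract, the sketch below computes an SII-style index as an importance-weighted sum of band audibilities. It is not the model evaluated in this paper. The mapping of band SNR to audibility, (SNR + 15)/30 clipped to [0, 1], reflects the basic band-audibility idea of the SII. The function names, the four-band layout, and the equal band weights are assumptions made for illustration only. The sketch also omits spread of masking, level distortion, and hearing thresholds.

```python
# Simplified SII-style sketch. Assumptions: no spread of masking, no level
# distortion, no hearing-threshold term; band weights are illustrative only.

def band_audibility(snr_db):
    """Map a band SNR in dB to an audibility value in [0, 1]."""
    return min(1.0, max(0.0, (snr_db + 15.0) / 30.0))


def simplified_sii(band_snrs_db, band_importance):
    """Return the importance-weighted sum of band audibilities."""
    if len(band_snrs_db) != len(band_importance):
        raise ValueError("need one importance weight per band")
    total = sum(band_importance)  # normalise so the weights sum to 1
    return sum(
        (weight / total) * band_audibility(snr)
        for snr, weight in zip(band_snrs_db, band_importance)
    )


# Hypothetical four-band example with equal importance weights.
snrs = [-15.0, 0.0, 15.0, 30.0]
weights = [1.0, 1.0, 1.0, 1.0]
index = simplified_sii(snrs, weights)
```

Because such an index depends only on long-term band SNRs, it describes average intelligibility. This is consistent with the limitation noted above for fluctuating and spatially distributed noise.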

Acknowledgment

This work was supported by DFG SFB TRR 39 (The active auditory system), the CEC project Hearcom, and the Audiologie-Initiative Niedersachsen.

Corresponding author

Correspondence to Thomas Brand.

Copyright information

© 2010 Springer Science+Business Media, LLC

About this paper

Cite this paper

Brand, T., Jürgens, T., Beutelmann, R., Meyer, R.M., Kollmeier, B. (2010). Macroscopic and Microscopic Analysis of Speech Recognition in Noise: What Can Be Understood at Which Level? In: Lopez-Poveda, E., Palmer, A., Meddis, R. (eds) The Neurophysiological Bases of Auditory Perception. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-5686-6_39
