Intelligibility prediction for distorted sentences by the normalized covariance measure

International Journal of Speech Technology

Abstract

The speech-transmission index (STI) has been used extensively to predict the intelligibility of speech corrupted by reverberation and additive noise. This study further evaluated its performance in predicting the intelligibility of three types of distorted sentences: time-reversed stimuli, vocoded stimuli, and stimuli containing the envelope recovered from the Hilbert fine-structure condition (R-HFS). The distorted sentences were simulated, and their intelligibility was predicted by the normalized covariance measure (NCM), an STI-based index. The NCM was evaluated against the intelligibility scores available for the three types of distorted stimuli, and its performance was compared with that of the PESQ measure and the coherence-based speech intelligibility index. The NCM consistently predicted the intelligibility patterns well in all three distortion conditions: (1) the intelligibility of time-reversed speech declined continuously as the segment duration used for reversal increased to 200 ms; (2) the intelligibility of tone-vocoded and noise-vocoded stimuli improved as more vocoder channels were used, with only a small difference between the two vocoder types; and (3) the intelligibility of R-HFS stimuli decreased as the number of analysis bands increased from one to eight. Supplementing previous findings on speech intelligibility prediction, the present results support the conclusion that the intelligibility of distorted sentences can be predicted well by the NCM.
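
To make the two central operations concrete, the following is a minimal Python sketch, not the author's implementation. It assumes the form of the NCM commonly described in the STI literature: in each frequency band, the correlation coefficient r between the clean and processed envelopes is converted to an apparent SNR, 10·log10(r² / (1 − r²)), which is clipped to a fixed range, mapped linearly to a transmission index between 0 and 1, and averaged across bands with band weights. The ±15 dB clipping range, the uniform default weights, and the function names `reverse_segments` and `ncm` are illustrative assumptions. The sketch starts from band envelopes that have already been extracted, so band-pass filtering and envelope extraction are omitted.

```python
import math


def reverse_segments(signal, seg_len):
    """Locally time-reverse a signal: split it into consecutive segments of
    seg_len samples and reverse the sample order within each segment."""
    out = []
    for start in range(0, len(signal), seg_len):
        out.extend(reversed(signal[start:start + seg_len]))
    return out


def correlation(x, y):
    """Normalized covariance (Pearson correlation) of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant envelope carries no modulation information
    return cov / math.sqrt(vx * vy)


def ncm(clean_envs, proc_envs, weights=None, snr_lo=-15.0, snr_hi=15.0):
    """Simplified NCM from per-band envelopes (one sequence per band).

    weights, snr_lo and snr_hi are assumed defaults, not values from the paper.
    """
    if weights is None:
        weights = [1.0] * len(clean_envs)
    indices = []
    for x, y in zip(clean_envs, proc_envs):
        r2 = correlation(x, y) ** 2
        if r2 >= 1.0:
            snr = snr_hi
        elif r2 <= 0.0:
            snr = snr_lo
        else:
            # Apparent SNR in dB, clipped to [snr_lo, snr_hi].
            snr = 10.0 * math.log10(r2 / (1.0 - r2))
            snr = max(snr_lo, min(snr_hi, snr))
        # Map the clipped SNR linearly to a transmission index in [0, 1].
        indices.append((snr - snr_lo) / (snr_hi - snr_lo))
    return sum(w * t for w, t in zip(weights, indices)) / sum(weights)
```

In a fuller pipeline, `reverse_segments` would be applied to the waveform (for example with `seg_len` equal to the segment duration times the sampling rate), both the original and distorted waveforms would be split into bands and reduced to envelopes, and those envelopes would be passed to `ncm`. Values near 1 indicate that the distorted envelopes closely track the clean ones.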

Author information

Corresponding author

Correspondence to Fei Chen.

About this article

Cite this article

Chen, F. Intelligibility prediction for distorted sentences by the normalized covariance measure. Int J Speech Technol 14, 237–243 (2011). https://doi.org/10.1007/s10772-011-9099-z

  • DOI: https://doi.org/10.1007/s10772-011-9099-z
