Intelligibility prediction for distorted sentences by the normalized covariance measure



The speech-transmission index (STI) has been used extensively to predict the intelligibility of speech corrupted by reverberation and additive noise. This study further evaluated its performance in predicting the intelligibility of three types of distorted sentences: time-reversed stimuli, vocoded stimuli, and stimuli containing envelopes recovered from the Hilbert fine-structure condition (R-HFS). The distorted sentences were simulated, and their intelligibility was predicted by the normalized covariance measure (NCM), an STI-based index. The NCM was evaluated against the intelligibility scores available for the three types of distorted stimuli, and its performance was compared with that of the PESQ measure and the coherence-based speech intelligibility index. The NCM consistently predicted intelligibility well in all three distortion conditions: (1) the intelligibility of time-reversed speech declined continuously as the segment duration used for speech reversal increased to 200 ms; (2) the intelligibility of tone-vocoded and noise-vocoded stimuli improved as more channels were used in the vocoder, and the two types of vocoded sentences differed only slightly in intelligibility; and (3) the intelligibility of R-HFS stimuli decreased as the number of analysis bands increased from one to eight. Supplementing previous findings on speech intelligibility prediction, the present results support the conclusion that the intelligibility of distorted sentences can be predicted well by the NCM.
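As an illustration of how an STI-based index of this kind can be computed, the following minimal sketch assumes the commonly described NCM formulation: the squared correlation between clean and processed band envelopes is converted to an apparent SNR, limited to ±15 dB, mapped linearly to a transmission index between 0 and 1, and averaged across bands with band-importance weights. The abstract gives no implementation details, so the function name `ncm_from_envelopes`, the uniform default weights, and the assumption that band envelopes have already been extracted are all hypothetical choices made for this example, not the authors' implementation.

```python
import numpy as np


def ncm_from_envelopes(clean_env, proc_env, weights=None, snr_limit=15.0):
    """Sketch of an NCM-style index from precomputed band envelopes.

    clean_env, proc_env: arrays of shape (n_bands, n_samples) holding the
    envelopes of the clean and the processed (distorted) signal per band.
    weights: optional band-importance weights (assumed uniform if omitted).
    snr_limit: apparent SNR is limited to [-snr_limit, +snr_limit] dB.
    """
    clean_env = np.asarray(clean_env, dtype=float)
    proc_env = np.asarray(proc_env, dtype=float)
    if clean_env.ndim != 2 or clean_env.shape != proc_env.shape:
        raise ValueError("envelopes must be 2-D arrays of identical shape")

    n_bands = clean_env.shape[0]
    if weights is None:
        weights = np.ones(n_bands)  # assumption: equal band importance
    weights = np.asarray(weights, dtype=float)

    # Remove the mean of each band envelope before computing covariance.
    x = clean_env - clean_env.mean(axis=1, keepdims=True)
    y = proc_env - proc_env.mean(axis=1, keepdims=True)

    # Squared normalized covariance (squared correlation) per band.
    cov = (x * y).sum(axis=1)
    denom = (x ** 2).sum(axis=1) * (y ** 2).sum(axis=1)
    r2 = np.zeros(n_bands)
    valid = denom > 0  # a constant envelope carries no modulation
    r2[valid] = cov[valid] ** 2 / denom[valid]

    # Keep r2 strictly inside (0, 1) so the logarithm stays finite.
    eps = 1e-12
    r2 = np.clip(r2, eps, 1.0 - eps)

    # Apparent SNR in dB, limited, then mapped linearly to [0, 1].
    snr_db = np.clip(10.0 * np.log10(r2 / (1.0 - r2)), -snr_limit, snr_limit)
    transmission_index = (snr_db + snr_limit) / (2.0 * snr_limit)

    # Weighted average across bands gives the final index.
    return float(np.sum(weights * transmission_index) / np.sum(weights))
```

Under these assumptions, a processed signal whose band envelopes match the clean envelopes yields a value of 1, and envelopes that are uncorrelated with the clean ones yield 0. Intermediate distortions fall in between.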


Normalized covariance measure (NCM); Speech intelligibility



Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  1. Department of Electrical Engineering, The University of Texas at Dallas, Richardson, USA
