Intelligibility prediction for distorted sentences by the normalized covariance measure
The speech transmission index (STI) has been used extensively to predict the intelligibility of speech corrupted by reverberation and additive noise. This study further evaluated its ability to predict the intelligibility of three types of distorted sentences: time-reversed stimuli, vocoded stimuli, and stimuli containing envelopes recovered from the Hilbert fine structure (R-HFS). The distorted sentences were simulated, and their intelligibility was predicted with the normalized covariance measure (NCM), an STI-based index. NCM predictions were evaluated against the intelligibility scores available for the three types of distorted stimuli, and their performance was compared with that of the PESQ measure and the coherence-based speech intelligibility index. The NCM measure consistently predicted intelligibility well in all three distortion conditions: (1) the intelligibility of time-reversed speech declined continuously as the segment duration used for speech reversal increased up to 200 ms; (2) the intelligibility of tone-vocoded and noise-vocoded stimuli improved as more vocoder channels were used, with only a small difference between the two vocoder types; and (3) the intelligibility of R-HFS stimuli decreased as the number of analysis bands increased from one to eight. Complementing previous work on speech intelligibility prediction, these results support the use of the NCM measure to predict the intelligibility of distorted sentences.
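The abstract describes NCM only as an STI-based index. The following is a minimal sketch of the commonly published NCM recipe, offered as an illustration and not as the study's implementation. Each signal is split into bands, and the Hilbert envelope of each band is low-pass filtered. The normalized covariance ρ between the clean and processed envelopes is then converted to an apparent SNR of 10·log10(ρ²/(1−ρ²)), which is clipped to ±15 dB, mapped to a transmission index in [0, 1], and averaged across bands. The sketch also includes a helper that produces locally time-reversed speech by reversing consecutive fixed-length segments, which is the first distortion type studied. Several choices are assumptions and do not come from the study: ideal FFT band-pass filters in place of a conventional filter bank, 20 log-spaced bands between 100 and 7500 Hz, a 12.5 Hz envelope cutoff, and uniform band weights in place of band-importance weights such as those of ANSI S3.5-1997. The function names are also hypothetical.

```python
import numpy as np


def _bandpass_fft(x, fs, lo, hi):
    """Brick-wall band-pass: zero all FFT bins outside [lo, hi).

    Assumption: a simplification of the filter banks used in practical NCM code.
    """
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spec[(freqs < lo) | (freqs >= hi)] = 0.0
    return np.fft.irfft(spec, n=len(x))


def _hilbert_envelope(x):
    """Magnitude of the analytic signal (FFT-based Hilbert transform)."""
    n = len(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * h))


def _band_envelope(x, fs, lo, hi, env_cutoff):
    """Band-limit x, take its Hilbert envelope, then low-pass the envelope."""
    env = _hilbert_envelope(_bandpass_fft(x, fs, lo, hi))
    return _bandpass_fft(env, fs, 0.0, env_cutoff)


def ncm(clean, processed, fs, n_bands=20, f_lo=100.0, f_hi=7500.0,
        env_cutoff=12.5, weights=None):
    """Sketch of the normalized covariance measure for time-aligned signals.

    n_bands, the log-spaced band edges, env_cutoff and the uniform default
    weights are illustrative assumptions, not the study's settings.
    """
    clean = np.asarray(clean, dtype=float)
    processed = np.asarray(processed, dtype=float)
    if clean.shape != processed.shape:
        raise ValueError("clean and processed must be aligned and equal in length")
    edges = np.geomspace(f_lo, f_hi, n_bands + 1)
    w = np.ones(n_bands) if weights is None else np.asarray(weights, dtype=float)
    ti = np.empty(n_bands)
    for k in range(n_bands):
        x = _band_envelope(clean, fs, edges[k], edges[k + 1], env_cutoff)
        y = _band_envelope(processed, fs, edges[k], edges[k + 1], env_cutoff)
        xc, yc = x - x.mean(), y - y.mean()
        denom = np.sqrt(np.sum(xc ** 2) * np.sum(yc ** 2))
        rho = np.sum(xc * yc) / denom if denom > 0 else 0.0
        # Apparent SNR from the squared normalized covariance, clipped to +/-15 dB.
        r2 = np.clip(rho ** 2, 1e-12, 1.0 - 1e-12)
        snr = np.clip(10.0 * np.log10(r2 / (1.0 - r2)), -15.0, 15.0)
        ti[k] = (snr + 15.0) / 30.0  # per-band transmission index in [0, 1]
    return float(np.sum(w * ti) / np.sum(w))


def locally_time_reverse(x, fs, segment_ms):
    """Reverse x within consecutive, non-overlapping segments of segment_ms."""
    x = np.asarray(x, dtype=float)
    seg = max(1, int(round(fs * segment_ms / 1000.0)))
    out = np.empty_like(x)
    for start in range(0, len(x), seg):
        out[start:start + seg] = x[start:start + seg][::-1]
    return out


if __name__ == "__main__":
    # Synthetic stand-in for a sentence: noise with a 4 Hz amplitude modulation.
    rng = np.random.default_rng(0)
    fs = 16000
    t = np.arange(fs) / fs
    clean = rng.standard_normal(fs) * (1.0 + 0.8 * np.sin(2 * np.pi * 4 * t))
    print(ncm(clean, clean, fs))  # -> 1.0 (identical signals)
    for seg_ms in (50, 100, 200):
        print(seg_ms, ncm(clean, locally_time_reverse(clean, fs, seg_ms), fs))
```

The signals must be time-aligned and equal in length. Substituting band-importance weights for the uniform default changes only the final weighted average.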
Keywords: Normalized covariance measure (NCM); Speech intelligibility