Speaker verification under degraded condition: a perceptual study
This study analyzes the effect of degradation on human and automatic speaker verification (SV). The perceptual test is conducted with subjects who have knowledge of speaker verification. An automatic SV system is developed using Mel-frequency cepstral coefficient (MFCC) features and Gaussian mixture models (GMMs). Human and automatic SV performance is compared for clean training and different degraded test conditions. Speech signals are reconstructed in clean and degraded conditions by highlighting different speaker-specific information and compared through a perceptual test. The perceptual cues that the human subjects use as speaker-specific information are investigated, and their importance under degraded conditions is highlighted. The difference in the nature of the human and automatic SV tasks is examined in terms of falsely accepted and falsely rejected speech pairs. Finally, human vs. automatic speaker verification is discussed, and possibilities for improving automatic SV performance under degraded conditions are suggested.
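The MFCC-GMM system mentioned above typically scores a test utterance with a likelihood-ratio test: the log-likelihood of the feature frames under the claimed speaker's GMM minus that under a background model. The abstract gives no implementation details, so the following is a minimal stdlib-only sketch of that scoring step, assuming diagonal-covariance GMMs; the function names, the toy parameters, and the zero threshold are all illustrative, not from the paper.

```python
import math

def log_gaussian_diag(x, mean, var):
    # log N(x; mean, diag(var)) for one mixture component
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def gmm_loglik(frames, weights, means, variances):
    # average per-frame log-likelihood under a diagonal-covariance GMM,
    # using log-sum-exp over components for numerical stability
    total = 0.0
    for x in frames:
        comp = [math.log(w) + log_gaussian_diag(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        peak = max(comp)
        total += peak + math.log(sum(math.exp(c - peak) for c in comp))
    return total / len(frames)

def verify(frames, target_gmm, background_gmm, threshold=0.0):
    # Accept the claimed identity if the log-likelihood ratio between the
    # target-speaker GMM and the background GMM exceeds the threshold.
    score = gmm_loglik(frames, *target_gmm) - gmm_loglik(frames, *background_gmm)
    return score, score > threshold
```

In practice the frames would be MFCC vectors extracted from the test utterance, and degradation shifts the score distributions, which is why the threshold (and hence the error trade-off) differs between clean and degraded conditions.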
Keywords: Speaker information · Speaker verification · Degraded condition · Human vs automatic