Environment-Related Robustness Issues

Zheng, Thomas Fang; Li, Lantian

doi:10.1007/978-981-10-3238-7_2

Chapter
First Online: 07 April 2017

Part of the book series: SpringerBriefs in Electrical and Computer Engineering (BRIEFSSIGNAL)

Abstract

In practical applications, many environment-related factors may influence the performance of speaker recognition. These factors are usually not known in advance, which makes environment-related robustness harder to achieve. In this chapter, three environment-related factors (background noise, cross-channel effects, and multiple speakers) are summarized, and the robustness issue associated with each is discussed.
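To make the cross-channel factor concrete, here is a minimal sketch of cepstral mean normalization (CMN), a widely used channel-compensation step. It is an illustration under stated assumptions, not necessarily the method this chapter presents. The sketch assumes a stationary convolutional channel, which appears as a constant additive offset on every cepstral frame, so subtracting the per-utterance mean of each coefficient cancels it. The function name, the frame values, and the channel offset are all hypothetical toy choices, not real features.

```python
def cepstral_mean_normalize(frames):
    """Subtract the per-dimension mean of one utterance's cepstral frames.

    `frames` is a list of equal-length lists, one list of coefficients per frame.
    """
    if not frames:
        return []
    n = len(frames)
    dim = len(frames[0])
    # Mean of each cepstral coefficient over all frames of the utterance.
    means = [sum(frame[d] for frame in frames) / n for d in range(dim)]
    # A constant channel offset shifts every frame equally, so it cancels here.
    return [[frame[d] - means[d] for d in range(dim)] for frame in frames]


# Hypothetical toy cepstra for one utterance (2 coefficients per frame).
clean = [[1.0, 2.0], [3.0, -1.0], [2.0, 0.5]]
# Hypothetical channel: a constant offset added to every frame in the cepstral domain.
channel_offset = [0.7, -0.3]
distorted = [[x + c for x, c in zip(frame, channel_offset)] for frame in clean]

normalized_clean = cepstral_mean_normalize(clean)
normalized_distorted = cepstral_mean_normalize(distorted)

# After CMN, both versions agree up to floating-point rounding.
max_diff = max(
    abs(a - b)
    for fa, fb in zip(normalized_clean, normalized_distorted)
    for a, b in zip(fa, fb)
)
print(f"max difference after CMN: {max_diff:.2e}")
```

CMN only targets the convolutional (channel) case. Additive background noise mixes nonlinearly with speech in the log-spectral and cepstral domains, so it does not reduce to a constant offset and is not removed by this step.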

Author information

Authors and Affiliations

Tsinghua National Laboratory for Information Science and Technology, Division of Technical Innovation and Development, Department of Computer Science and Technology, Center for Speech and Language Technologies, Research Institute of Information Technology, Tsinghua University, Beijing, 100084, China
Thomas Fang Zheng & Lantian Li

About this chapter

Cite this chapter

Zheng, T.F., Li, L. (2017). Environment-Related Robustness Issues. In: Robustness-Related Issues in Speaker Recognition. SpringerBriefs in Electrical and Computer Engineering. Springer, Singapore. https://doi.org/10.1007/978-981-10-3238-7_2

DOI: https://doi.org/10.1007/978-981-10-3238-7_2
Published: 07 April 2017
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-3237-0
Online ISBN: 978-981-10-3238-7
eBook Packages: Engineering, Engineering (R0)
