Environment-Related Robustness Issues

Part of the book series: SpringerBriefs in Electrical and Computer Engineering (BRIEFSSIGNAL)

Abstract

In practical applications, many environment-related factors can degrade the performance of speaker recognition. These factors are often not known in advance, which makes environment-related robustness more difficult to achieve. This chapter summarizes three such factors (background noise, cross-channel conditions, and multiple speakers) and discusses the robustness issues associated with each.

Copyright information

© 2017 The Author(s)

About this chapter

Cite this chapter

Zheng, T.F., Li, L. (2017). Environment-Related Robustness Issues. In: Robustness-Related Issues in Speaker Recognition. SpringerBriefs in Electrical and Computer Engineering. Springer, Singapore. https://doi.org/10.1007/978-981-10-3238-7_2

  • DOI: https://doi.org/10.1007/978-981-10-3238-7_2

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-3237-0

  • Online ISBN: 978-981-10-3238-7

  • eBook Packages: Engineering, Engineering (R0)
