Client-wise cohort set selection by combining speaker- and phoneme-specific I-vectors for speaker verification

Multimedia Tools and Applications

Abstract

This work explores the use of phoneme-level information in cohort selection to improve the performance of a speaker verification system. In speaker verification, a cohort is used in score normalization, a technique that reduces the undesirable score variation arising from acoustically mismatched conditions. Proper selection of the cohort significantly improves speaker verification performance. In this paper, we investigate cohort selection based on a speaker model cluster under the i-vector framework, which we call the i-vector model cluster (IMC). Two approaches for cohort selection are proposed. The first approach, speaker-specific cohort selection (SSCS), uses speaker-level information for cohort selection. The second approach, phoneme-specific cohort selection (PSCS), improves cohort set selection by using phoneme-level information. Phoneme-level information is further employed in a late fusion approach that applies majority voting to the normalized scores to improve the performance of the speaker verification system. Speaker verification experiments were conducted using the TIMIT, HINDI and YOHO databases. Compared with the standard ZT-Norm method, the proposed method improves the equal error rate by 19.01%, 14.61% and 19.4% on the TIMIT, HINDI and YOHO datasets, respectively. Reasonable improvements in performance are also obtained in terms of minimum decision cost function (min DCF) and detection error trade-off (DET) curves.
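For readers unfamiliar with the ZT-Norm baseline mentioned in the abstract, the following is a minimal sketch of cohort-based score normalization on i-vectors. It is not the authors' implementation. It assumes cosine similarity as the raw score, and the function names (`cosine_scores`, `z_norm`, `t_norm`, `zt_norm`) and the randomly generated vectors are hypothetical placeholders. The cohort selection methods proposed in the paper (IMC, SSCS, PSCS) are not reproduced here: the cohorts are simply passed in as arrays.

```python
import numpy as np

def cosine_scores(a, b):
    """Cosine similarity between every row of a and every row of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def z_norm(scores, model_vs_zcohort):
    """Z-norm: standardize each model's scores (rows) using that model's
    scores against a cohort of impostor utterances."""
    mu = model_vs_zcohort.mean(axis=1, keepdims=True)
    sd = model_vs_zcohort.std(axis=1, keepdims=True)
    return (scores - mu) / sd

def t_norm(scores, tcohort_vs_test):
    """T-norm: standardize each test utterance's scores (columns) using the
    scores of a cohort of impostor models against that utterance."""
    mu = tcohort_vs_test.mean(axis=0, keepdims=True)
    sd = tcohort_vs_test.std(axis=0, keepdims=True)
    return (scores - mu) / sd

def zt_norm(enroll, test, z_cohort, t_cohort):
    """ZT-norm: Z-norm first, then T-norm computed on Z-normalized scores."""
    z = z_norm(cosine_scores(enroll, test), cosine_scores(enroll, z_cohort))
    # The T-norm cohort models are Z-normalized in the same way as the targets.
    t_z = z_norm(cosine_scores(t_cohort, test),
                 cosine_scores(t_cohort, z_cohort))
    return t_norm(z, t_z)

# Placeholder data only: random vectors standing in for real i-vectors.
rng = np.random.default_rng(0)
dim = 20
enroll = rng.normal(size=(3, dim))     # 3 enrolled speaker models
test = rng.normal(size=(5, dim))       # 5 test utterances
z_cohort = rng.normal(size=(30, dim))  # impostor utterances for Z-norm
t_cohort = rng.normal(size=(25, dim))  # impostor models for T-norm

scores = zt_norm(enroll, test, z_cohort, t_cohort)  # shape (3, 5)
```

In this sketch the cohort is the same for every client. A client-wise scheme such as the one described in the abstract would instead pass a different `z_cohort` or `t_cohort` subset for each enrolled speaker.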

Author information

Corresponding author

Correspondence to Waquar Ahmad.

About this article

Cite this article

Ahmad, W., Karnick, H. & Hegde, R.M. Client-wise cohort set selection by combining speaker- and phoneme-specific I-vectors for speaker verification. Multimed Tools Appl 77, 8273–8294 (2018). https://doi.org/10.1007/s11042-017-4723-9

  • DOI: https://doi.org/10.1007/s11042-017-4723-9
