Client-wise cohort set selection by combining speaker- and phoneme-specific I-vectors for speaker verification

Ahmad, Waquar; Karnick, Harish; Hegde, Rajesh M.

doi:10.1007/s11042-017-4723-9

Published: 24 May 2017

Volume 77, pages 8273–8294 (2018)

Multimedia Tools and Applications

Abstract

This work explores the use of phoneme-level information in cohort selection to improve the performance of a speaker verification system. In speaker verification, a cohort is used in score normalization to improve performance. Score normalization is a technique for reducing the undesirable score variation that arises from acoustically mismatched conditions. Proper cohort selection significantly improves speaker verification performance. In this paper, we investigate cohort selection based on a speaker model cluster under the i-vector framework, which we call the i-vector model cluster (IMC). Two approaches to cohort selection are proposed. The first approach uses speaker-specific properties and is called speaker-specific cohort selection (SSCS); in this approach, speaker-level information is used for cohort selection. The second approach, phoneme-specific cohort selection (PSCS), improves cohort set selection by using phoneme-level information. Phoneme-level information is further employed in a late fusion approach that applies majority voting to the normalized scores to improve the performance of the speaker verification system. Speaker verification experiments were conducted on the TIMIT, HINDI and YOHO databases. Equal error rate (EER) improvements of 19.01%, 14.61% and 19.4% are obtained for the proposed method over the standard ZT-Norm method on the TIMIT, HINDI and YOHO datasets, respectively. Reasonable improvements in performance are also obtained in terms of the minimum decision cost function (min DCF) and detection error trade-off (DET) curves.
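
The ZT-Norm baseline combines zero normalization (Z-norm) and test normalization (T-norm), and the cohort enters through the T-norm step: each model's raw score is first Z-normalized with the mean and standard deviation of that model's scores against a set of impostor utterances, and the client's Z-normalized score is then T-normalized with the statistics of the Z-normalized cohort scores for the same test utterance. The sketch below is a minimal illustration under stated assumptions, not the authors' IMC, SSCS or PSCS procedure: i-vectors are plain Python lists, scoring is cosine similarity, the impostor set and cohort are hypothetical toy inputs, and the helper `select_cohort` (a hypothetical name) simply keeps the k pool models nearest to the client model as a stand-in for client-wise cohort selection.

```python
import math
from statistics import mean, pstdev


def cosine(a, b):
    """Cosine similarity between two i-vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def normalize(score, reference_scores):
    """Shift and scale a score by the mean and (population) std of reference scores."""
    mu, sigma = mean(reference_scores), pstdev(reference_scores)
    return (score - mu) / sigma if sigma > 0 else score - mu


def select_cohort(client, pool, k):
    """Hypothetical client-wise cohort: the k pool models most similar to the client."""
    return sorted(pool, key=lambda model: cosine(client, model), reverse=True)[:k]


def zt_norm(test, client, cohort, znorm_utterances):
    """Z-norm each model's score on impostor utterances, then T-norm over the cohort."""

    def z_score(model):
        impostor_scores = [cosine(model, u) for u in znorm_utterances]
        return normalize(cosine(model, test), impostor_scores)

    cohort_scores = [z_score(model) for model in cohort]
    return normalize(z_score(client), cohort_scores)


# Toy 2-D vectors standing in for i-vectors (assumed data, for illustration only).
client = [1.0, 0.0]
znorm_utterances = [[0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
cohort = [[1.0, 1.0], [1.0, -1.0], [-1.0, 0.0]]
print(round(zt_norm([1.0, 0.0], client, cohort, znorm_utterances), 4))  # genuine trial → 1.4142
print(round(zt_norm([0.0, 1.0], client, cohort, znorm_utterances), 4))  # impostor trial → 0.7071
```

Because the T-norm statistics are computed from whichever cohort is passed in, choosing a different cohort per claimed speaker changes the normalized score for the same trial; that dependence is what client-wise cohort selection targets.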
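
The late fusion step is described only as majority voting on normalized scores. One plausible reading, sketched here under explicit assumptions, is that each phoneme class yields its own normalized score, each score is thresholded into an accept or reject vote, and the claim is accepted when a strict majority of votes accept. The function name `majority_vote`, the phoneme-class labels, the shared threshold of 0.0 and the reject-on-tie rule are all illustrative assumptions rather than details taken from the paper.

```python
def majority_vote(class_scores, threshold=0.0):
    """Accept a claim when more than half of the per-class normalized scores exceed threshold.

    class_scores maps a (hypothetical) phoneme-class label to its normalized score.
    A strict majority is required, so ties are rejected.
    """
    if not class_scores:
        raise ValueError("at least one class score is required")
    accepts = sum(1 for score in class_scores.values() if score > threshold)
    return accepts > len(class_scores) / 2


# Hypothetical normalized scores for one trial, one score per phoneme class.
trial = {"vowel": 1.3, "nasal": 0.6, "fricative": -0.2}
print(majority_vote(trial))  # True: two of three classes vote to accept
```

Using an odd number of classes avoids ties altogether; in practice the threshold would be tuned on development data rather than fixed at 0.0.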
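
The results are reported as EER and min DCF. The sketch below computes both from lists of genuine (target) and impostor trial scores by sweeping every observed score as a decision threshold, accepting a trial when its score is at or above the threshold. The cost parameters C_miss = 10, C_fa = 1 and P_target = 0.01 are the NIST SRE 2008 settings, used here purely as an assumption because the abstract does not state which values the authors used; the min DCF returned is unnormalized, and the function names are hypothetical.

```python
def error_rates(genuine, impostor, threshold):
    """Return (false rejection rate, false acceptance rate) when accepting scores >= threshold."""
    frr = sum(score < threshold for score in genuine) / len(genuine)
    far = sum(score >= threshold for score in impostor) / len(impostor)
    return frr, far


def eer(genuine, impostor):
    """Approximate EER: at the threshold where |FRR - FAR| is smallest, report their mean."""
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        frr, far = error_rates(genuine, impostor, t)
        gap = abs(frr - far)
        if best is None or gap < best[0]:
            best = (gap, (frr + far) / 2)
    return best[1]


def min_dcf(genuine, impostor, c_miss=10.0, c_fa=1.0, p_target=0.01):
    """Minimum (unnormalized) detection cost over all candidate thresholds."""
    thresholds = sorted(set(genuine) | set(impostor))
    thresholds.append(float("inf"))  # a threshold that rejects every trial
    return min(
        c_miss * p_target * frr + c_fa * (1 - p_target) * far
        for frr, far in (error_rates(genuine, impostor, t) for t in thresholds)
    )


genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.1, 0.2, 0.3, 0.75]
print(eer(genuine, impostor))      # 0.25
print(min_dcf(genuine, impostor))  # 0.05
```

A DET curve plots the same (FAR, FRR) pairs from this threshold sweep on normal-deviate axes.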

Author information

Authors and Affiliations

Department of ECE, NIT Sikkim, Ravangla, Sikkim, 737139, India
Waquar Ahmad
Department of Computer Science and Engineering, IIT Kanpur, Kanpur, Uttar Pradesh, 208016, India
Harish Karnick
Department of Electrical Engineering, IIT Kanpur, Kanpur, Uttar Pradesh, 208016, India
Rajesh M. Hegde

Corresponding author

Correspondence to Waquar Ahmad.

About this article

Cite this article

Ahmad, W., Karnick, H. & Hegde, R.M. Client-wise cohort set selection by combining speaker- and phoneme-specific I-vectors for speaker verification. Multimed Tools Appl 77, 8273–8294 (2018). https://doi.org/10.1007/s11042-017-4723-9

Received: 16 August 2016
Revised: 06 April 2017
Accepted: 17 April 2017
Published: 24 May 2017
Issue Date: April 2018
DOI: https://doi.org/10.1007/s11042-017-4723-9