Minimising Speaker Verification Utterance Length through Confidence Based Early Verification Decisions

  • Robbie Vogt
  • Sridha Sridharan
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5558)


This paper presents a novel approach for estimating the confidence interval of speaker verification scores. This approach is used to minimise the utterance length required to produce a confident verification decision. The confidence estimation method is also extended to address two further problems: the high correlation between the scores of consecutive frames, and robustness when very few training samples are available. The proposed technique achieves a drastic reduction in the typical data requirements for producing confident decisions in an automatic speaker verification system. When evaluated on the NIST 2005 SRE, the early verification decision method demonstrates that an average of 5–10 seconds of speech is sufficient to produce verification rates approaching those previously achieved with an average of more than 100 seconds of speech.
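The abstract does not give the estimator itself. The following is therefore only a minimal Python sketch of the general idea of a confidence-based early decision, and it rests on several assumptions:

- The verification score is taken to be the mean of per-frame scores (for example, log-likelihood ratios).
- Correlation between consecutive frame scores is handled by the standard batch-means method, which averages non-overlapping blocks of consecutive frames.
- The interval uses a normal approximation.

These choices, and the names `batch_means_interval`, `early_decision`, `block_size`, `confidence` and `min_blocks`, are hypothetical. They are not claimed to be the authors' method, and the paper's treatment of very limited training samples is not reproduced here.

```python
from statistics import NormalDist, fmean, stdev
from typing import Iterable, List, Optional, Sequence, Tuple


def batch_means_interval(
    scores: Sequence[float], block_size: int, confidence: float
) -> Optional[Tuple[float, float]]:
    """Hypothetical helper: confidence interval for the mean frame score.

    Frame scores are averaged over non-overlapping blocks (batch means) so the
    block means are closer to independent than the raw, correlated frame
    scores. A trailing partial block is ignored.
    """
    n_blocks = len(scores) // block_size
    if n_blocks < 2:
        return None  # at least two blocks are needed to estimate a spread
    blocks = [
        fmean(scores[i * block_size:(i + 1) * block_size]) for i in range(n_blocks)
    ]
    centre = fmean(blocks)
    # Two-sided normal quantile (about 2.576 at 99%); assumption: normal, not Student-t
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * stdev(blocks) / n_blocks ** 0.5
    return centre - half_width, centre + half_width


def early_decision(
    frame_scores: Iterable[float],
    threshold: float,
    block_size: int = 50,
    confidence: float = 0.99,
    min_blocks: int = 4,
) -> Tuple[str, int]:
    """Hypothetical early-decision loop: stop once the interval excludes the threshold.

    Returns the decision and the number of frames consumed.
    """
    if min_blocks < 2:
        raise ValueError("min_blocks must be at least 2")
    seen: List[float] = []
    for score in frame_scores:
        seen.append(score)
        # Re-test only after each complete block, once enough blocks exist
        if len(seen) % block_size or len(seen) // block_size < min_blocks:
            continue
        # Non-None here because min_blocks >= 2 complete blocks have been seen
        low, high = batch_means_interval(seen, block_size, confidence)
        if low > threshold:
            return "accept", len(seen)  # confidently above the threshold
        if high < threshold:
            return "reject", len(seen)  # confidently below the threshold
    if not seen:
        raise ValueError("no frame scores supplied")
    # Speech exhausted without a confident decision: fall back to the plain mean
    return ("accept" if fmean(seen) > threshold else "reject"), len(seen)
```

Verification stops as soon as the interval lies entirely on one side of the threshold. The returned frame count is therefore the amount of speech the decision actually consumed, and averaging it over trials gives an utterance-length figure of the kind the abstract reports.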


Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Robbie Vogt (1)
  • Sridha Sridharan (1)
  1. Speech and Audio Research Laboratory, Queensland University of Technology, Brisbane, Australia
