Abstract
This paper presents an information-theoretic method for estimating the distortion between an enrolled speaker's model and a test utterance in a speaker verification system. It uses the cross entropy (CE) to compute the distance between two parametric models (such as GMMs). Unlike the traditional average log-likelihood method, it accounts for the symmetry between the test utterance and the reference model. In the verification phase, ZT-norm is used to compensate for session variability. Experimental results on the TIMIT database show that the proposed method can effectively reduce error rates compared with standard log-likelihood scoring.
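The following is a minimal Python/NumPy sketch of the two scoring ideas the abstract contrasts. It is an illustration, not the authors' implementation, and several pieces are assumptions:

- the `DiagGMM` class and all helper names are hypothetical;
- cross entropy is estimated by Monte Carlo sampling, because it has no closed form between GMMs;
- the symmetrization (averaging H(test, speaker) and H(speaker, test)) is one plausible choice;
- the model parameters are set by hand and stand in for GMMs that would normally be trained on enrollment and test speech.

One point connects the two scores. The conventional score (1/T) Σ_t log p(x_t | λ) is an empirical estimate of -H(test, speaker) taken in one direction only. The symmetric score also includes the reverse direction.

```python
import numpy as np


class DiagGMM:
    """Hypothetical diagonal-covariance GMM; parameters are supplied, not trained."""

    def __init__(self, weights, means, variances):
        self.w = np.asarray(weights, dtype=float)      # (M,) mixture weights, sum to 1
        self.mu = np.asarray(means, dtype=float)       # (M, D) component means
        self.var = np.asarray(variances, dtype=float)  # (M, D) diagonal variances

    def log_pdf(self, x):
        """Per-frame log density log p(x_t | model) for frames x of shape (T, D)."""
        x = np.atleast_2d(x)
        diff = x[:, None, :] - self.mu[None, :, :]                        # (T, M, D)
        log_gauss = -0.5 * np.sum(diff ** 2 / self.var + np.log(2 * np.pi * self.var), axis=2)
        a = log_gauss + np.log(self.w)                                    # (T, M)
        amax = a.max(axis=1, keepdims=True)                               # stable log-sum-exp
        return (amax + np.log(np.exp(a - amax).sum(axis=1, keepdims=True)))[:, 0]

    def sample(self, n, rng):
        """Draw n frames: pick a component by weight, then sample that Gaussian."""
        comp = rng.choice(len(self.w), size=n, p=self.w)
        noise = rng.standard_normal((n, self.mu.shape[1]))
        return self.mu[comp] + noise * np.sqrt(self.var[comp])


def avg_log_likelihood(frames, speaker):
    """Conventional score: (1/T) * sum_t log p(x_t | speaker model)."""
    return float(np.mean(speaker.log_pdf(frames)))


def cross_entropy(p, q, n=20000, seed=0):
    """Monte Carlo estimate of H(p, q) = -E_{x~p}[log q(x)] (assumed estimator)."""
    rng = np.random.default_rng(seed)
    return float(-np.mean(q.log_pdf(p.sample(n, rng))))


def symmetric_ce_score(test_model, speaker):
    """Hypothetical symmetric score: minus the mean of H(test, spk) and H(spk, test).
    Higher means the test-utterance model is closer to the speaker model."""
    return -0.5 * (cross_entropy(test_model, speaker) + cross_entropy(speaker, test_model))


def zt_norm(raw, z_imp, cohort_raw, cohort_z_imp):
    """ZT-norm: Z-norm, then T-norm against Z-normed cohort scores.
    z_imp:           scores of impostor utterances vs. the target model
    cohort_raw[k]:   score of the test utterance vs. cohort model k
    cohort_z_imp[k]: scores of impostor utterances vs. cohort model k"""
    z = (raw - np.mean(z_imp)) / np.std(z_imp)
    cz = np.array([(s - np.mean(imp)) / np.std(imp) for s, imp in zip(cohort_raw, cohort_z_imp)])
    return float((z - cz.mean()) / cz.std())


# Usage with hand-set (hypothetical) 2-D models.
spk = DiagGMM([0.5, 0.5], [[0.0, 0.0], [3.0, 3.0]], [[1.0, 1.0], [1.0, 1.0]])
same = DiagGMM([0.5, 0.5], [[0.1, 0.0], [2.9, 3.1]], [[1.0, 1.0], [1.0, 1.0]])
other = DiagGMM([0.5, 0.5], [[-4.0, 5.0], [6.0, -2.0]], [[1.0, 1.0], [1.0, 1.0]])
print(symmetric_ce_score(same, spk) > symmetric_ce_score(other, spk))  # True
```

In `zt_norm`, the Z-norm step uses impostor utterances scored against the target model. The T-norm step then compares the test utterance's Z-normed scores on a cohort of impostor models. How the impostor utterances and cohort are selected is left open in this sketch.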
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lu, X., Yin, J. (2009). A Text-Independent Speaker Verification System Based on Cross Entropy. In: Cai, Z., Li, Z., Kang, Z., Liu, Y. (eds) Computational Intelligence and Intelligent Systems. ISICA 2009. Communications in Computer and Information Science, vol 51. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04962-0_48
DOI: https://doi.org/10.1007/978-3-642-04962-0_48
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04961-3
Online ISBN: 978-3-642-04962-0
eBook Packages: Computer Science, Computer Science (R0)