Abstract
Speaker indexing has recently emerged as an important task due to the rapidly growing volume of audio archives. Current filtration techniques still suffer from problems in both accuracy and efficiency. In this paper, an efficient method to simulate GMM scoring is presented. Simulation is done by fitting a GMM not only to every target speaker but also to every test utterance, and then computing the likelihood of the test utterance using these GMMs instead of the original data. GMM simulation is used to achieve very efficient speaker indexing in terms of both search time and index size. Results on the SPIDRE and NIST-2004 speaker evaluation corpora show that our approach maintains, and sometimes exceeds, the accuracy of the conventional GMM algorithm while achieving efficient indexing: it is 6000 times faster than a conventional GMM, with 1% overhead in storage.
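The idea of scoring a test utterance from its fitted GMM rather than from its frames can be illustrated with a minimal sketch. The abstract does not give the authors' exact formulation, so the following is an assumption: diagonal-covariance GMMs, the closed-form expected log-density of one Gaussian under another, and a best-matching-component approximation of the mixture likelihood. The names `expected_log_gauss` and `simulated_score`, and the toy parameters, are hypothetical.

```python
import math

def expected_log_gauss(mq, vq, mp, vp):
    """Closed form of E[log N(x; mp, diag(vp))] for x ~ N(mq, diag(vq))."""
    total = 0.0
    for a, s, b, t in zip(mq, vq, mp, vp):
        total += math.log(2.0 * math.pi * t) + (s + (a - b) ** 2) / t
    return -0.5 * total

def simulated_score(test_gmm, target_gmm):
    """Approximate the average per-frame log-likelihood of a test
    utterance under target_gmm using only the test GMM's parameters.

    Each GMM is a list of (weight, means, variances) tuples.
    Assumption: each test component is scored against its single
    best-matching target component instead of the full mixture sum.
    """
    score = 0.0
    for wq, mq, vq in test_gmm:
        best = max(math.log(wp) + expected_log_gauss(mq, vq, mp, vp)
                   for wp, mp, vp in target_gmm)
        score += wq * best
    return score

# Toy data (hypothetical): a test GMM close to speaker_a, far from speaker_b.
test_gmm = [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [3.0, 3.0], [1.0, 1.0])]
speaker_a = [(0.5, [0.1, 0.0], [1.0, 1.0]), (0.5, [3.0, 2.9], [1.0, 1.0])]
speaker_b = [(0.5, [-4.0, 4.0], [1.0, 1.0]), (0.5, [6.0, -2.0], [1.0, 1.0])]

score_a = simulated_score(test_gmm, speaker_a)
score_b = simulated_score(test_gmm, speaker_b)
```

In this sketch the cost of scoring depends on the number of Gaussians in the two models rather than on the number of frames in the utterance, which is the kind of saving the abstract attributes to GMM simulation. On the toy data, `score_a` comes out higher than `score_b`.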
© 2005 Springer-Verlag Berlin Heidelberg
Cite this paper
Aronowitz, H., Burshtein, D., Amir, A. (2005). Speaker Indexing in Audio Archives Using Gaussian Mixture Scoring Simulation. In: Bengio, S., Bourlard, H. (eds) Machine Learning for Multimodal Interaction. MLMI 2004. Lecture Notes in Computer Science, vol 3361. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30568-2_21
DOI: https://doi.org/10.1007/978-3-540-30568-2_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24509-4
Online ISBN: 978-3-540-30568-2
eBook Packages: Computer Science, Computer Science (R0)