Speaker Indexing in Audio Archives Using Gaussian Mixture Scoring Simulation

Aronowitz, Hagai; Burshtein, David; Amir, Amihood

doi:10.1007/978-3-540-30568-2_21

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3361)

Abstract

Speaker indexing has recently emerged as an important task due to the rapidly growing volume of audio archives. Current filtration techniques still suffer from problems in both accuracy and efficiency. This paper presents an efficient method to simulate Gaussian mixture model (GMM) scoring. The simulation fits a GMM not only to every target speaker but also to every test utterance, and then computes the likelihood of the test utterance from these GMMs instead of from the original data. GMM simulation is used to achieve very efficient speaker indexing in terms of both search time and index size. Results on the SPIDRE and NIST-2004 speaker evaluation corpora show that the approach maintains, and sometimes exceeds, the accuracy of the conventional GMM algorithm while indexing efficiently: it is 6000 times faster than a conventional GMM, with 1% overhead in storage.
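The idea of scoring from model parameters rather than from frames can be illustrated with a minimal sketch. It is not the authors' exact formula, which the abstract does not give. It assumes diagonal-covariance GMMs represented as lists of hypothetical `(weight, mean, variance)` tuples, and it approximates the average per-frame log-likelihood of the test utterance by pairing each test Gaussian with its best-matching target Gaussian, using the closed-form expectation of one Gaussian's log-density under another. The function names are illustrative only.

```python
import math

def expected_log_gaussian(mean_q, var_q, mean_p, var_p):
    """Closed form of E_{x ~ N(mean_q, diag(var_q))}[log N(x; mean_p, diag(var_p))]."""
    total = 0.0
    for mq, vq, mp, vp in zip(mean_q, var_q, mean_p, var_p):
        total += math.log(2.0 * math.pi * vp) + (vq + (mq - mp) ** 2) / vp
    return -0.5 * total

def simulated_score(test_gmm, target_gmm):
    """Approximate the average per-frame log-likelihood of the test utterance
    under the target GMM, using only the two parameter sets.

    Assumption (not taken from the source): each test Gaussian is scored
    against its single best-matching target Gaussian.
    """
    score = 0.0
    for w_q, mean_q, var_q in test_gmm:
        best = max(
            math.log(w_p) + expected_log_gaussian(mean_q, var_q, mean_p, var_p)
            for w_p, mean_p, var_p in target_gmm
        )
        score += w_q * best
    return score

# Toy 2-dimensional models: (weight, mean, variance) per Gaussian.
test_gmm = [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [3.0, 3.0], [1.0, 1.0])]
speaker_a = [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [3.0, 3.0], [1.0, 1.0])]
speaker_b = [(0.5, [6.0, 6.0], [1.0, 1.0]), (0.5, [9.0, 9.0], [1.0, 1.0])]

score_a = simulated_score(test_gmm, speaker_a)
score_b = simulated_score(test_gmm, speaker_b)
print(score_a > score_b)  # the matching speaker model scores higher
```

In this sketch the cost of one score depends on the number of Gaussians in the two models, not on the number of frames in the utterance, which is the property that makes parameter-based scoring attractive for indexing.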

Author information

Authors and Affiliations

Department of Computer Science, Bar-Ilan University, Israel
Hagai Aronowitz & Amihood Amir
School of Electrical Engineering, Tel-Aviv University, Israel
David Burshtein
College of Computing, Georgia Tech, USA
Amihood Amir

Editor information

Editors and Affiliations

IDIAP Research Institute, Martigny, Switzerland
Samy Bengio
IDIAP Research Institute, CH-1920, Martigny, Switzerland
Hervé Bourlard

About this paper

Cite this paper

Aronowitz, H., Burshtein, D., Amir, A. (2005). Speaker Indexing in Audio Archives Using Gaussian Mixture Scoring Simulation. In: Bengio, S., Bourlard, H. (eds) Machine Learning for Multimodal Interaction. MLMI 2004. Lecture Notes in Computer Science, vol 3361. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30568-2_21

DOI: https://doi.org/10.1007/978-3-540-30568-2_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24509-4
Online ISBN: 978-3-540-30568-2
eBook Packages: Computer Science, Computer Science (R0)
