Speaker Indexing in Audio Archives Using Gaussian Mixture Scoring Simulation

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3361)

Abstract

Speaker indexing has recently emerged as an important task due to the rapidly growing volume of audio archives. Current filtration techniques still suffer from problems in both accuracy and efficiency. In this paper, an efficient method to simulate Gaussian mixture model (GMM) scoring is presented. Simulation is done by fitting a GMM not only to every target speaker but also to every test utterance, and then computing the likelihood of the test utterance using these GMMs instead of the original data. GMM simulation is used to achieve very efficient speaker indexing in terms of both search time and index size. Results on the SPIDRE and NIST-2004 speaker evaluation corpora show that our approach maintains and sometimes exceeds the accuracy of the conventional GMM algorithm and achieves efficient indexing capabilities: 6000 times faster than a conventional GMM with 1% overhead in storage.
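
To make the idea of scoring one GMM against another concrete, the following is a minimal sketch, not the method of the paper, whose exact formulation is not given in the abstract. It assumes diagonal-covariance GMMs stored as hypothetical `(weights, means, variances)` tuples, and it approximates the expected log-likelihood of frames drawn from the test-utterance GMM under a target-speaker GMM by keeping, for each test component, only the best-matching target component. The function names `expected_loglik_diag` and `simulated_gmm_score` are illustrative assumptions.

```python
import numpy as np


def expected_loglik_diag(mu_p, var_p, mu_q, var_q):
    """Closed form of E_{x ~ N(mu_p, diag(var_p))}[log N(x; mu_q, diag(var_q))]."""
    return -0.5 * np.sum(
        np.log(2.0 * np.pi * var_q) + (var_p + (mu_p - mu_q) ** 2) / var_q
    )


def simulated_gmm_score(test_gmm, target_gmm):
    """Approximate per-frame log-likelihood of the test GMM under the target GMM.

    Assumption (not from the paper): for each test component, the log of the
    target mixture is replaced by its best single weighted component, which
    gives a lower bound on the expected log-likelihood.
    """
    w_t, mu_t, var_t = test_gmm
    w_q, mu_q, var_q = target_gmm
    score = 0.0
    for wg, mg, vg in zip(w_t, mu_t, var_t):
        per_component = [
            np.log(wj) + expected_loglik_diag(mg, vg, mj, vj)
            for wj, mj, vj in zip(w_q, mu_q, var_q)
        ]
        score += wg * max(per_component)
    return score


# Toy 2-D example with hypothetical parameters: the test GMM matches target_a
# exactly and is far from target_b, so target_a should receive the higher score.
test_gmm = (np.array([0.5, 0.5]), np.array([[0.0, 0.0], [3.0, 3.0]]), np.ones((2, 2)))
target_a = (np.array([0.5, 0.5]), np.array([[0.0, 0.0], [3.0, 3.0]]), np.ones((2, 2)))
target_b = (np.array([0.5, 0.5]), np.array([[10.0, 10.0], [-10.0, -10.0]]), np.ones((2, 2)))

score_a = simulated_gmm_score(test_gmm, target_a)
score_b = simulated_gmm_score(test_gmm, target_b)
print("closer target:", "a" if score_a > score_b else "b")
```

The cost of such a score depends only on the number of mixture components, not on the number of frames in the test utterance, which is the kind of saving that scoring with GMMs instead of the original data makes possible.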




Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Aronowitz, H., Burshtein, D., Amir, A. (2005). Speaker Indexing in Audio Archives Using Gaussian Mixture Scoring Simulation. In: Bengio, S., Bourlard, H. (eds) Machine Learning for Multimodal Interaction. MLMI 2004. Lecture Notes in Computer Science, vol 3361. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30568-2_21


  • DOI: https://doi.org/10.1007/978-3-540-30568-2_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-24509-4

  • Online ISBN: 978-3-540-30568-2

  • eBook Packages: Computer Science (R0)
