Efficient speaker identification using spectral entropy

  • Fernando Luque-Suárez
  • Antonio Camarena-Ibarrola
  • Edgar Chávez

Abstract

In voice recognition, the two main problems are speech recognition (what was said) and speaker recognition (who was speaking). The usual approach to speaker recognition is to postulate a model whose parameters represent the speaker's identity; estimating those parameters can be time-consuming when the number of candidate speakers is large. In this paper, we model the speaker as a high-dimensional point cloud of entropy-based features extracted from the speech signal. The method allows indexing, and hence it can manage large databases. We experimentally assessed the quality of the identification on a publicly available database formed by extracting audio from a collection of YouTube videos of 1,000 different speakers. With 20-second audio excerpts, we were able to identify a speaker with 97% accuracy when the recording environment is not controlled, and with 99% accuracy for controlled recording environments.
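
As a rough illustration of the idea of entropy-based features and point-cloud matching, the following minimal Python sketch computes the Shannon entropy of one frame's normalized power spectrum and identifies a speaker by nearest-neighbor voting. It is not the authors' implementation: the abstract does not specify the exact feature, so the single-entropy-per-frame feature, the naive DFT, the linear-scan search, and the names `power_spectrum`, `spectral_entropy`, and `identify` are all assumptions made for illustration.

```python
import cmath
import math
from collections import Counter


def power_spectrum(frame):
    """Naive DFT power spectrum (bins 0..N/2) of a real-valued frame."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def spectral_entropy(frame):
    """Shannon entropy, in bits, of the normalized power spectrum.

    Assumption: the spectrum is treated as a probability distribution
    over frequency bins; the paper's actual feature may differ.
    """
    power = power_spectrum(frame)
    total = sum(power)
    if total == 0.0:
        return 0.0  # silent frame: define entropy as zero
    probs = [p / total for p in power]
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def identify(query_points, labeled_points):
    """Each query point votes for the speaker of its nearest labeled point.

    labeled_points is a list of (feature_vector, speaker) pairs.
    A linear scan is used here only for clarity (hypothetical stand-in
    for an index).
    """
    votes = Counter()
    for q in query_points:
        _, speaker = min(labeled_points,
                         key=lambda item: math.dist(q, item[0]))
        votes[speaker] += 1
    return votes.most_common(1)[0][0]
```

In this sketch a pure tone concentrates its energy in one bin and yields an entropy near zero, while an impulse spreads energy evenly and yields the maximum for the number of bins. At the scale the abstract describes, the linear scan in `identify` would presumably be replaced by a metric or approximate nearest-neighbor index.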

Keywords

Speaker recognition, Speaker identification, Entropygrams

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. CICESE, Ensenada, Mexico
  2. Universidad Michoacana, Morelia, Mexico