Speaker Recognition Using Sparse Representation via Superimposed Features

  • Yashesh Gaur
  • Maulik C. Madhavi
  • Hemant A. Patil
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8251)


In this paper, we demonstrate the effectiveness of superimposed features for template matching-based speaker recognition using sparse representations. The principle behind our hypothesis is that if a test template lies approximately in the linear span of the training templates of the genuine class, then so does any linear combination of test templates. We introduce the notion of superimposed features for the first time. In initial trials on the TIMIT database, superimposed features reduced the computational cost by 80% at the price of only a 0.67% drop in identification rate and a 0.85% increase in equal error rate (EER).


Keywords: Superimposed features · Sparse representations · Orthogonal matching pursuit · Template matching · Speaker recognition
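The idea stated in the abstract can be sketched in code: stack training templates of all speakers as columns of a dictionary, superimpose (sum) several test vectors into a single template, recover a sparse code with orthogonal matching pursuit (OMP), and assign the speaker whose atoms yield the smallest reconstruction residual. This is a minimal illustrative sketch, not the authors' implementation; the toy data, the `omp` and `classify` helpers, and all parameter choices are assumptions made for demonstration.

```python
import numpy as np

def omp(D, y, k):
    """Greedy Orthogonal Matching Pursuit: solve y ~ D x with at most k nonzeros."""
    residual = y.copy()
    support = []
    x = np.zeros(D.shape[1])
    for _ in range(k):
        # pick the atom most correlated with the current residual
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j not in support:
            support.append(j)
        # least-squares refit on the selected atoms, then update the residual
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x[support] = coef
    return x

def classify(D, labels, y, k=5):
    """Sparse-representation rule: assign y to the class whose atoms best reconstruct it."""
    x = omp(D, y, k)
    best, best_res = None, np.inf
    for c in set(labels):
        mask = np.array([l == c for l in labels])
        res = np.linalg.norm(y - D[:, mask] @ x[mask])
        if res < best_res:
            best, best_res = c, res
    return best

# toy data: 2 hypothetical speakers, 4 unit-norm training templates each
rng = np.random.default_rng(0)
dim = 20
base = {c: rng.normal(size=dim) for c in (0, 1)}
cols, labels = [], []
for c in (0, 1):
    for _ in range(4):
        v = base[c] + 0.1 * rng.normal(size=dim)
        cols.append(v / np.linalg.norm(v))
        labels.append(c)
D = np.stack(cols, axis=1)

# superimposed test template: sum of three noisy test vectors from speaker 1;
# one sparse-coding call now replaces three separate template matches
tests = [base[1] + 0.1 * rng.normal(size=dim) for _ in range(3)]
y = np.sum(tests, axis=0)
y /= np.linalg.norm(y)

print(classify(D, labels, y))
```

The cost saving claimed in the abstract comes from this superposition step: several test templates share a single sparse-coding pass instead of each paying for its own.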



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Yashesh Gaur (1)
  • Maulik C. Madhavi (1)
  • Hemant A. Patil (1)

  1. Dhirubhai Ambani Institute of Information and Communication Technology (DA-IICT), Gandhinagar, India
