An information set-based robust text-independent speaker authentication

  • Jeevan Medikonda
  • Saurabh Bhardwaj
  • Hanmandlu Madasu
Methodologies and Application


This paper presents a method for extracting twofold information set (TFIS) features for text-independent speaker recognition. The method computes the Mel frequency cepstral coefficients for each frame of a speech sample and arranges them in a matrix. From this matrix, spatial and temporal information components are derived using the information set concept within an entropy framework. The TFIS features, formed by combining these two components, are few in number, which reduces computation time and complexity and improves performance in noisy environments. The proposed approach is evaluated on three datasets, namely NIST-2003, the VoxForge 2014 speech corpus and the VCTK speech corpus, in terms of speed, computational complexity, memory requirement and accuracy. Its performance is validated under different noise conditions at several signal-to-noise ratios.
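
Evaluating a recognizer at a given signal-to-noise ratio requires speech corrupted to that ratio. The abstract does not say how the authors produced their noisy test data, so the following is only a minimal sketch under stated assumptions: the noise is additive and the SNR is defined as the ratio of average signal power to average noise power. The function names `mix_at_snr` and `measured_snr_db` and the synthetic sine wave standing in for speech are hypothetical and chosen for illustration.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Return speech + scaled noise so that the mixture has the requested SNR (dB).

    Assumption: additive noise, SNR defined on average power over the utterance.
    """
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)

    # Repeat or trim the noise so it covers the whole utterance.
    reps = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, reps)[: len(speech)]

    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)

    # snr_db = 10 * log10(p_speech / (gain**2 * p_noise)), solved for gain.
    gain = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise


def measured_snr_db(clean, noisy):
    """Recover the SNR (dB) of a mixture when the clean signal is known."""
    residual = np.asarray(noisy, dtype=float) - np.asarray(clean, dtype=float)
    return 10.0 * np.log10(np.mean(np.square(clean)) / np.mean(residual ** 2))


# Synthetic stand-in for a speech signal: one second of a 220 Hz tone at 16 kHz.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
speech = np.sin(2 * np.pi * 220.0 * t)
noise = rng.standard_normal(8000)  # shorter than the speech on purpose

noisy = mix_at_snr(speech, noise, snr_db=5.0)
```

Calling `mix_at_snr` in a loop over several `snr_db` values would give one noisy copy of each test utterance per condition, which can then be passed through the same feature extraction and classifier as the clean data.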


Text-independent speaker recognition; Information set theory; Twofold information set features



This work is part of the ongoing project “Personal Authentication using Multimodal Behavioral Biometrics: Voice and Gait”. The authors express their gratitude to the Department of Science and Technology, Government of India (Grant No. SB/S3/EECE/0127/2013), for funding the project.

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.

Human and animal rights

This article does not contain any studies with direct human participants or animals performed by any of the authors.



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. Manipal Academy of Higher Education, Manipal, India
  2. Thapar Institute of Engineering and Technology, Patiala, India
  3. Indian Institute of Technology, New Delhi, India
