Application to Speaker Recognition

Part of the SpringerBriefs in Electrical and Computer Engineering book series


Speaker recognition is the task of recognizing people by their voices. It involves extracting and characterizing the speaker-specific information embedded in the speech signal. In a larger context, speaker recognition belongs to the field of biometrics, which refers to authenticating persons based on their physical and/or learned characteristics. There has long been a desire to identify a person on the basis of his or her voice. For many years, judges, lawyers, detectives, and law enforcement agencies have wanted to use forensic voice authentication to investigate a suspect or to confirm a judgment of guilt or innocence.


Keywords: Speech Signal; Speaker Recognition; Clean Speech; Speaker Identification; Noisy Speech



Copyright information

© The Author(s) 2012

Authors and Affiliations

  1. Department of Instrumentation, SGGS Institute of Engineering and Technology, Vishnupuri, Nanded, India
  2. Department of E & TC Engineering, SRES College of Engineering, Kopargaon, India
