
Application to Speaker Recognition

Chapter in: Advances in Non-Linear Modeling for Speech Processing

Part of the book series: SpringerBriefs in Electrical and Computer Engineering (BRIEFSSPEECHTECH)


Abstract

Speaker recognition is the task of recognizing people by their voices. In speaker recognition, one is interested in extracting and characterizing the speaker-specific information embedded in the speech signal. In a broader context, speaker recognition belongs to the field of biometrics, which deals with authenticating persons based on their physical and/or learned characteristics. There has long been a desire to identify a person on the basis of his or her voice. For many years, judges, lawyers, detectives, and law enforcement agencies have wanted to use forensic voice authentication to investigate a suspect or to confirm a judgment of guilt or innocence.
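As a rough illustration of "extracting and characterizing speaker-specific information" (not the non-linear method developed in this chapter), the Python sketch below builds a toy closed-set speaker identifier: each enrolled speaker is represented by the average real-cepstrum vector of their speech frames, and a test utterance is assigned to the speaker whose template is nearest in Euclidean distance. The helper names (`frame_cepstrum`, `speaker_template`, `identify`, `toy_voice`), the 64-sample frames, 32-sample hop, and 12 coefficients are all illustrative assumptions, and the sinusoid "voices" are synthetic stand-ins for real speech.

```python
import cmath
import math
import random


def hamming(n):
    """Symmetric Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def frame_cepstrum(frame, num_coeffs):
    """Real cepstrum coefficients c[1..num_coeffs] of one windowed frame."""
    n = len(frame)
    # Log-magnitude spectrum via a naive DFT; the small floor avoids log(0).
    log_mag = []
    for k in range(n):
        x_k = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        log_mag.append(math.log(abs(x_k) + 1e-10))
    # For a real frame the log-magnitude spectrum is real and even, so its
    # inverse DFT reduces to a cosine sum. c[0] (the mean log level) is
    # skipped so that overall gain largely drops out of the template.
    return [
        sum(log_mag[k] * math.cos(2 * math.pi * k * q / n) for k in range(n)) / n
        for q in range(1, num_coeffs + 1)
    ]


def speaker_template(signal, frame_len=64, hop=32, num_coeffs=12):
    """Mean cepstral vector over all frames: a one-centroid speaker model."""
    window = hamming(frame_len)
    vectors = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [s * w for s, w in zip(signal[start:start + frame_len], window)]
        vectors.append(frame_cepstrum(frame, num_coeffs))
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def identify(test_signal, models):
    """Closed-set identification: name of the nearest template (Euclidean)."""
    probe = speaker_template(test_signal)
    return min(models, key=lambda name: math.dist(probe, models[name]))


def toy_voice(freqs, seconds=0.25, fs=8000, seed=0):
    """Hypothetical stand-in for speech: sinusoids with random phases plus weak noise."""
    rng = random.Random(seed)
    phases = [rng.uniform(0, 2 * math.pi) for _ in freqs]
    return [
        sum(math.sin(2 * math.pi * f * t / fs + p) for f, p in zip(freqs, phases))
        + rng.gauss(0, 0.01)
        for t in range(int(seconds * fs))
    ]


# Enrollment: one template per (synthetic) speaker.
models = {
    "A": speaker_template(toy_voice([500, 1500], seed=1)),
    "B": speaker_template(toy_voice([900, 2500], seed=2)),
}
# Identification on a fresh utterance (new phases and noise) from speaker A.
print(identify(toy_voice([500, 1500], seed=3), models))
```

Averaging over frames throws away both the ordering and the spread of the feature vectors; practical systems usually model the distribution of features instead, for example with vector quantization codebooks or Gaussian mixture models.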

Copyright information

© 2012 The Author(s)

About this chapter

Cite this chapter

Holambe, R.S., Deshpande, M.S. (2012). Application to Speaker Recognition. In: Advances in Non-Linear Modeling for Speech Processing. SpringerBriefs in Electrical and Computer Engineering. Springer, Boston, MA. https://doi.org/10.1007/978-1-4614-1505-3_6

  • DOI: https://doi.org/10.1007/978-1-4614-1505-3_6

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4614-1504-6

  • Online ISBN: 978-1-4614-1505-3

  • eBook Packages: Engineering, Engineering (R0)
