Abstract

This chapter introduces automatic speaker recognition. It discusses the characteristics of a person's voice that make it possible to identify a speaker automatically, and describes the subtasks of speaker identification, verification, and detection. It then gives an overview of the techniques used to build speaker models and of the issues that affect system performance. Finally, a few selected applications illustrate the wide range of uses of speaker recognition technologies. Text-dependent and text-independent speaker recognition and their applications are covered in detail in the following two chapters.

Abbreviations

  • ASR: automatic speech recognition
  • BIC: Bayesian information criterion
  • CIS: caller identification system
  • CMS: cepstral mean subtraction
  • DCF: detection cost function
  • DET: detection error tradeoff
  • DFT: discrete Fourier transform
  • EER: equal error rate
  • EM: expectation maximization
  • FFT: fast Fourier transform
  • GLR: generalized likelihood ratio
  • GMM: Gaussian mixture model
  • HMM: hidden Markov models
  • LLR: (log) likelihood ratio
  • LPC: linear predictive coding
  • MCE: minimum classification error
  • MFCC: mel-filter cepstral coefficient
  • ML: maximum-likelihood
  • ROC: receiver operating characteristic
  • SVM: support vector machines
  • VQ: vector quantization

Author information

Correspondence to Prof. Aaron E. Rosenberg, Frédéric Bimbot, Ph.D., or Dr. Sarangarajan Parthasarathy.

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Rosenberg, A.E., Bimbot, F., Parthasarathy, S. (2008). Overview of Speaker Recognition. In: Benesty, J., Sondhi, M.M., Huang, Y.A. (eds) Springer Handbook of Speech Processing. Springer Handbooks. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-49127-9_36

  • DOI: https://doi.org/10.1007/978-3-540-49127-9_36

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-49125-5

  • Online ISBN: 978-3-540-49127-9

  • eBook Packages: Engineering, Engineering (R0)
