Overview of Speaker Recognition

Rosenberg, Aaron E.; Bimbot, Frédéric; Parthasarathy, Sarangarajan

doi:10.1007/978-3-540-49127-9_36

Part of the book series: Springer Handbooks (SHB)

Abstract

This chapter presents an introduction to automatic speaker recognition. It discusses the characteristics of a person's voice that make it possible to identify a speaker automatically, and describes subtasks such as speaker identification, verification, and detection. It then gives an overview of the techniques used to build speaker models and of issues related to system performance. Finally, a few selected applications are introduced to demonstrate the wide range of uses of speaker recognition technologies. Details of text-dependent and text-independent speaker recognition and their applications are covered in the following two chapters.

Abbreviations

ASR: automatic speech recognition
BIC: Bayesian information criterion
CIS: caller identification system
CMS: cepstral mean subtraction
DCF: detection cost function
DET: detection error tradeoff
DFT: discrete Fourier transform
EER: equal error rate
EM: expectation maximization
FFT: fast Fourier transform
GLR: generalized likelihood ratio
GMM: Gaussian mixture model
HMM: hidden Markov models
LLR: (log) likelihood ratio
LPC: linear predictive coding
MCE: minimum classification error
MFCC: mel-filter cepstral coefficient
ML: maximum-likelihood
ROC: receiver operating characteristic
SVM: support vector machines
VQ: vector quantization

Author information

Authors and Affiliations

Center for Advanced Information Processing, Rutgers University, 96 Frelinghuysen Road, 08854-8088, Piscataway, NJ, USA
Aaron E. Rosenberg Prof.
IRISA (CNRS & INRIA) - METISS, Pièce C 320 - Campus Universitaire de Beaulieu, 35042, Rennes, France
Frédéric Bimbot Ph.D
Yahoo!, Applied Research, 1MC 743, 701 First Avenue, 94089-0703, Sunnyvale, CA, USA
Sarangarajan Parthasarathy Dr.

Corresponding authors

Correspondence to Aaron E. Rosenberg Prof., Frédéric Bimbot Ph.D, or Sarangarajan Parthasarathy Dr.

Editor information

Editors and Affiliations

INRS-EMT, University of Quebec, 800 de la Gauchetiere Ouest, H5A 1K6, Montreal, Quebec, Canada
Jacob Benesty Dr.
Avayalabs Research, 233 Mount Airy Road, 07920, Basking Ridge, NJ, USA
M. Mohan Sondhi Ph.D.
Alcatel-Lucent, Bell Laboratories, 600 Mountain Avenue, 07974, Murray Hill, NJ, USA
Yiteng Arden Huang Dr.

About this chapter

Cite this chapter

Rosenberg, A.E., Bimbot, F., Parthasarathy, S. (2008). Overview of Speaker Recognition. In: Benesty, J., Sondhi, M.M., Huang, Y.A. (eds) Springer Handbook of Speech Processing. Springer Handbooks. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-49127-9_36

DOI: https://doi.org/10.1007/978-3-540-49127-9_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49125-5
Online ISBN: 978-3-540-49127-9
eBook Packages: Engineering, Engineering (R0)
