Abstract
This chapter introduces automatic speaker recognition. It discusses the characteristics of a person's voice that make it possible to identify a speaker automatically, and describes subtasks such as speaker identification, verification, and detection. It then gives an overview of the techniques used to build speaker models and of issues related to system performance. Finally, a few selected applications are introduced to demonstrate the wide range of uses of speaker recognition technologies. Details of text-dependent and text-independent speaker recognition and their applications are covered in the following two chapters.
Abbreviations
- ASR: automatic speech recognition
- BIC: Bayesian information criterion
- CIS: caller identification system
- CMS: cepstral mean subtraction
- DCF: detection cost function
- DET: detection error tradeoff
- DFT: discrete Fourier transform
- EER: equal error rate
- EM: expectation maximization
- FFT: fast Fourier transform
- GLR: generalized likelihood ratio
- GMM: Gaussian mixture model
- HMM: hidden Markov models
- LLR: (log) likelihood ratio
- LPC: linear predictive coding
- MCE: minimum classification error
- MFCC: mel-filter cepstral coefficient
- ML: maximum-likelihood
- ROC: receiver operating characteristic
- SVM: support vector machines
- VQ: vector quantization
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Rosenberg, A.E., Bimbot, F., Parthasarathy, S. (2008). Overview of Speaker Recognition. In: Benesty, J., Sondhi, M.M., Huang, Y.A. (eds) Springer Handbook of Speech Processing. Springer Handbooks. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-49127-9_36
DOI: https://doi.org/10.1007/978-3-540-49127-9_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49125-5
Online ISBN: 978-3-540-49127-9
eBook Packages: Engineering (R0)