Abstract
This chapter provides an overview of the feature-based and model-based approaches developed in the past for robust speaker recognition. It highlights the advantages and disadvantages of some standard methods applied to robust speaker verification tasks. The main focus is to briefly introduce popular state-of-the-art techniques adopted for enhancing speaker verification performance in noisy conditions.
© 2014 The Author(s)
About this chapter
Cite this chapter
Rao, K.S., Sarkar, S. (2014). Robust Speaker Verification: A Review. In: Robust Speaker Recognition in Noisy Environments. SpringerBriefs in Electrical and Computer Engineering. Springer, Cham. https://doi.org/10.1007/978-3-319-07130-5_2
DOI: https://doi.org/10.1007/978-3-319-07130-5_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07129-9
Online ISBN: 978-3-319-07130-5
eBook Packages: Engineering (R0)