Robust Speaker Verification: A Review

Rao, K. Sreenivasa; Sarkar, Sourjya

doi:10.1007/978-3-319-07130-5_2

K. Sreenivasa Rao & Sourjya Sarkar

Chapter
First Online: 01 January 2014

Part of the book series: SpringerBriefs in Electrical and Computer Engineering (BRIEFSSPEECHTECH)

Abstract

This chapter provides an overview of various feature-based and model-based approaches developed in the past for robust speaker recognition. It highlights the advantages and disadvantages of some standard methods applied to robust speaker verification tasks. The main focus is to briefly introduce popular state-of-the-art techniques adopted for enhancing speaker verification performance in noisy conditions.

Author information

Authors and Affiliations

School of Information Technology, Indian Institute of Technology, Kharagpur, West Bengal, India
K. Sreenivasa Rao
Indian Institute of Technology Kharagpur, Kharagpur, India
Sourjya Sarkar


Cite this chapter

Rao, K.S., Sarkar, S. (2014). Robust Speaker Verification: A Review. In: Robust Speaker Recognition in Noisy Environments. SpringerBriefs in Electrical and Computer Engineering. Springer, Cham. https://doi.org/10.1007/978-3-319-07130-5_2

DOI: https://doi.org/10.1007/978-3-319-07130-5_2
Published: 21 May 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07129-9
Online ISBN: 978-3-319-07130-5
eBook Packages: Engineering, Engineering (R0)