Robust Speaker Verification: A Review

  • Chapter

Part of the book series: SpringerBriefs in Electrical and Computer Engineering (BRIEFSSPEECHTECH)

Abstract

This chapter provides an overview of feature-based and model-based approaches developed in the past for robust speaker recognition. It highlights the advantages and disadvantages of some standard methods applied to robust speaker verification tasks. The main focus is to briefly introduce popular state-of-the-art techniques adopted for enhancing speaker verification performance in noisy conditions.



Copyright information

© 2014 The Author(s)

About this chapter

Cite this chapter

Rao, K.S., Sarkar, S. (2014). Robust Speaker Verification: A Review. In: Robust Speaker Recognition in Noisy Environments. SpringerBriefs in Electrical and Computer Engineering. Springer, Cham. https://doi.org/10.1007/978-3-319-07130-5_2

  • DOI: https://doi.org/10.1007/978-3-319-07130-5_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-07129-9

  • Online ISBN: 978-3-319-07130-5

  • eBook Packages: Engineering, Engineering (R0)
