Speaker-Related Robustness Issues

  • Chapter
Robustness-Related Issues in Speaker Recognition

Part of the book series: SpringerBriefs in Electrical and Computer Engineering (BRIEFSSIGNAL)

Abstract

Speaker-dependent factors, such as gender, physical condition (cold or laryngitis), speaking style (emotional state, speech rate, etc.), cross-language effects, accent, and session variations, are major concerns in speech signal processing. How these factors correlate with each other, and which of them are key to speech realization, are important research questions [1]. Current mainstream research can be divided into five directions, which are described in the following subsections.

References

  1. Huang C, Chen T, Li SZ et al (2001) Analysis of speaker variability. In: INTERSPEECH. pp 1377–1380

  2. Cumani S, Glembek O, Brümmer N et al (2012) Gender independent discriminative speaker recognition in i-vector space. In: 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, pp 4361–4364

  3. McLaren M, van Leeuwen DA (2012) Gender-independent speaker recognition using source normalization. In: 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, pp 4373–4376

  4. Tull RG, Rutledge JC (1996) Analysis of “cold-affected” speech for inclusion in speaker recognition systems. J Acoust Soc Am 99(4):2549–2574

  5. Tull RG, Rutledge JC (1996) ‘Cold speech’ for automatic speaker recognition. In: Acoustical Society of America 131st Meeting Lay Language Papers

  6. Tull RG, Rutledge JC, Larson CR (1996) Cepstral analysis of “cold-speech” for speaker recognition: a second look. Dissertation, ASA

  7. Tull RG (1999) Acoustic analysis of cold-speech: implications for speaker recognition technology and the common cold. Northwestern University

  8. Kwon OW, Chan K, Hao J et al (2003) Emotion recognition by speech signals. In: INTERSPEECH

  9. Juang BH (1991) Speech recognition in adverse environments. Comput Speech Lang 5(3):275–294

  10. Lippmann R, Martin E, Paul D (1987) Multi-style training for robust isolated-word speech recognition. In: IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP’87, vol 12. IEEE, pp 705–708

  11. Bie F, Wang D, Zheng TF et al (2013) Emotional speaker verification with linear adaptation. In: 2013 IEEE China Summit & International Conference on Signal and Information Processing (ChinaSIP). IEEE, pp 91–94

  12. Zetterholm E (1998) Prosody and voice quality in the expression of emotions. In: ICSLP

  13. Wu T, Yang Y, Wu Z (2005) Improving speaker recognition by training on emotion-added models. In: International Conference on Affective Computing and Intelligent Interaction. Springer, Berlin, Heidelberg, pp 382–389

  14. Pereira C, Watson CI (1998) Some acoustic characteristics of emotion. In: ICSLP

  15. Scherer KR, Johnstone T, Klasmeyer G et al (2000) Can automatic speaker verification be improved by training the algorithms on emotional speech? In: INTERSPEECH. pp 807–810

  16. Scherer KR, Grandjean D, Johnstone T et al Acoustic correlates of task load and stress. In: INTERSPEECH

  17. Shahin I (2009) Speaker identification in emotional environments. Iran J Electr Comput Eng 8(1):41–46

  18. Bie F, Wang D, Zheng TF et al (2013) Emotional adaptive training for speaker verification. In: 2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA). IEEE, pp 1–4

  19. Wu W, Zheng TF, Xu MX et al (2006) Study on speaker verification on emotional speech. In: INTERSPEECH

  20. Shan Z, Yang Y (2008) Learning polynomial function based neutral-emotion GMM transformation for emotional speaker recognition. In: 19th International Conference on Pattern Recognition, 2008, ICPR 2008. IEEE, pp 1–4

  21. Atal BS (1976) Automatic recognition of speakers from their voices. Proc IEEE 64(4):460–475

  22. Rozi A, Li L, Wang D et al (2016) Feature transformation for speaker verification under speaking rate mismatch condition. In: 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA). IEEE, pp 1–4

  23. van Heerden CJ, Barnard E (2007) Speech rate normalization used to improve speaker verification. In: Proceedings of the Symposium of the Pattern Recognition Association of South Africa. pp 2–7

  24. Erman B, Warren B (2000) The idiom principle and the open choice principle. Text-Interdisc J Study Discourse 20(1):29–62

  25. Makkai A (1972) Idiom structure in English. Walter de Gruyter

  26. Cacciari C, Glucksberg S (1991) Understanding idiomatic expressions: the contribution of word meanings. Adv Psychol 77:217–240

  27. Leech G, Garside R, Bryant M (1994) CLAWS4: the tagging of the British National Corpus. In: Proceedings of the 15th conference on Computational linguistics, vol 1. Association for Computational Linguistics, pp 622–628

  28. Doddington GR (2001) Speaker recognition based on idiolectal differences between speakers. In: INTERSPEECH. pp 2521–2524

  29. Kajarekar SS, Ferrer L, Shriberg E et al (2005) SRI’s 2004 NIST speaker recognition evaluation system. In: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005 (ICASSP’05), vol 1. IEEE, I/173–I/176

  30. Stolcke GTESA, Kajarekar S (2007) Duration and pronunciation conditioned lexical modeling for speaker verification

  31. Andrews WD, Kohler MA, Campbell JP et al (2002) Gender-dependent phonetic refraction for speaker recognition. In: 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol 1. IEEE, pp I-149–I-152

  32. Navrátil J, Jin Q, Andrews WD et al (2003) Phonetic speaker recognition using maximum-likelihood binary-decision tree models. In: Proceedings of the 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003 (ICASSP’03), vol 4. IEEE, p IV-796

  33. Jin Q, Navratil J, Reynolds DA et al (2003) Combining cross-stream and time dimensions in phonetic speaker recognition. In: Proceedings of the 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003 (ICASSP’03). IEEE, p IV-800

  34. Hatch AO, Peskin B, Stolcke A (2005) Improved phonetic speaker recognition using lattice decoding. In: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005 (ICASSP’05), vol 1. IEEE, pp I/169–I/172

  35. Auckenthaler R, Carey MJ, Mason JSD (2001) Language dependency in text-independent speaker verification. In: Proceedings of the 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2001 (ICASSP’01), vol 1. IEEE, pp 441–444

  36. Ma B, Meng H (2004) English-Chinese bilingual text-independent speaker verification. In: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 2004 (ICASSP’04), vol 5. IEEE, p V-293

  37. Askar R, Wang D, Bie F et al (2015) Cross-lingual speaker verification based on linear transform. In: 2015 IEEE China Summit and International Conference on Signal and Information Processing (ChinaSIP). IEEE, pp 519–523

  38. Maaten L, Hinton G (2008) Visualizing data using t-SNE. J Mach Learn Res 9(Nov):2579–2605

  39. Van Der Maaten L (2014) Accelerating t-SNE using tree-based algorithms. J Mach Learn Res 15(1):3221–3245

  40. Wang J, Johnson MT (2013) Vocal source features for bilingual speaker identification. In: 2013 IEEE China Summit & International Conference on Signal and Information Processing (ChinaSIP). IEEE, pp 170–173

  41. Akbacak M, Hansen JHL (2007) Language normalization for bilingual speaker recognition systems. In: IEEE International Conference on Acoustics, Speech and Signal Processing, 2007, ICASSP 2007, vol 4. IEEE, pp IV-257–IV-260

  42. Nagaraja BG, Jayanna HS (2013) Combination of features for multilingual speaker identification with the constraint of limited data. Int J Comput Appl 70(6)

  43. Akbacak M, Hansen JHL (2007) Language normalization for bilingual speaker recognition systems. In: IEEE International Conference on Acoustics, Speech and Signal Processing, 2007, ICASSP 2007, vol 4. IEEE, pp IV-257–IV-260

  44. Lu L, Dong Y, Zhao X et al (2009) The effect of language factors for robust speaker recognition. In: IEEE International Conference on Acoustics, Speech and Signal Processing, 2009, ICASSP 2009. IEEE, pp 4217–4220

  45. Kersta LG (1962) Voiceprint identification. J Acoust Soc Am 34(5):725

  46. Furui S (1997) Recent advances in speaker recognition. Pattern Recogn Lett 18(9):859–872

  47. Bonastre JF, Bimbot F, Boë LJ et al (2003) Person authentication by voice: a need for caution. In: INTERSPEECH

  48. Mishra P (2012) A vector quantization approach to speaker recognition. In: Proceedings of the International Conference on Innovation & Research in Technology for sustainable development (ICIRT 2012), vol 1. p 152

  49. Kato T, Shimizu T (2003) Improved speaker verification over the cellular phone network using phoneme-balanced and digit-sequence-preserving connected digit patterns. In: Proceedings of the 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003 (ICASSP’03), vol 3. IEEE, p II-57

  50. Hébert M (2008) Text-dependent speaker recognition. In: Springer handbook of speech processing. Springer, Berlin, Heidelberg, pp 743–762

  51. Bimbot F, Bonastre JF, Fredouille C et al (2004) A tutorial on text-independent speaker verification. EURASIP J Appl Signal Process, 430–451

  52. Markel J, Davis S (1979) Text-independent speaker recognition from a large linguistically unconstrained time-spaced data base. IEEE Trans Acoust Speech Signal Process 27(1):74–82

  53. Beigi H (2009) Effects of time lapse on speaker recognition results. In: 16th International Conference on Digital Signal Processing, 2009. IEEE, pp 1–6

  54. Beigi H (2011) Fundamentals of speaker recognition. Springer Science & Business Media

  55. Lamel LF, Gauvain JL (2000) Speaker verification over the telephone. Speech Commun 31(2):141–154

  56. Kelly F, Harte N (2011) Effects of long-term ageing on speaker verification. In: European Workshop on Biometrics and Identity Management. Springer, Berlin, Heidelberg, pp 113–124

  57. Kelly F, Drygajlo A, Harte N (2012) Speaker verification with long-term ageing data. In: 2012 5th IAPR International Conference on Biometrics (ICB). IEEE, pp 478–483

  58. Wang L, Wang J, Li L et al (2016) Improving speaker verification performance against long-term speaker variability. Speech Commun 79:14–29

  59. Wang L, Zheng TF (2010) Creation of time-varying voiceprint database. In: Proc. Oriental-COCOSDA

Copyright information

© 2017 The Author(s)

About this chapter

Cite this chapter

Zheng, T.F., Li, L. (2017). Speaker-Related Robustness Issues. In: Robustness-Related Issues in Speaker Recognition. SpringerBriefs in Electrical and Computer Engineering. Springer, Singapore. https://doi.org/10.1007/978-981-10-3238-7_3

  • DOI: https://doi.org/10.1007/978-981-10-3238-7_3

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-3237-0

  • Online ISBN: 978-981-10-3238-7

  • eBook Packages: Engineering, Engineering (R0)
