Wireless Personal Communications, Volume 104, Issue 3, pp 895–905

Fusion Multistyle Training for Speaker Identification of Disguised Speech

  • Swati Prasad
  • Ramjee Prasad


Speaker identification is the task of determining which person in a group produced a given speech utterance. When a person disguises their voice, as is commonly seen in crime scenes, the training and test speech data no longer match; this is referred to as the mismatch problem, and it markedly degrades the performance of a speaker identification system. To address this problem, the authors previously studied various multistyle training strategies and a fusion method. This paper further investigates three multiple-model methods operating at the decision level and compares their performance with that of the previously studied multistyle training strategies. The fusion of the two multistyle training strategies is found to outperform, on average across the different test speech data, all the single-style training and multiple-model methods investigated. This fusion multistyle training technique can be readily employed in a security-conscious organization where monitoring of employees is required.
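As a rough illustration of what fusing two identification systems can look like, the sketch below picks the speaker with the highest score and combines two systems by a weighted sum of their per-speaker scores. The abstract does not state the fusion rule actually used, so the weighted sum, the function names `identify` and `fuse`, the `weight` parameter, and all score values are assumptions made for illustration only.

```python
from typing import Dict


def identify(scores: Dict[str, float]) -> str:
    """Return the speaker whose model gives the highest score."""
    return max(scores, key=scores.get)


def fuse(scores_a: Dict[str, float],
         scores_b: Dict[str, float],
         weight: float = 0.5) -> Dict[str, float]:
    """Combine two systems' per-speaker scores by a weighted sum.

    Assumed fusion rule, not necessarily the one used in the paper.
    Both dictionaries must cover the same set of speakers.
    """
    return {
        spk: weight * scores_a[spk] + (1.0 - weight) * scores_b[spk]
        for spk in scores_a
    }


# Hypothetical log-likelihood scores for one test utterance, produced by
# two systems trained with different multistyle training strategies.
system_a = {"spk1": -41.0, "spk2": -40.0, "spk3": -45.0}
system_b = {"spk1": -38.0, "spk2": -43.0, "spk3": -44.0}

fused = fuse(system_a, system_b)
print(identify(system_a), identify(system_b), identify(fused))
```

In this made-up case the two systems disagree, and the fused scores settle the decision in favour of the speaker that scores well under both. In practice the weight would be tuned on held-out data.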


Multistyle training · Multiple-model · Voice disguise · Robust speaker identification · Biometric




Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Electronics and Communication Engineering, Birla Institute of Technology, Mesra, Ranchi, India
  2. Department of Business Development and Technology, Aarhus University, Herning, Denmark
