Multitaper chirp group delay Hilbert envelope coefficients for robust speaker verification

Abstract

In this paper, we propose two new feature extraction methods for robust automatic speaker verification under noisy conditions. The first method, called Multi-taper Gammatone Hilbert Envelope Coefficients (MGHECs), employs multi-taper magnitude spectra, which offer considerable advantages for spectrum estimation, notably lower variance than a single-window estimate. The second method, called Multi-taper Chirp Group Delay Zeros-Phase Hilbert Envelope Coefficients (MCGDZPHECs), is based on multi-taper phase spectra. The chirp group delay technique is used to estimate the vocal tract from the phase of the chirp Fourier transform. The proposed methods and their extended variants are evaluated on the NIST 2008 corpus under noisy conditions, using noises drawn from NOISEX-92 at various SNR levels. Experimental results show that the proposed methods provide a better representation of the speech spectrum. Moreover, they yield a significant improvement in performance under noisy conditions compared with conventional Mean Hilbert Envelope Coefficients (MHECs) features.
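The two spectral ingredients named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the sine tapers with uniform weights, the function names, the FFT size, and the circle radius `rho=1.12` are all assumptions chosen for illustration, and the Gammatone filterbank, Hilbert envelope, and zero-phase steps are omitted. The multi-taper estimate averages the periodograms of several orthogonally tapered copies of a frame; the chirp group delay is the negative phase derivative of the z-transform evaluated on a circle of radius `rho` instead of the unit circle.

```python
import numpy as np

def sine_tapers(n_samples, n_tapers):
    """Orthonormal sine tapers (an assumed taper family, not necessarily the paper's)."""
    t = np.arange(1, n_samples + 1)
    k = np.arange(1, n_tapers + 1)[:, None]
    return np.sqrt(2.0 / (n_samples + 1)) * np.sin(np.pi * k * t / (n_samples + 1))

def multitaper_spectrum(frame, n_tapers=6, n_fft=512):
    """Average the power spectra of the tapered frames (uniform weights assumed)."""
    tapers = sine_tapers(len(frame), n_tapers)
    spectra = np.abs(np.fft.rfft(tapers * frame, n=n_fft, axis=1)) ** 2
    return spectra.mean(axis=0)

def chirp_group_delay(frame, rho=1.12, n_fft=512):
    """Group delay of X(z) evaluated at z = rho * exp(j*omega).

    rho is a hypothetical illustrative value; rho = 1 gives the ordinary group delay.
    """
    n = np.arange(len(frame), dtype=float)
    scaled = frame * rho ** (-n)          # moves the analysis circle to radius rho
    X = np.fft.rfft(scaled, n=n_fft)      # transform of x[n] * rho^-n
    Y = np.fft.rfft(n * scaled, n=n_fft)  # transform of n * x[n] * rho^-n
    # -d(phase)/d(omega) = Re(Y / X); the small constant guards against division by zero
    return (X.real * Y.real + X.imag * Y.imag) / (np.abs(X) ** 2 + 1e-12)

# Usage on a synthetic frame (random noise stands in for a speech frame)
rng = np.random.default_rng(0)
frame = rng.standard_normal(200)
power = multitaper_spectrum(frame)   # one value per rfft bin
delay = chirp_group_delay(frame)     # one value per rfft bin
```

Computing the group delay from the two transforms avoids phase unwrapping, which is the usual reason for preferring this form over differentiating the phase numerically.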

Author information

Corresponding author

Correspondence to Ahmed Krobba.

Cite this article

Krobba, A., Debyeche, M. & Selouani, SA. Multitaper chirp group delay Hilbert envelope coefficients for robust speaker verification. Multimed Tools Appl 78, 19525–19542 (2019). https://doi.org/10.1007/s11042-019-7154-y

Keywords

  • Speaker verification
  • Multi-taper chirp group delay zeros-phase Hilbert envelope coefficients (MCGDZPHECs)
  • Multi-taper Gammatone Hilbert envelope coefficients (MGHECs)
  • Mean Hilbert envelope coefficients (MHECs)
  • i-vector G-PLDA