
Multimedia Tools and Applications, Volume 78, Issue 14, pp 19525–19542

Multitaper chirp group delay Hilbert envelope coefficients for robust speaker verification

  • Ahmed Krobba (email author)
  • Mohamed Debyeche
  • Sid-Ahmed Selouani
Article

Abstract

In this paper, we propose two new feature extraction methods for robust automatic speaker verification under noisy conditions. The first method, called Multi-taper Gammatone Hilbert Envelope Coefficients (MGHECs), employs multi-taper magnitude spectra, which yield lower-variance spectrum estimates than a single-window estimate. The second method, called Multi-taper Chirp Group Delay Zeros-Phase Hilbert Envelope Coefficients (MCGDZPHECs), is based on the multi-taper phase spectrum. The chirp group delay technique is used to estimate the vocal tract characteristics from the phase of the chirp Fourier transform. The proposed methods and their extended variants are evaluated on the NIST 2008 corpus under noisy conditions, using noise drawn from NOISEX-92 at various SNR levels. Experimental results show that the proposed methods provide a better representation of the speech spectrum. Moreover, they yield a significant improvement in performance under noisy conditions compared with conventional Mean Hilbert Envelope Coefficients (MHECs) feature extraction.
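The two spectral ingredients named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the sine taper family, the taper count, the FFT size, and the circle radius `rho` are all assumptions chosen for illustration, and the Gammatone filterbank, zero-phase, and Hilbert envelope stages are omitted. The function names are hypothetical.

```python
import numpy as np

def sine_tapers(n_samples, n_tapers):
    """Orthonormal sine tapers (assumed taper family, chosen for simplicity)."""
    n = np.arange(1, n_samples + 1)
    k = np.arange(1, n_tapers + 1)[:, None]
    return np.sqrt(2.0 / (n_samples + 1)) * np.sin(np.pi * k * n / (n_samples + 1))

def multitaper_magnitude(frame, n_tapers=6, n_fft=512):
    """Average the power spectra of several tapered copies of one frame,
    with uniform weights, and return the result as a magnitude spectrum."""
    tapers = sine_tapers(len(frame), n_tapers)            # shape (K, N)
    spectra = np.fft.rfft(tapers * frame, n=n_fft, axis=1)
    return np.sqrt(np.mean(np.abs(spectra) ** 2, axis=0))

def chirp_group_delay(frame, rho=1.12, n_fft=512):
    """Group delay of the z-transform evaluated on a circle of radius rho
    instead of the unit circle (rho is an assumed, illustrative value)."""
    n = np.arange(len(frame), dtype=float)
    chirped = frame * rho ** (-n)       # x[n] * rho^(-n) moves the contour to |z| = rho
    X = np.fft.rfft(chirped, n=n_fft)
    Y = np.fft.rfft(n * chirped, n=n_fft)
    # tau(w) = Re(Y / X), written without complex division; floor avoids 0/0
    return (X.real * Y.real + X.imag * Y.imag) / np.maximum(np.abs(X) ** 2, 1e-12)
```

Averaging over several orthogonal tapers trades a little frequency resolution for a lower-variance estimate, while evaluating the group delay away from the unit circle is intended to avoid the spikes caused by zeros lying close to it.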

Keywords

Speaker verification; Multi-taper chirp group delay zeros-phase Hilbert envelope coefficients (MCGDZPHECs); Multi-taper Gammatone Hilbert envelope coefficients (MGHECs); Mean Hilbert envelope coefficients (MHECs); i-vector; G-PLDA

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  • Ahmed Krobba (1), email author
  • Mohamed Debyeche (1)
  • Sid-Ahmed Selouani (2)

  1. Speech Communication and Signal Processing Laboratory, University of USTHB, Algiers, Algeria
  2. LARIHS Laboratory, Campus Shippagan, University of Moncton, Moncton, Canada
