Circuits, Systems, and Signal Processing, Volume 37, Issue 8, pp 3412–3440

Detection of the Glottal Closure Instants Using Empirical Mode Decomposition

  • Rajib Sharma
  • S. R. M. Prasanna
  • Hugo Leonardo Rufiner
  • Gastón Schlotthauer

Abstract

This work explores the effectiveness of the Intrinsic Mode Functions (IMFs) of the speech signal in estimating its Glottal Closure Instants (GCIs). The IMFs of the speech signal, which are its AM–FM or oscillatory components, are obtained from two similar nonlinear and non-stationary signal analysis techniques: Improved Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (ICEEMDAN) and Modified Empirical Mode Decomposition (MEMD). Both are advanced variants of the original technique, Empirical Mode Decomposition (EMD). MEMD is much faster than ICEEMDAN, whereas ICEEMDAN curtails mode-mixing (a drawback of EMD) more effectively. It is observed that the partial summation of a certain subset of the IMFs results in a signal whose minima are aligned with the GCIs. Based on this observation, two methods are devised for estimating the GCIs, one from the IMFs of ICEEMDAN and one from those of MEMD, named ICEEMDAN-based GCIs Estimation (IGE) and MEMD-based GCIs Estimation (MGE), respectively. The results reveal that, compared to the state-of-the-art methods, IGE and MGE provide consistent and reliable estimates of the GCIs across different scenarios: clean, noisy, and telephone channel conditions.
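The central observation, that a partial sum of some IMFs has minima aligned with the GCIs, can be illustrated with a minimal sketch. This is not the IGE or MGE method itself. The abstract does not say which IMFs are summed or how candidate minima are refined, so the `subset` indices, the helper names `partial_sum` and `local_minima`, and the synthetic "IMFs" below are all assumptions made for illustration. In practice the rows of `imfs` would come from an ICEEMDAN or MEMD implementation, which is not shown here.

```python
import numpy as np

def partial_sum(imfs, subset):
    """Sum the selected rows of an (n_imfs, n_samples) array of IMFs."""
    imfs = np.asarray(imfs, dtype=float)
    return imfs[list(subset)].sum(axis=0)

def local_minima(x):
    """Return indices of strict interior local minima of a 1-D signal."""
    x = np.asarray(x, dtype=float)
    interior = x[1:-1]
    is_min = (interior < x[:-2]) & (interior < x[2:])
    return np.flatnonzero(is_min) + 1

# Synthetic stand-ins for IMFs (assumption: not real speech, not real EMD output).
fs = 8000                       # sampling rate in Hz (hypothetical)
t = np.arange(400) / fs         # 50 ms of samples
toy_imfs = np.vstack([
    0.1 * np.sin(2 * np.pi * 1500 * t),   # fast oscillation, left out of the sum
    -np.cos(2 * np.pi * 100 * t),         # component at a 100 Hz "pitch"
    -0.2 * np.cos(2 * np.pi * 200 * t),   # its second harmonic
])

# Hypothetical subset: sum only the two slower components.
candidate = partial_sum(toy_imfs, subset=[1, 2])
gci_estimates = local_minima(candidate)
print(gci_estimates, np.diff(gci_estimates) / fs)
```

In this toy case the minima of the partial sum recur once per 10 ms period of the 100 Hz component, which is the behaviour the abstract attributes to the summed IMFs of voiced speech. A real detector would additionally need voiced-region selection and some rule for rejecting spurious minima, neither of which is specified in the abstract.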

Keywords

  • Glottal closure instants (GCIs)
  • Empirical mode decomposition (EMD)
  • Improved complete ensemble empirical mode decomposition with adaptive noise (ICEEMDAN)
  • Modified empirical mode decomposition (MEMD)
  • Intrinsic mode functions (IMFs)

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2017

Authors and Affiliations

  1. Signal Informatics Laboratory, Department of Electronics and Electrical Engineering, Indian Institute of Technology Guwahati, Guwahati, India
  2. Research Institute for Signals, Systems and Computational Intelligence – sinc(i), Facultad de Ingeniería y Ciencias Hídricas, Universidad Nacional del Litoral, Santa Fe, Argentina
  3. Laboratorio de Señales y Dinámicas no Lineales, CITER - CONICET, Facultad de Ingeniería, Universidad Nacional de Entre Ríos, Oro Verde, Argentina