44:54

Efficient band selection for improving the robustness of the EMD-based cepstral features

  • Ehsan Samadi
  • Ghasem Alipoor


Mel-frequency cepstral coefficients (MFCCs) are features widely and successfully used in various speech processing applications. These features are extracted using the Fourier transform. However, this transform suffers from crucial restrictions when used to analyze nonlinear and non-stationary signals such as speech. To address this problem, the present study investigates the application of empirical mode decomposition (EMD) to extracting more efficient and robust features for automatic gender identification. In the proposed approach, the speech signal is first decomposed by EMD into a set of narrow-band oscillatory modes, the intrinsic mode functions (IMFs), from which mel-frequency cepstral features can be extracted. However, multi-band decomposition of all modes yields some redundant and even irrelevant features that can degrade classification performance. Therefore, we propose to efficiently select the most discriminative frequency bands over all modes. The minimal-redundancy-maximal-relevance (mRMR) feature selection algorithm is also examined for this purpose. The proposed EMD-based features are then extracted by applying the discrete cosine transform (DCT) to log power values calculated over the selected mel-scale bands of the IMFs. Simulation results show that using the proposed features for automatic gender identification considerably improves system performance, particularly in noisy environments.
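The final extraction step described above (log power over selected mel-scale bands of the IMFs, followed by a DCT) can be illustrated with a minimal sketch. It is not the authors' implementation: the IMFs are assumed to be precomputed by some EMD routine and are replaced here by synthetic sinusoids, the list of selected (IMF, band) pairs is supplied by hand rather than by the paper's selection criterion or mRMR, windowing is omitted, and all function names and parameter values are hypothetical.

```python
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_bands, n_fft, sample_rate):
    """Triangular filters equally spaced on the mel scale, one row per band."""
    top = hz_to_mel(sample_rate / 2.0)
    edges = [mel_to_hz(top * i / (n_bands + 1)) for i in range(n_bands + 2)]
    bank = []
    for b in range(n_bands):
        lo, mid, hi = edges[b], edges[b + 1], edges[b + 2]
        row = []
        for k in range(n_fft // 2 + 1):
            f = k * sample_rate / n_fft
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def power_spectrum(frame):
    """One-sided power spectrum via a direct DFT (fine for short frames)."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        spec.append((re * re + im * im) / n)
    return spec

def dct2(values, n_coeffs):
    """Unnormalized DCT-II, keeping the first n_coeffs coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
            for k in range(n_coeffs)]

def emd_cepstral_features(imf_frames, selected, sample_rate,
                          n_bands=8, n_coeffs=4, eps=1e-10):
    """imf_frames: one equal-length frame per IMF (assumed precomputed by EMD).
    selected: hypothetical list of (imf_index, band_index) pairs to keep."""
    n_fft = len(imf_frames[0])
    bank = mel_filterbank(n_bands, n_fft, sample_rate)
    spectra = [power_spectrum(frame) for frame in imf_frames]
    log_power = []
    for m, b in selected:
        energy = sum(w * p for w, p in zip(bank[b], spectra[m]))
        log_power.append(math.log(energy + eps))  # eps guards against log(0)
    return dct2(log_power, n_coeffs)

# Synthetic stand-ins for three IMFs of one 128-sample frame at 8 kHz.
# Real IMFs would come from an EMD routine; these are plain sinusoids.
fs, n = 8000, 128
imfs = [[math.sin(2 * math.pi * f0 * t / fs) for t in range(n)]
        for f0 in (2000.0, 800.0, 200.0)]
# Hand-picked (IMF, band) pairs, for illustration only.
selected_bands = [(0, 5), (0, 6), (1, 2), (1, 3), (2, 0), (2, 1)]
features = emd_cepstral_features(imfs, selected_bands, fs)
print(features)
```

Because the DCT is applied only to the retained log-power values, dropping a redundant or irrelevant band changes the length of the DCT input, so the selection has to be fixed before training and reused unchanged at test time.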


Automatic gender identification; empirical mode decomposition (EMD); mel-frequency cepstral coefficients (MFCC); feature selection; minimal-redundancy-maximal-relevance



Copyright information

© Indian Academy of Sciences 2019

Authors and Affiliations

  1. Electrical Engineering Department, Hamedan University of Technology, Hamedan, Iran
