Wavelet-Based Power Normalized Spectrum for Hindi Phoneme Classification

  • Shipra Mishra
  • Mahesh Chandra


This paper presents a wavelet-based power normalized spectrum for computing robust cepstral features, named WP-PNCC features. The proposed technique computes a wavelet packet-based short-time spectrum of the speech signal. A nonlinear function is defined that relates the power spectrum of clean speech to the power spectrum of speech corrupted with noise. The constants of this function are computed from a longer-duration speech spectrum, and the short-time spectrum of each frame is weighted with the power function. The weighted speech spectrum is then processed with logarithm and discrete cosine transform operations to compute cepstral coefficients, which are in turn processed with the quantile-based cepstral dynamics normalization (QCN) technique. The proposed features are examined with a hidden Markov model (HMM) classifier on the TIFR database for a Hindi phoneme classification task and on the TIMIT database for an English phoneme classification task, alongside mel-frequency cepstral coefficients, power normalized cepstral coefficients and 24-band wavelet-based features, in clean and noisy environments. Noises from the NOISEX-92 database are used to prepare the noisy database, with SNR ranging from 20 dB to 0 dB. The results show enhanced performance of the proposed features in all the considered cases. The simulations are performed in MATLAB 2015b. The performance of the proposed features is also evaluated on a hidden Markov model toolkit (HTK)-based speech recognition system. The comparative results confirm the robustness of the proposed features, which improve on the other features examined in this paper.
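
The last two stages described above (logarithm plus discrete cosine transform, followed by quantile-based normalization) can be sketched in a few lines. This is only an illustrative sketch, not the authors' MATLAB implementation: it assumes the per-frame sub-band powers are already available, omits the wavelet packet decomposition and the power-function weighting (the abstract gives no constants for them), and uses a common quantile-centering-and-scaling form for QCN. The function names, the 13 coefficients, the 5% quantile and the random 24-band input are all hypothetical choices made for the example.

```python
import numpy as np


def log_dct_cepstra(band_power, num_ceps=13, floor=1e-10):
    """Log-compress sub-band powers and apply an unnormalized DCT-II per frame.

    band_power: array of shape (frames, bands) with non-negative entries.
    Returns an array of shape (frames, num_ceps).
    """
    band_power = np.asarray(band_power, dtype=float)
    num_bands = band_power.shape[1]
    # Floor the powers so the logarithm stays finite for silent bands.
    log_power = np.log(np.maximum(band_power, floor))
    # DCT-II basis: cos(pi / N * (n + 0.5) * k), one row per cepstral index k.
    n = np.arange(num_bands)
    k = np.arange(num_ceps)[:, None]
    basis = np.cos(np.pi / num_bands * (n + 0.5) * k)
    return log_power @ basis.T


def qcn(cepstra, low_q=0.05):
    """Center and scale each cepstral dimension by its low/high quantiles.

    Assumed form: (c - (q_lo + q_hi) / 2) / (q_hi - q_lo), computed per
    coefficient over all frames of one utterance.
    """
    cepstra = np.asarray(cepstra, dtype=float)
    q_lo = np.quantile(cepstra, low_q, axis=0)
    q_hi = np.quantile(cepstra, 1.0 - low_q, axis=0)
    center = 0.5 * (q_lo + q_hi)
    # Guard against a zero quantile range in a constant dimension.
    spread = np.maximum(q_hi - q_lo, 1e-12)
    return (cepstra - center) / spread


# Hypothetical input: 200 frames of 24 sub-band powers (random placeholder data).
rng = np.random.default_rng(0)
band_power = rng.gamma(shape=2.0, scale=1.0, size=(200, 24))
features = qcn(log_dct_cepstra(band_power))
print(features.shape)  # (200, 13)
```

With this form of normalization, the chosen low and high quantiles of every coefficient map to -0.5 and +0.5, so the dynamic range of each cepstral dimension is aligned across utterances regardless of the noise level that shifted or compressed it.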


Hindi phoneme, English phoneme, Wavelet packet decomposition, Nonlinear power function, QCN, HMM, HTK


Compliance with Ethical Standards

Conflict of interest

The authors declare that they have no conflict of interest.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Electronics and Communication Engineering, Birla Institute of Technology, Mesra, Ranchi, India
