Wavelet-Based Power Normalized Spectrum for Hindi Phoneme Classification
This paper presents a wavelet-based power normalized spectrum for computing robust cepstral features, named WP-PNCC features. The proposed technique computes a wavelet packet-based short-time spectrum of the speech signal. A nonlinear function is defined that relates the power spectrum of clean speech to the power spectrum of speech corrupted with noise. The constants of this function are computed from a longer-duration speech spectrum, and the short-time spectrum of each frame is weighted with the power function. The weighted speech spectrum is then passed through a logarithm and a discrete cosine transform to compute cepstral coefficients, which are in turn processed with the quantile-based cepstral dynamics normalization (QCN) technique. The proposed features are evaluated with a hidden Markov model (HMM) classifier on the TIFR database for a Hindi phoneme classification task and on the TIMIT database for an English phoneme classification task, alongside mel-frequency cepstral coefficients, power normalized cepstral coefficients and 24-band wavelet-based features, in clean and noisy environments. Different noises from the NOISEX-92 database are used to prepare the noisy database, with SNR ranging from 20 dB to 0 dB. The results show enhanced performance of the proposed features in all the considered cases. The simulations are performed in MATLAB 2015b. The performance of the proposed features is also evaluated on a speech recognition system based on the hidden Markov model toolkit (HTK). The comparative results confirm the robustness of the proposed features, which improve on the other features examined in this paper.
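The processing chain described above (wavelet packet subband powers, logarithm, discrete cosine transform, quantile-based normalization) can be illustrated with a minimal Python sketch. It is not the authors' implementation and makes several assumptions not stated in the abstract: a Haar wavelet, a 3-level packet tree (8 bands rather than 24), and a QCN form that centres each cepstral track on the midpoint of its low and high quantiles and scales by their difference. The nonlinear power-function weighting is omitted because the abstract does not give its form. All function names are hypothetical.

```python
import math

def haar_packet_powers(frame, depth):
    """Mean power in each leaf of a full Haar wavelet packet tree.

    Assumes len(frame) is divisible by 2**depth. Leaves are returned in
    natural (tree) order, which is not strictly increasing frequency.
    """
    nodes = [list(frame)]
    s = math.sqrt(2.0)
    for _ in range(depth):
        next_nodes = []
        for node in nodes:
            pairs = range(0, len(node) - 1, 2)
            low = [(node[i] + node[i + 1]) / s for i in pairs]   # approximation
            high = [(node[i] - node[i + 1]) / s for i in pairs]  # detail
            next_nodes.extend([low, high])
        nodes = next_nodes
    return [sum(x * x for x in node) / len(node) for node in nodes]

def dct2(values, n_coeffs):
    """Unnormalized DCT-II, keeping the first n_coeffs coefficients."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
        for k in range(n_coeffs)
    ]

def frame_cepstra(frame, depth=3, n_coeffs=4, floor=1e-10):
    """Log subband powers followed by a DCT (power-function weighting omitted)."""
    powers = haar_packet_powers(frame, depth)
    log_powers = [math.log(p + floor) for p in powers]  # floor avoids log(0)
    return dct2(log_powers, n_coeffs)

def quantile(sorted_vals, q):
    """Linearly interpolated quantile of an already sorted list, 0 <= q <= 1."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1.0 - frac) + sorted_vals[hi] * frac

def qcn(cepstral_frames, q=0.04):
    """Assumed QCN form: centre on the quantile midpoint, scale by the spread."""
    out = [list(f) for f in cepstral_frames]
    for d in range(len(out[0])):
        track = sorted(f[d] for f in cepstral_frames)
        q_lo, q_hi = quantile(track, q), quantile(track, 1.0 - q)
        center = 0.5 * (q_lo + q_hi)
        spread = (q_hi - q_lo) or 1.0  # guard against a constant track
        for f in out:
            f[d] = (f[d] - center) / spread
    return out

# Usage: two toy 8-sample frames, normalized over the (very short) utterance.
frames = [[0.1, 0.4, -0.2, 0.3, 0.0, -0.5, 0.2, 0.1],
          [0.3, -0.1, 0.2, 0.6, -0.4, 0.1, 0.0, -0.2]]
features = qcn([frame_cepstra(f) for f in frames])
```

Because the Haar packet transform is orthonormal, the leaf energies sum to the frame energy, which gives a quick sanity check on the decomposition. A practical system would use a longer analysis wavelet and a perceptually motivated band layout.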
Keywords: Hindi phoneme, English phoneme, Wavelet packet decomposition, Nonlinear power function, QCN, HMM, HTK
Compliance with Ethical Standards
Conflict of interest
The authors declare that they have no conflict of interest.