Smoothed Nonlinear Energy Operator-Based Amplitude Modulation Features for Robust Speech Recognition

Alam, Md. Jahangir; Kenny, Patrick; O’Shaughnessy, Douglas

doi:10.1007/978-3-642-38847-7_22

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7911)

Included in the following conference series:

International Conference on Nonlinear Speech Processing

Abstract

In this paper we present a robust feature extractor that uses smoothed nonlinear energy operator (SNEO)-based amplitude modulation (AM) features for a large vocabulary continuous speech recognition (LVCSR) task. The SNEO estimates the energy required to produce the AM-FM signal, and the estimated energy is then separated into its amplitude and frequency components using an energy separation algorithm (ESA). As in the power normalized cepstral coefficients (PNCC) front-end, medium duration power bias subtraction (MDPBS) is used to enhance the AM power spectrum. The performance of the proposed feature extractor is evaluated, in the context of speech recognition, on the AURORA-4 corpus, which represents additive noise and channel mismatch conditions. The ETSI advanced front-end (ETSI-AFE), PNCC, cochlear filterbank cepstral coefficients (CFCC), and conventional MFCC and PLP features are used for comparison. Experimental speech recognition results on the AURORA-4 task show that the proposed method is robust to both additive noise and microphone channel mismatch.
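
The energy-operator and energy-separation steps named in the abstract can be illustrated with a minimal sketch. The abstract does not state which operator form, smoothing window, or ESA variant the authors use, so everything here is an assumption made for illustration: the discrete Teager-Kaiser operator, a short Bartlett window for smoothing, and the DESA-2 variant of the ESA. The function names (teager_energy, smoothed_energy, desa2) and the win_len and eps parameters are hypothetical and are not taken from the paper.

```python
import numpy as np


def teager_energy(x):
    """Teager-Kaiser energy psi[n] = x[n]^2 - x[n-1]*x[n+1]; edge samples are set to 0."""
    x = np.asarray(x, dtype=float)
    psi = np.zeros_like(x)
    psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
    return psi


def smoothed_energy(x, win_len=5):
    """Assumed smoothing: Teager energy convolved with a unit-sum Bartlett window."""
    w = np.bartlett(win_len + 2)[1:-1]  # drop the two zero-valued endpoints
    w = w / w.sum()                     # unit sum keeps a constant energy unchanged
    return np.convolve(teager_energy(x), w, mode="same")


def desa2(x, eps=1e-12):
    """DESA-2 energy separation (assumed ESA variant).

    Returns (amp, omega): amplitude envelope |a[n]| and instantaneous
    frequency in radians per sample. The first and last few samples are
    unreliable because of the zero-padded edges.
    """
    x = np.asarray(x, dtype=float)
    y = np.zeros_like(x)
    y[1:-1] = x[2:] - x[:-2]            # symmetric difference x[n+1] - x[n-1]
    psi_x = teager_energy(x)
    psi_y = teager_energy(y)
    # eps guards against division by zero; clipping keeps arccos in its domain
    ratio = np.clip(1.0 - psi_y / (2.0 * np.maximum(psi_x, eps)), -1.0, 1.0)
    omega = 0.5 * np.arccos(ratio)
    amp = 2.0 * psi_x / np.sqrt(np.maximum(psi_y, eps))
    return amp, omega


# Usage on a synthetic tone with amplitude 0.7 and frequency 0.3 rad/sample
n = np.arange(200)
x = 0.7 * np.cos(0.3 * n + 0.1)
amp, omega = desa2(x)
energy = smoothed_energy(x)
```

For a pure tone A cos(Ωn + φ), the Teager-Kaiser energy equals A² sin²(Ω), so away from the signal edges this sketch should recover an amplitude close to 0.7 and a frequency close to 0.3 rad/sample. Smoothing leaves a constant energy unchanged and matters only for noisy or time-varying signals.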


Author information

Authors and Affiliations

INRS-EMT, University of Quebec, Montreal, QC, Canada
Md. Jahangir Alam & Douglas O’Shaughnessy
CRIM, Montreal, QC, Canada
Md. Jahangir Alam & Patrick Kenny

Editor information

Editors and Affiliations

TCTS Lab, University of Mons, 31, Boulevard Dolez, 7000, Mons, Belgium
Thomas Drugman
TCTS Lab, University of Mons, 31, Boulevard Dolez, 7000, Mons, Belgium
Thierry Dutoit

About this paper

Cite this paper

Alam, M.J., Kenny, P., O’Shaughnessy, D. (2013). Smoothed Nonlinear Energy Operator-Based Amplitude Modulation Features for Robust Speech Recognition. In: Drugman, T., Dutoit, T. (eds) Advances in Nonlinear Speech Processing. NOLISP 2013. Lecture Notes in Computer Science, vol 7911. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38847-7_22

DOI: https://doi.org/10.1007/978-3-642-38847-7_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38846-0
Online ISBN: 978-3-642-38847-7
eBook Packages: Computer Science, Computer Science (R0)
