Point Process Modeling of Spectral Peaks for Low Resource Robust Speech Recognition

Mandal, Anupam; Kumar, K. R. Prasanna; Mitra, Pabitra

doi:10.1007/978-3-319-71928-3_22

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10682)

Included in the following conference series:

International Conference on Mining Intelligence and Knowledge Exploration

Abstract

The paper proposes an approach for noise-robust speech recognition in a low-resource setting. The approach formulates a whole-word point process model based on word-specific spectral peak events in selected groups of mel banks. Its performance is demonstrated with an isolated word recognizer on noisy speech samples (additive white Gaussian noise) at SNR levels ranging from 0 dB to clean speech. Training is carried out with the number of examples varying from 5 to 80. A comparison with an HMM-based system trained on mel-frequency cepstral coefficient (MFCC) features shows an improvement of 8–17% (absolute), depending on the SNR level, when the number of training examples is less than 10. Because the approach relies only on the positions and magnitudes of spectral peaks derived from spoken word examples, without any language-specific resources, it can potentially be applied to any language. It is also shown that the approach better recognizes the words that HMMs recognize poorly, across all SNR levels.
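
The two ingredients named in the abstract, noise at a controlled SNR and spectral peak events, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the local-maximum peak rule, and the `threshold` parameter are assumptions made for illustration, and the mel band energies are taken as given rather than computed from audio.

```python
import math
import random


def add_awgn(signal, snr_db, rng=None):
    """Return the signal plus white Gaussian noise scaled to the requested SNR in dB."""
    rng = rng or random.Random(0)
    signal_power = sum(x * x for x in signal) / len(signal)
    # SNR(dB) = 10 * log10(signal_power / noise_power)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean, noisy):
    """Empirical SNR in dB between a clean signal and its noisy version."""
    signal_energy = sum(x * x for x in clean)
    noise_energy = sum((n - c) ** 2 for c, n in zip(clean, noisy))
    return 10.0 * math.log10(signal_energy / noise_energy)


def peak_events(band_energies, threshold):
    """Collect (frame, band, magnitude) for local maxima across bands.

    band_energies: list of frames, each a list of per-band energies.
    threshold: hypothetical floor below which maxima are ignored.
    Assumed rule: a band is a peak if it exceeds the threshold, is
    strictly above its lower neighbour, and not below its upper one.
    """
    events = []
    for t, frame in enumerate(band_energies):
        for b in range(1, len(frame) - 1):
            if (frame[b] > threshold
                    and frame[b] > frame[b - 1]
                    and frame[b] >= frame[b + 1]):
                events.append((t, b, frame[b]))
    return events
```

Each event keeps a position (frame and band) and a magnitude, the two quantities the abstract says the model relies on. How events are grouped by mel bank and turned into a whole-word point process model is not described in the abstract, so it is left out here.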

Author information

Authors and Affiliations

Center for AI and Robotics, Bangalore, India
Anupam Mandal & K. R. Prasanna Kumar
Department of CSE, IIT Kharagpur, Kharagpur, India
Pabitra Mitra

Corresponding author

Correspondence to Anupam Mandal.

Editor information

Editors and Affiliations

Indian Statistical Institute, Kolkata, India
Ashish Ghosh
Institute for Development and Research in Banking Technology, Hyderabad, India
Rajarshi Pal
Indian Institute of Information Technology, Sri City, India
Rajendra Prasath

About this paper

Cite this paper

Mandal, A., Kumar, K.R.P., Mitra, P. (2017). Point Process Modeling of Spectral Peaks for Low Resource Robust Speech Recognition. In: Ghosh, A., Pal, R., Prasath, R. (eds) Mining Intelligence and Knowledge Exploration. MIKE 2017. Lecture Notes in Computer Science, vol 10682. Springer, Cham. https://doi.org/10.1007/978-3-319-71928-3_22

DOI: https://doi.org/10.1007/978-3-319-71928-3_22
Published: 28 November 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-71927-6
Online ISBN: 978-3-319-71928-3
eBook Packages: Computer Science, Computer Science (R0)
