Abstract
Automatic Speech Recognition (ASR) systems have recently gained popularity for low-resource languages. India has 22 official languages and more than two thousand other regional languages, most of which are low-resource; standard resources are limited even for Hindi. In this paper, a continuous Hindi ASR system is implemented using Time Delay Neural Network (TDNN) based acoustic modeling, which improves the performance of a baseline Gaussian Mixture Model-Hidden Markov Model (GMM-HMM) Hindi ASR system by up to 11%. Further improvements of 3% and 2% are obtained by applying i-vector adaptation and interpolated language modeling, respectively.
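Two of the techniques named in the abstract can be illustrated with a minimal sketch: the subsampled frame splicing that lets a TDNN cover a long temporal context cheaply, and linear interpolation of language models. This is not the authors' implementation; the layer offsets are in the style of Peddinti et al. (2015) but are an assumption here, and the interpolation is shown over toy unigram models rather than the n-gram models a real system would mix.

```python
from collections import Counter

def receptive_field(layer_contexts):
    """Total left/right temporal context seen by a stack of TDNN layers.

    Each layer splices a few frame offsets (e.g. [-1, 2]); the offsets
    add up across layers, so a deep TDNN covers a wide window of
    acoustic frames with few parameters.
    """
    left = sum(-min(c) for c in layer_contexts)
    right = sum(max(c) for c in layer_contexts)
    return left, right

# Offsets in the style of Peddinti et al. (2015); the exact values used
# in this paper are an assumption.
contexts = [[-2, -1, 0, 1, 2], [-1, 2], [-3, 3], [-7, 2]]
print(receptive_field(contexts))  # (13, 9): the top layer sees frames t-13 .. t+9

def unigram_lm(tokens):
    """Maximum-likelihood unigram model: word -> probability."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def interpolate(p_in, p_bg, lam):
    """Linear interpolation: P(w) = lam * P_in(w) + (1 - lam) * P_bg(w)."""
    vocab = set(p_in) | set(p_bg)
    return {w: lam * p_in.get(w, 0.0) + (1 - lam) * p_bg.get(w, 0.0)
            for w in vocab}

in_domain = unigram_lm("नमस्ते दुनिया नमस्ते".split())
background = unigram_lm("शब्द वाक्य वाक्य वाक्य".split())
mixed = interpolate(in_domain, background, lam=0.7)
assert abs(sum(mixed.values()) - 1.0) < 1e-9  # still a valid distribution
```

The interpolation weight (0.7 here) would in practice be tuned on held-out data, e.g. to minimize perplexity.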
© 2020 Springer Nature Singapore Pte Ltd.
Cite this paper
Kumar, A., Aggarwal, R.K. (2020). A Time Delay Neural Network Acoustic Modeling for Hindi Speech Recognition. In: Kolhe, M., Tiwari, S., Trivedi, M., Mishra, K. (eds) Advances in Data and Information Sciences. Lecture Notes in Networks and Systems, vol 94. Springer, Singapore. https://doi.org/10.1007/978-981-15-0694-9_40
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-0693-2
Online ISBN: 978-981-15-0694-9