A Time Delay Neural Network Acoustic Modeling for Hindi Speech Recognition

  • Conference paper

Part of the book series: Lecture Notes in Networks and Systems ((LNNS,volume 94))

Abstract

Automatic Speech Recognition (ASR) systems have recently become more popular for low-resource languages. India has 22 official languages and more than two thousand other regional languages, the majority of which are low-resource. Standard resources are limited even for Hindi. In this paper, a continuous Hindi ASR system is implemented using Time Delay Neural Network (TDNN) based acoustic modeling, which improves the performance of the baseline Gaussian Mixture Model-Hidden Markov Model (GMM-HMM) based Hindi ASR system by up to 11%. Further improvements of 3% and 2% are obtained by applying i-vector adaptation and interpolated language modeling, respectively.
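The core idea behind TDNN acoustic modeling is that each layer splices acoustic frames at a small set of temporal offsets before applying an affine transform, so deeper layers see progressively wider temporal context. The following is a minimal NumPy sketch of that splicing idea; the function name, layer sizes, and context offsets are illustrative assumptions, not the configuration used in the paper.

```python
import numpy as np

def tdnn_layer(x, weights, bias, context):
    """One TDNN layer: splice frames at the given context offsets,
    then apply an affine transform followed by a ReLU.

    x: (T, D) sequence of frame vectors
    weights: (len(context) * D, H), bias: (H,)
    Returns a shorter sequence, since edge frames lack full context."""
    T, _ = x.shape
    lo, hi = min(context), max(context)
    out = []
    for t in range(-lo, T - hi):
        # Concatenate the frames at offsets t+c for each c in the context.
        spliced = np.concatenate([x[t + c] for c in context])
        out.append(np.maximum(spliced @ weights + bias, 0.0))  # ReLU
    return np.stack(out)

# Toy example: 100 frames of 40-dim features through two TDNN layers.
rng = np.random.default_rng(0)
feats = rng.standard_normal((100, 40))
h1 = tdnn_layer(feats, rng.standard_normal((5 * 40, 64)) * 0.1,
                np.zeros(64), context=[-2, -1, 0, 1, 2])
h2 = tdnn_layer(h1, rng.standard_normal((3 * 64, 64)) * 0.1,
                np.zeros(64), context=[-1, 0, 1])
print(h1.shape, h2.shape)
```

Because each layer consumes a window of its input, the effective context of the second layer spans frames t-3 to t+3 of the original features, which is how a TDNN models long temporal spans with small per-layer windows.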



Author information

Corresponding author

Correspondence to Ankit Kumar.

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Kumar, A., Aggarwal, R.K. (2020). A Time Delay Neural Network Acoustic Modeling for Hindi Speech Recognition. In: Kolhe, M., Tiwari, S., Trivedi, M., Mishra, K. (eds) Advances in Data and Information Sciences. Lecture Notes in Networks and Systems, vol 94. Springer, Singapore. https://doi.org/10.1007/978-981-15-0694-9_40

Download citation

  • DOI: https://doi.org/10.1007/978-981-15-0694-9_40

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-0693-2

  • Online ISBN: 978-981-15-0694-9

  • eBook Packages: Engineering, Engineering (R0)
