Research on Acceleration Method of Speech Recognition Training

Bai, Liang; Jiang, Jingfei; Dou, Yong

doi:10.1007/978-981-13-2423-9_4

Part of the book series: Communications in Computer and Information Science (CCIS, volume 908)

Included in the following conference series:

Conference on Advanced Computer Architecture

Abstract

Recurrent neural networks (RNNs) are now widely used in speech recognition. Experiments show that they have significant advantages over traditional methods, but their computational complexity limits their application, especially in real-time scenarios. The computation in a recurrent neural network depends heavily on the preceding and following states, and these states carry much overlapping information, so reducing that overlap can accelerate training. This paper constructs a training acceleration structure that reduces computation cost and speeds up training by discarding the RNN's dependence on the preceding and following states. A text corrector then corrects errors in the recognition results. We verify the proposed method on the TIMIT and LibriSpeech datasets; the results show that this approach achieves about a 3x speedup with only a small relative reduction in accuracy.
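
As a rough illustration of the idea in the abstract, the sketch below contrasts a layer whose output at each time step depends on the previous state with one where that dependence is dropped, and then applies a simple dictionary-based correction to the recognized words. It is a minimal sketch under stated assumptions, not the structure proposed in the paper: the scalar weights, the function names (`recurrent_layer`, `state_free_layer`, `correct_words`), and the use of string similarity as the corrector are all hypothetical choices made for this example.

```python
import difflib
import math


def recurrent_layer(xs, w, u):
    """Sequential form: h_t = tanh(w * x_t + u * h_{t-1}).

    Step t cannot start until step t-1 has finished.
    """
    h, out = 0.0, []
    for x in xs:
        h = math.tanh(w * x + u * h)
        out.append(h)
    return out


def state_free_layer(xs, w):
    """State dependence dropped (assumption): h_t = tanh(w * x_t).

    Every time step is independent, so the steps could run in parallel.
    """
    return [math.tanh(w * x) for x in xs]


def correct_words(words, vocabulary, cutoff=0.6):
    """Replace each out-of-vocabulary word with its closest vocabulary entry.

    A stand-in for a text corrector; words with no close match are kept.
    """
    fixed = []
    for word in words:
        if word in vocabulary:
            fixed.append(word)
            continue
        match = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        fixed.append(match[0] if match else word)
    return fixed


if __name__ == "__main__":
    frames = [0.5, -1.0, 0.25, 2.0]  # toy input features, one per time step
    print(recurrent_layer(frames, w=0.8, u=0.5))
    print(state_free_layer(frames, w=0.8))
    print(correct_words(["speach", "recognition"],
                        ["speech", "recognition", "training"]))
```

The two layers agree only when the recurrent weight `u` is zero; in general the state-free version trades some modeling power for independence between time steps, which is why a correction pass on the output is a natural companion to it.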

Author information

Authors and Affiliations

National University of Defense Technology, Changsha, China
Liang Bai, Jingfei Jiang & Yong Dou

Corresponding author

Correspondence to Jingfei Jiang.

Editor information

Editors and Affiliations

Shanghai Jiao Tong University, Shanghai, China
Chao Li
National University of Defense Technology, Changsha, China
Junjie Wu

About this paper

Cite this paper

Bai, L., Jiang, J., Dou, Y. (2018). Research on Acceleration Method of Speech Recognition Training. In: Li, C., Wu, J. (eds) Advanced Computer Architecture. ACA 2018. Communications in Computer and Information Science, vol 908. Springer, Singapore. https://doi.org/10.1007/978-981-13-2423-9_4

DOI: https://doi.org/10.1007/978-981-13-2423-9_4
Published: 13 September 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-2422-2
Online ISBN: 978-981-13-2423-9
eBook Packages: Computer Science, Computer Science (R0)
