Abstract
Deep learning has played a major role in artificial intelligence, solving some of its toughest problems. In this paper, a training approach for deep learning models based on LSTMs, GRUs, and similar variants is proposed. The effectiveness of this approach is evaluated on two models: the first based on LSTMs and the second on GRUs. To keep the comparison fair, a few parameters were held constant and different tests were carried out by varying the dropout rate. Both the LSTM and the GRU model trained with the proposed approach quickly reduced their training loss without underfitting or overfitting the data, and converged much faster than with the traditional approach. A comparative study on a text generation task shows the differences in the quality of data generated by the proposed model relative to the traditional model.
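The abstract does not specify the exact architectures or dropout placement used in the experiments, so the following is only an illustrative sketch of the kind of recurrent unit being compared: a minimal NumPy GRU cell with inverted dropout applied to the recurrent state. All names, sizes, and the dropout rate here are assumptions for illustration, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUCell:
    """Minimal GRU cell with update (z) and reset (r) gates."""

    def __init__(self, input_size, hidden_size):
        s = 1.0 / np.sqrt(hidden_size)
        # Three gate weight matrices for the input (W) and recurrent state (U)
        self.W = rng.uniform(-s, s, (3, hidden_size, input_size))
        self.U = rng.uniform(-s, s, (3, hidden_size, hidden_size))

    def step(self, x, h, dropout=0.0, train=True):
        # Inverted dropout on the recurrent state: scale the survivors
        # by 1/(1-p) at train time so no rescaling is needed at test time.
        if train and dropout > 0.0:
            mask = (rng.random(h.shape) >= dropout) / (1.0 - dropout)
            h = h * mask
        z = sigmoid(self.W[0] @ x + self.U[0] @ h)        # update gate
        r = sigmoid(self.W[1] @ x + self.U[1] @ h)        # reset gate
        h_tilde = np.tanh(self.W[2] @ x + self.U[2] @ (r * h))  # candidate
        return (1.0 - z) * h + z * h_tilde                # blended new state

# Run a short sequence through the cell with a 0.2 dropout rate.
cell = GRUCell(input_size=8, hidden_size=16)
h = np.zeros(16)
for t in range(5):
    h = cell.step(rng.standard_normal(8), h, dropout=0.2)
print(h.shape)  # (16,)
```

An LSTM-based counterpart would differ only in the cell equations (separate input, forget, and output gates plus a cell state); holding the hidden size, sequence length, and dropout rate fixed across both cells mirrors the paper's stated strategy of keeping a few parameters constant while varying dropout.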
© 2019 Springer Nature Singapore Pte Ltd.
Kumar, R. (2019). A New Approach to Train LSTMs, GRUs, and Other Similar Networks for Data Generation. In: Prateek, M., Sharma, D., Tiwari, R., Sharma, R., Kumar, K., Kumar, N. (eds) Next Generation Computing Technologies on Computational Intelligence. NGCT 2018. Communications in Computer and Information Science, vol 922. Springer, Singapore. https://doi.org/10.1007/978-981-15-1718-1_14
DOI: https://doi.org/10.1007/978-981-15-1718-1_14
Print ISBN: 978-981-15-1717-4
Online ISBN: 978-981-15-1718-1