Tricks of Implementation

  • Zihan Ding
  • Hao Dong


Previous chapters have introduced the readers to the main concepts of deep reinforcement learning, the major categories of reinforcement learning algorithms together with their code implementations, and several practical projects for a better understanding of deep reinforcement learning in practice. However, due to the aforementioned challenges such as low sample efficiency and instability, novices may still find it hard to apply those algorithms well in their own applications. In this chapter, we therefore summarize some common tricks and methods in detail, either mathematically or empirically, for deep reinforcement learning applications in practice. The methods and tips cover both the algorithm-implementation stage and the training-and-debugging stage, so as to keep readers from getting trapped in common practical dilemmas. These empirical tricks can be highly effective in some cases, but not always; this is due to the complexity and sensitivity of deep reinforcement learning models, where sometimes an ensemble of tricks needs to be applied. Readers can also refer to this chapter for possible solutions when they get stuck on their projects.


Deep reinforcement learning · Application · Implementation · Reward engineering · Neural network · Normalization · Efficiency · Stability
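As a concrete illustration of one of the tricks summarized in this chapter (normalization of observations), the sketch below keeps running statistics of the states seen so far and rescales them to roughly zero mean and unit variance, a common way to improve training stability. This is a minimal, hypothetical implementation using Welford's online algorithm, not code from the chapter itself; the class name and interface are illustrative assumptions.

```python
import numpy as np


class RunningNormalizer:
    """Online observation normalizer (hypothetical sketch).

    Maintains a running mean and variance of every observation seen so
    far, using Welford's online algorithm, so states can be rescaled to
    roughly zero mean and unit variance before being fed to the policy
    or value network.
    """

    def __init__(self, shape, eps=1e-8):
        self.mean = np.zeros(shape, dtype=np.float64)
        self.var = np.ones(shape, dtype=np.float64)
        self.count = 0
        self.eps = eps  # avoids division by zero for constant dimensions

    def update(self, x):
        """Fold one observation into the running mean and variance."""
        x = np.asarray(x, dtype=np.float64)
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        # Incremental population variance: var_n = var_{n-1}
        #   + (delta * (x - mean_n) - var_{n-1}) / n
        self.var += (delta * (x - self.mean) - self.var) / self.count

    def normalize(self, x):
        """Return the observation rescaled by the running statistics."""
        return (np.asarray(x) - self.mean) / np.sqrt(self.var + self.eps)
```

In a typical training loop, each raw state from the environment would first be passed to `update` and then replaced by `normalize(state)` before the agent acts on it.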



Copyright information

© Springer Nature Singapore Pte Ltd. 2020

Authors and Affiliations

  1. Imperial College London, London, UK
  2. Peking University, Beijing, China
