Benchmarking Deep and Non-deep Reinforcement Learning Algorithms for Discrete Environments

  • Fernando F. Duarte
  • Nuno Lau
  • Artur Pereira
  • Luís P. Reis
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 1093)


Given the plethora of Reinforcement Learning algorithms available in the literature, it can be challenging to decide which one is most appropriate for a given Reinforcement Learning task. This work presents a benchmark study of the performance of several Reinforcement Learning algorithms in discrete learning environments. The study covers both deep and non-deep learning algorithms, with a special focus on the Deep Q-Network algorithm and its variants. Neural Fitted Q-Iteration (the predecessor of Deep Q-Network), Vanilla Policy Gradient, and a planner were also included in this assessment to broaden the comparison across different approaches and paradigms. Three learning environments were used for the tests: a 2D maze and two OpenAI Gym environments, namely a custom-built Foraging/Tagging environment and the CartPole environment.
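To make the idea of "Deep Q-Network variants" more concrete, the sketch below contrasts the one-step bootstrap target of the original DQN (Mnih et al.) with that of Double DQN (van Hasselt, Guez and Silver), one commonly studied variant. It is a minimal illustration in plain Python for a single transition, assuming the next-state Q-values from an online network and a target network have already been computed. The function names `dqn_target` and `double_dqn_target` are hypothetical, and this is not the authors' benchmark code; which variants and settings the study actually used is not stated here.

```python
from typing import Sequence


def dqn_target(reward: float, next_q_target: Sequence[float],
               done: bool, gamma: float = 0.99) -> float:
    """DQN target: r + gamma * max_a Q_target(s', a).

    The target network both selects and evaluates the next action.
    Terminal transitions do not bootstrap.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward: float, next_q_online: Sequence[float],
                      next_q_target: Sequence[float],
                      done: bool, gamma: float = 0.99) -> float:
    """Double DQN target: r + gamma * Q_target(s', argmax_a Q_online(s', a)).

    The online network selects the action; the target network evaluates it.
    """
    if done:
        return reward
    # Index of the action the online network rates highest in s'.
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


if __name__ == "__main__":
    # Hypothetical next-state Q-values for a two-action discrete environment.
    q_online = [1.0, 2.0]
    q_target = [3.0, 1.5]
    print("DQN target:       ", dqn_target(1.0, q_target, done=False))
    print("Double DQN target:", double_dqn_target(1.0, q_online, q_target, done=False))
```

With these values, DQN bootstraps from the target network's own maximum (3.0), whereas Double DQN bootstraps from the target network's estimate (1.5) of the action preferred by the online network. Decoupling action selection from evaluation in this way is how Double DQN reduces the overestimation bias introduced by the max operator.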


Keywords: Reinforcement Learning, Planning, Deep Q-Network, Q-Learning, Value Iteration, Neural Fitted Q-Iteration, Policy gradient optimization



This work was supported by National Funds through the FCT - Foundation for Science and Technology in the context of the project UID/CEC/00127/2019 and also by FCT PhD scholarship SFRH/BD/145723/2019.



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Fernando F. Duarte (1)
  • Nuno Lau (1)
  • Artur Pereira (1)
  • Luís P. Reis (2)
  1. IEETA, University of Aveiro, Aveiro, Portugal
  2. LIACC, University of Porto, Porto, Portugal
