Learning How to Play Bomberman with Deep Reinforcement and Imitation Learning

  • Ícaro Goulart
  • Aline Paes
  • Esteban Clua
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11863)


Making artificial agents that learn how to play games is a long-standing goal in the area of Game AI. Recently, several successful cases have emerged, driven by Reinforcement Learning (RL) and neural network-based approaches. However, in most cases, the results have been achieved by training directly from pixel frames, at considerable computational cost. In this paper, we devise agents that learn how to play the popular game of Bomberman by relying on compact state representations and RL-based algorithms, without looking at the pixel level. To that end, we designed five vector-based state representations and implemented Bomberman on top of the Unity game engine through the ML-Agents toolkit. We enhance the ML-Agents algorithms by developing an Imitation Learning (IL) agent that further improves its model with the Actor-Critic Proximal Policy Optimization (PPO) method. We compared this approach with a PPO-only learner that uses either a Multi-Layer Perceptron or a Long Short-Term Memory (LSTM) network. We conducted several training and tournament experiments by making the agents play against each other. The hybrid state representation combined with our IL-followed-by-PPO learning algorithm achieves the best overall quantitative results, and we also observed that its agents learn correct Bomberman behavior.
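As an illustration of two ingredients the abstract mentions, the sketch below shows (a) one possible way to flatten a Bomberman grid into a vector-based observation and (b) PPO's standard clipped surrogate loss. The tile codes, grid layout, and function names are hypothetical examples for exposition and are not the paper's actual encodings.

```python
import numpy as np

# Hypothetical tile codes for a Bomberman grid (illustrative only).
EMPTY, WALL, BLOCK, BOMB, AGENT = range(5)

def encode_state(grid: np.ndarray, num_tile_types: int = 5) -> np.ndarray:
    """Flatten an HxW grid of tile codes into a one-hot vector of length
    H * W * num_tile_types, suitable as input to an MLP or LSTM policy."""
    one_hot = np.eye(num_tile_types, dtype=np.float32)[grid]  # (H, W, T)
    return one_hot.reshape(-1)

def ppo_clip_loss(ratio: np.ndarray, advantage: np.ndarray,
                  eps: float = 0.2) -> float:
    """PPO clipped surrogate objective (negated, so it is minimized).
    `ratio` is pi_new(a|s) / pi_old(a|s); `eps` is the clip range."""
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return -np.minimum(ratio * advantage, clipped).mean()

grid = np.array([
    [WALL, WALL,  WALL],
    [WALL, AGENT, EMPTY],
    [WALL, BOMB,  BLOCK],
])
obs = encode_state(grid)
print(obs.shape)  # (45,) = 3 * 3 tiles * 5 one-hot channels
```

In practice, the policy network consumes `obs` directly, so the observation size stays fixed at `H * W * num_tile_types` regardless of how many bombs or blocks are on the board.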


Bomberman · Proximal Policy Optimization · Reinforcement Learning · LSTM · Imitation Learning


Copyright information

© IFIP International Federation for Information Processing 2019

Authors and Affiliations

  1. Institute of Computing, Universidade Federal Fluminense, Niterói, Brazil