Pommerman & NeurIPS 2018

Abstract

Pommerman is an exciting new environment for multi-agent research based on the classic game Bomberman. This publication covers its inaugural NeurIPS competition (its second competition overall), held at NeurIPS 2018 and featuring the 2v2 team environment.

In the first chapter, the first section familiarizes the audience with the game and its nuances, and the second section describes the competition and its results. The remaining chapters then present the competitors' descriptions in order of competition result.

Chapters two and four describe two agents built by colleagues at IBM. Chapter four's dynamic Pommerman (dypm) agent is an implementation of real-time tree search with pessimistic scenarios: standard tree search is limited to a specified depth, but each leaf is evaluated under a deterministic, pessimistic scenario. Unlike standard tree search, evaluating the deterministic scenario involves no branching, so it can efficiently take into account significant events that the agent may encounter far ahead in the future. The pessimistic scenario is generated by assuming super-strong enemies, and the level of pessimism is tuned via self-play. With these techniques, the dypm agent meets the real-time constraint even when implemented in Python. Chapter two's agent is similar, but uses a real-time search tree to evaluate moves, followed by self-play for tuning.
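
The core idea admits a compact illustration. The following is a minimal sketch, not the authors' implementation: depth-limited search over the agent's own actions, with each leaf scored by a single deterministic, pessimistic line of play instead of further branching. The state interface (is_terminal, value, legal_actions, step, greedy_action, worst_case_enemy_actions) is a hypothetical stand-in for a Pommerman forward model.

    def pessimistic_leaf_value(state, horizon):
        """Follow one deterministic scenario: our agent plays a fixed (e.g.
        greedy) policy while every enemy plays its strongest reply. There is
        no branching, so even distant events are reached cheaply."""
        for _ in range(horizon):
            if state.is_terminal():
                break
            state = state.step(state.greedy_action(),
                               state.worst_case_enemy_actions())
        return state.value()

    def dypm_style_search(state, depth, horizon):
        """Standard depth-limited search over our own actions, with enemies
        assumed pessimistic; leaves fall through to the scenario above."""
        if state.is_terminal():
            return state.value()
        if depth == 0:
            return pessimistic_leaf_value(state, horizon)
        return max(dypm_style_search(
                       state.step(action, state.worst_case_enemy_actions()),
                       depth - 1, horizon)
                   for action in state.legal_actions())

Because the leaf evaluation is a single trajectory, its cost grows linearly in the horizon rather than exponentially, which is what lets far-ahead events influence the move choice within the time budget.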

Chapter three's Eisenach agent finished second at the Pommerman Team Competition, matching the performance of its predecessor in the earlier free-for-all competition. The chosen framework was online minimax tree search with a fast C++ simulator, which enabled deeper search within the allowed 0.1 s per move. Several tactics were successfully applied to reduce the number of ties and to avoid repeating situations; these made games denser and more exciting while increasing the measured difference between agents. Bayesian cost optimization was also applied but did not prove useful. The resulting agent passed the first three rounds of the competition without a tie or defeat and could even win against the overall winner in some of the matches.
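
A time-budgeted search loop of this kind is easy to sketch. The Python below is a hedged illustration, assuming a hypothetical sim forward model with legal_actions, apply, game_over, and evaluate; Eisenach's real search runs on its own C++ simulator, and the 2v2 team setting is simplified here to two-player minimax.

    import time

    TIME_BUDGET = 0.1  # seconds per move, as in the competition

    def minimax(sim, depth, maximizing, deadline):
        if time.monotonic() >= deadline:
            raise TimeoutError              # abandon this incomplete iteration
        if depth == 0 or sim.game_over():
            return sim.evaluate(), None
        best_value = float("-inf") if maximizing else float("inf")
        best_action = None
        for action in sim.legal_actions(maximizing):
            value, _ = minimax(sim.apply(action, maximizing),
                               depth - 1, not maximizing, deadline)
            if (maximizing and value > best_value) or \
               (not maximizing and value < best_value):
                best_value, best_action = value, action
        return best_value, best_action

    def choose_action(sim):
        """Iterative deepening: keep the result of the deepest fully
        completed search inside the hard time budget."""
        deadline = time.monotonic() + TIME_BUDGET
        best = sim.legal_actions(True)[0]   # safe fallback move
        depth = 1
        try:
            while True:
                _, best = minimax(sim, depth, True, deadline)
                depth += 1
        except TimeoutError:
            return best

Iterative deepening keeps the deepest fully completed result, so a faster simulator translates directly into extra search plies inside the same 0.1 s.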

Chapter five features the Navocado agent, which was trained with the Advantage Actor-Critic (A2C) algorithm and guided by the Continual Match Based Training (COMBAT) framework. The agent first transforms the original continuous state representations into discrete ones, which are easier for the deep model to learn from. A new action space then allows the agent to use its intended destination as an action, enabling longer-term planning. Finally, the COMBAT framework defines adaptive rewards for different stages of the game. The Navocado agent was the top learning agent in the competition.
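
The destination-as-action idea can be illustrated with a small wrapper. This is a hedged sketch under assumptions, not the Navocado code: the env interface (current_obs, step) and the BFS helper are hypothetical, while the board conventions (an 11x11 numpy board, 0 for a passage, obs["position"]) follow the standard Pommerman observation.

    from collections import deque

    BOARD = 11                                       # Pommerman boards are 11x11
    UP, DOWN, LEFT, RIGHT = 1, 2, 3, 4               # primitive move actions
    MOVES = {UP: (-1, 0), DOWN: (1, 0), LEFT: (0, -1), RIGHT: (0, 1)}

    def shortest_path_step(obs, target):
        """BFS over passable cells; return the first primitive move of a
        shortest path from the agent to `target`, or None if unreachable."""
        start = tuple(obs["position"])
        first_move = {start: None}
        frontier = deque([start])
        while frontier:
            cell = frontier.popleft()
            if cell == target:
                return first_move[cell]
            for action, (dr, dc) in MOVES.items():
                nxt = (cell[0] + dr, cell[1] + dc)
                if (0 <= nxt[0] < BOARD and 0 <= nxt[1] < BOARD
                        and obs["board"][nxt] == 0 and nxt not in first_move):
                    first_move[nxt] = first_move[cell] or action
                    frontier.append(nxt)
        return None

    class DestinationActionWrapper:
        """High-level action = a board cell; the wrapper walks the agent
        there one primitive move per tick."""

        def __init__(self, env):
            self.env = env
            self.n_actions = BOARD * BOARD           # one action per cell

        def step(self, destination_index):
            target = divmod(destination_index, BOARD)  # flat index -> (row, col)
            obs, total_reward, done, info = self.env.current_obs(), 0.0, False, {}
            while not done and tuple(obs["position"]) != target:
                move = shortest_path_step(obs, target)
                if move is None:                     # unreachable: drop the goal
                    break
                obs, reward, done, info = self.env.step(move)
                total_reward += reward
            return obs, total_reward, done, info

With one action per cell, a single decision commits the agent to many primitive steps, which is the longer-term planning the chapter describes.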

Finally, chapter six features the nn_team_skynet955_skynet955 agent, which ranked second in the learning-agents category and fifth overall. Equipped with an automatic module for action pruning, this agent was trained directly by end-to-end deep reinforcement learning in the partially observable team environment, against a curriculum of opponents and with reward shaping. A single trained neural network model was selected to form a team for the competition. The chapter discusses the difficulty of Pommerman as a benchmark for model-free reinforcement learning and describes the core elements upon which the agent was built.
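
Action pruning of this kind can be illustrated simply. The sketch below is not the authors' module (their released baseline, linked in the notes, contains the real one): it masks any movement that steps onto a cell covered by a bomb about to detonate, so the learner samples only from the surviving actions.

    STOP, UP, DOWN, LEFT, RIGHT, BOMB = range(6)     # Pommerman primitive actions
    OFFSETS = {STOP: (0, 0), UP: (-1, 0), DOWN: (1, 0),
               LEFT: (0, -1), RIGHT: (0, 1)}

    def imminent_blast_cells(bombs, board_size=11, fuse_threshold=2):
        """Cells reached by the cross-shaped blast of any bomb whose fuse is
        nearly out. `bombs` is a list of (row, col, blast_strength, life)."""
        danger = set()
        for row, col, strength, life in bombs:
            if life > fuse_threshold:                # not about to explode
                continue
            for d_row, d_col in ((0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)):
                for k in range(strength):
                    r, c = row + d_row * k, col + d_col * k
                    if 0 <= r < board_size and 0 <= c < board_size:
                        danger.add((r, c))
        return danger

    def prune_actions(position, bombs, board_size=11):
        """Keep only movement actions that do not walk into an imminent
        blast. (Bomb placement and walls are ignored here for brevity.)"""
        danger = imminent_blast_cells(bombs, board_size)
        safe = []
        for action, (d_row, d_col) in OFFSETS.items():
            cell = (position[0] + d_row, position[1] + d_col)
            if (0 <= cell[0] < board_size and 0 <= cell[1] < board_size
                    and cell not in danger):
                safe.append(action)
        return safe or [STOP]                        # never return an empty set

Filtering out obviously fatal moves shrinks the exploration problem, which speaks to the benchmark difficulty the chapter discusses.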


Notes

  1. Student at University of Alberta; work done as a part-time intern at Borealis AI.

  2. https://github.com/BorealisAI/pommerman-baseline.



Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Resnick, C., Gao, C., Márton, G., Osogami, T., Pang, L., Takahashi, T. (2020). Pommerman & NeurIPS 2018. In: Escalera, S., Herbrich, R. (eds) The NeurIPS '18 Competition. The Springer Series on Challenges in Machine Learning. Springer, Cham. https://doi.org/10.1007/978-3-030-29135-8_2

  • DOI: https://doi.org/10.1007/978-3-030-29135-8_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-29134-1

  • Online ISBN: 978-3-030-29135-8

  • eBook Packages: Computer Science, Computer Science (R0)
