Pommerman & NeurIPS 2018

Abstract

Pommerman is an exciting new environment for multi-agent research based on the classic game Bomberman. This publication covers its inaugural NeurIPS competition (its second competition overall), held at NeurIPS 2018 and featuring the 2v2 team environment.

In the first chapter, the first section familiarizes the audience with the game and its nuances, and the second section describes the competition and its results. The remaining chapters then present the competitors' descriptions in order of competition result.

Chapters two and four describe two agents built by colleagues at IBM. Chapter four's dynamic Pommerman (dypm) agent is an implementation of real-time tree search with pessimistic scenarios: standard tree search is limited to a specified depth, but each leaf is evaluated under a deterministic, pessimistic scenario. Unlike standard tree search, evaluating the deterministic scenario involves no branching, so it can efficiently take into account significant events that the agent may encounter far ahead in the future. The pessimistic scenario is generated by assuming super-strong enemies, and the level of pessimism is tuned via self-play. With these techniques, the dypm agent meets the real-time constraint even when implemented in Python. Chapter two's agent is similar, but uses a real-time search tree to evaluate moves, followed by self-play for tuning.
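
The core idea admits a compact illustration. The following is a minimal sketch, not the authors' implementation: depth-limited search over the agent's own actions, with each leaf scored by a single deterministic, pessimistic line of play instead of further branching. The state interface (is_terminal, value, legal_actions, step, greedy_action, worst_case_enemy_actions) is a hypothetical stand-in for a Pommerman forward model.

    def pessimistic_leaf_value(state, horizon):
        """Follow one deterministic scenario: our agent plays a fixed (e.g.
        greedy) policy while every enemy plays its strongest reply. There is
        no branching, so even distant events are reached cheaply."""
        for _ in range(horizon):
            if state.is_terminal():
                break
            state = state.step(state.greedy_action(),
                               state.worst_case_enemy_actions())
        return state.value()

    def dypm_style_search(state, depth, horizon):
        """Standard depth-limited search over our own actions, with enemies
        assumed pessimistic; leaves fall through to the scenario above."""
        if state.is_terminal():
            return state.value()
        if depth == 0:
            return pessimistic_leaf_value(state, horizon)
        return max(dypm_style_search(
                       state.step(action, state.worst_case_enemy_actions()),
                       depth - 1, horizon)
                   for action in state.legal_actions())

Because the leaf evaluation is a single trajectory, its cost grows linearly in the horizon rather than exponentially, which is what lets far-ahead events influence the move choice within the time budget.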

Chapter three's Eisenach agent finished second at the Pommerman Team Competition, matching the performance of its predecessor in the earlier free-for-all competition. The chosen framework was online minimax tree search with a fast C++ simulator, which enabled deeper search within the allowed 0.1 s per move. Several tactics were successfully applied to reduce the number of ties and to avoid repeating situations; these made games denser and more exciting while increasing the measured difference between agents. Bayesian cost optimization was also applied but did not prove useful. The resulting agent passed the first three rounds of the competition without a tie or defeat and could even win against the overall winner in some of the matches.
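
A time-budgeted search loop of this kind is easy to sketch. The Python below is a hedged illustration, assuming a hypothetical sim forward model with legal_actions, apply, game_over, and evaluate; Eisenach's real search runs on its own C++ simulator, and the 2v2 team setting is simplified here to two-player minimax.

    import time

    TIME_BUDGET = 0.1  # seconds per move, as in the competition

    def minimax(sim, depth, maximizing, deadline):
        if time.monotonic() >= deadline:
            raise TimeoutError              # abandon this incomplete iteration
        if depth == 0 or sim.game_over():
            return sim.evaluate(), None
        best_value = float("-inf") if maximizing else float("inf")
        best_action = None
        for action in sim.legal_actions(maximizing):
            value, _ = minimax(sim.apply(action, maximizing),
                               depth - 1, not maximizing, deadline)
            if (maximizing and value > best_value) or \
               (not maximizing and value < best_value):
                best_value, best_action = value, action
        return best_value, best_action

    def choose_action(sim):
        """Iterative deepening: keep the result of the deepest fully
        completed search inside the hard time budget."""
        deadline = time.monotonic() + TIME_BUDGET
        best = sim.legal_actions(True)[0]   # safe fallback move
        depth = 1
        try:
            while True:
                _, best = minimax(sim, depth, True, deadline)
                depth += 1
        except TimeoutError:
            return best

Iterative deepening keeps the deepest fully completed result, so a faster simulator translates directly into extra search plies inside the same 0.1 s.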

Chapter five features the Navocado agent, which was trained with the Advantage Actor-Critic (A2C) algorithm and guided by the Continual Match Based Training (COMBAT) framework. The agent first transforms the original continuous state representations into discrete ones, which are easier for the deep model to learn from. A new action space then allows the agent to use its intended destination as an action, enabling longer-term planning. Finally, the COMBAT framework defines adaptive rewards for different stages of the game. The Navocado agent was the top learning agent in the competition.
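
The destination-as-action idea can be illustrated with a small wrapper. This is a hedged sketch under assumptions, not the Navocado code: the env interface (current_obs, step) and the BFS helper are hypothetical, while the board conventions (an 11x11 numpy board, 0 for a passage, obs["position"]) follow the standard Pommerman observation.

    from collections import deque

    BOARD = 11                                       # Pommerman boards are 11x11
    UP, DOWN, LEFT, RIGHT = 1, 2, 3, 4               # primitive move actions
    MOVES = {UP: (-1, 0), DOWN: (1, 0), LEFT: (0, -1), RIGHT: (0, 1)}

    def shortest_path_step(obs, target):
        """BFS over passable cells; return the first primitive move of a
        shortest path from the agent to `target`, or None if unreachable."""
        start = tuple(obs["position"])
        first_move = {start: None}
        frontier = deque([start])
        while frontier:
            cell = frontier.popleft()
            if cell == target:
                return first_move[cell]
            for action, (dr, dc) in MOVES.items():
                nxt = (cell[0] + dr, cell[1] + dc)
                if (0 <= nxt[0] < BOARD and 0 <= nxt[1] < BOARD
                        and obs["board"][nxt] == 0 and nxt not in first_move):
                    first_move[nxt] = first_move[cell] or action
                    frontier.append(nxt)
        return None

    class DestinationActionWrapper:
        """High-level action = a board cell; the wrapper walks the agent
        there one primitive move per tick."""

        def __init__(self, env):
            self.env = env
            self.n_actions = BOARD * BOARD           # one action per cell

        def step(self, destination_index):
            target = divmod(destination_index, BOARD)  # flat index -> (row, col)
            obs, total_reward, done, info = self.env.current_obs(), 0.0, False, {}
            while not done and tuple(obs["position"]) != target:
                move = shortest_path_step(obs, target)
                if move is None:                     # unreachable: drop the goal
                    break
                obs, reward, done, info = self.env.step(move)
                total_reward += reward
            return obs, total_reward, done, info

With one action per cell, a single decision commits the agent to many primitive steps, which is the longer-term planning the chapter describes.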

Finally, chapter six features the nn_team_skynet955_skynet955 agent, which ranked second in the learning-agents category and fifth overall. Equipped with an automatic module for action pruning, this agent was trained directly by end-to-end deep reinforcement learning in the partially observable team environment, against a curriculum of opponents and with reward shaping. A single trained neural network model was selected to form a team for the competition. The chapter discusses the difficulty of Pommerman as a benchmark for model-free reinforcement learning and describes the core elements upon which the agent was built.
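
Action pruning of this kind can be illustrated simply. The sketch below is not the authors' module (their released baseline, linked in the notes, contains the real one): it masks any movement that steps onto a cell covered by a bomb about to detonate, so the learner samples only from the surviving actions.

    STOP, UP, DOWN, LEFT, RIGHT, BOMB = range(6)     # Pommerman primitive actions
    OFFSETS = {STOP: (0, 0), UP: (-1, 0), DOWN: (1, 0),
               LEFT: (0, -1), RIGHT: (0, 1)}

    def imminent_blast_cells(bombs, board_size=11, fuse_threshold=2):
        """Cells reached by the cross-shaped blast of any bomb whose fuse is
        nearly out. `bombs` is a list of (row, col, blast_strength, life)."""
        danger = set()
        for row, col, strength, life in bombs:
            if life > fuse_threshold:                # not about to explode
                continue
            for d_row, d_col in ((0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)):
                for k in range(strength):
                    r, c = row + d_row * k, col + d_col * k
                    if 0 <= r < board_size and 0 <= c < board_size:
                        danger.add((r, c))
        return danger

    def prune_actions(position, bombs, board_size=11):
        """Keep only movement actions that do not walk into an imminent
        blast. (Bomb placement and walls are ignored here for brevity.)"""
        danger = imminent_blast_cells(bombs, board_size)
        safe = []
        for action, (d_row, d_col) in OFFSETS.items():
            cell = (position[0] + d_row, position[1] + d_col)
            if (0 <= cell[0] < board_size and 0 <= cell[1] < board_size
                    and cell not in danger):
                safe.append(action)
        return safe or [STOP]                        # never return an empty set

Filtering out obviously fatal moves shrinks the exploration problem, which speaks to the benchmark difficulty the chapter discusses.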


Notes

  1. Student at University of Alberta; work done as a part-time intern at Borealis AI.

  2. https://github.com/BorealisAI/pommerman-baseline.



Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Resnick, C., Gao, C., Márton, G., Osogami, T., Pang, L., Takahashi, T. (2020). Pommerman & NeurIPS 2018. In: Escalera, S., Herbrich, R. (eds) The NeurIPS '18 Competition. The Springer Series on Challenges in Machine Learning. Springer, Cham. https://doi.org/10.1007/978-3-030-29135-8_2

  • DOI: https://doi.org/10.1007/978-3-030-29135-8_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-29134-1

  • Online ISBN: 978-3-030-29135-8

  • eBook Packages: Computer Science, Computer Science (R0)
