Fast Evolutionary Adaptation for Monte Carlo Tree Search
This paper describes a new adaptive Monte Carlo Tree Search (MCTS) algorithm that uses evolution to rapidly optimise its performance. An evolutionary algorithm is used as a source of control parameters to modify the behaviour of each iteration (i.e. each simulation or roll-out) of the MCTS algorithm; in this paper we largely restrict this to modifying the behaviour of the random default policy, though the same mechanism can also be applied to the tree policy.
This method of tightly integrating evolution into the MCTS algorithm means that evolutionary adaptation occurs on a much faster time-scale than has previously been achieved, and addresses a particular problem with MCTS which frequently occurs in real-time video and control problems: that uniform random roll-outs may be uninformative.
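The tight integration described above can be sketched as follows. This is a hypothetical illustration, not the paper's exact algorithm: the `OnePlusOneES` class, the choice of a (1+1)-evolution strategy, and the stand-in roll-out reward are all assumptions made for the sketch. The key structural point it shows is that the evolutionary algorithm proposes a parameter vector before *every* MCTS roll-out and receives that roll-out's return as fitness, so adaptation happens on the per-iteration time-scale.

```python
import random

class OnePlusOneES:
    """Minimal (1+1)-ES used as a per-roll-out source of control parameters.

    Hypothetical sketch: the paper couples an evolutionary algorithm to each
    MCTS iteration, but does not commit to this exact update rule.
    """

    def __init__(self, dim, sigma=0.1):
        self.best = [0.0] * dim          # incumbent parameter vector
        self.best_fitness = float("-inf")
        self.sigma = sigma               # mutation step size

    def propose(self):
        # Mutate the incumbent to obtain parameters for the next roll-out.
        return [w + random.gauss(0.0, self.sigma) for w in self.best]

    def tell(self, params, fitness):
        # Elitist acceptance: keep the candidate if its roll-out return
        # was at least as good as the incumbent's.
        if fitness >= self.best_fitness:
            self.best, self.best_fitness = params, fitness

# Per-iteration coupling with MCTS (toy stand-in for the roll-out):
random.seed(0)
es = OnePlusOneES(dim=1)
for _iteration in range(200):
    params = es.propose()
    # In the real algorithm this reward would be the return of one MCTS
    # roll-out whose default policy is biased by `params`; here a toy
    # reward peaks when the parameter reaches 1.0.
    reward = -abs(params[0] - 1.0) + random.gauss(0.0, 0.01)
    es.tell(params, reward)
```

Because fitness is read off each roll-out directly, no separate evolutionary "outer loop" over whole games is needed, which is what allows adaptation within the agent's very first game.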
Results are presented on the classic Mountain Car reinforcement learning benchmark and also on a simplified version of Space Invaders. The results clearly demonstrate the value of the approach, significantly outperforming “standard” MCTS in each case. Furthermore, the adaptation is almost immediate, with no perceptual delay as the system learns: the agent frequently performs well from its very first game.
Keywords: Evolutionary Algorithm, Video Game, Original Game, Computational Budget, Default Policy