Multi-Criteria Comparison of Coevolution and Temporal Difference Learning on Othello
We compare Temporal Difference Learning (TDL) with Coevolutionary Learning (CEL) on Othello. In addition to three popular single-criterion performance measures: (i) generalization performance, or expected utility, (ii) average result against a hand-crafted heuristic, and (iii) the result of a head-to-head match, we compare the algorithms using performance profiles. This multi-criteria performance measure characterizes a player's performance in the context of opponents of various strength. The multi-criteria analysis reveals that, although the players produced by the two algorithms have similar generalization performance, TDL is much better at playing against strong opponents, while CEL copes better with weak ones. We also find that TDL produces less diverse strategies than CEL. Our results confirm the usefulness of performance profiles as a tool for comparing learning algorithms for games.
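The performance-profile idea described above can be illustrated with a minimal sketch: bucket a pool of opponents by an externally supplied strength score, then report the learner's average score per bucket rather than a single aggregate. All names here (`performance_profile`, `play`) are illustrative assumptions, not code from the paper; the real study plays Othello games against sampled opponents, which is abstracted away below.

```python
import random

def performance_profile(play, opponents, strengths, n_bins=5, games=100):
    """Sketch of a performance profile (illustrative, not the paper's code).

    play       : callable taking an opponent and returning the player's
                 score for one game (e.g. 1 = win, 0.5 = draw, 0 = loss)
    opponents  : pool of opponent strategies
    strengths  : one strength score per opponent (same order)
    Returns a list of average scores, one per strength bucket
    (None for empty buckets), from weakest to strongest opponents.
    """
    lo, hi = min(strengths), max(strengths)
    buckets = [[] for _ in range(n_bins)]
    for opp, s in zip(opponents, strengths):
        # Map strength into [0, n_bins); clamp the maximum into the last bin.
        i = min(int((s - lo) / (hi - lo + 1e-9) * n_bins), n_bins - 1)
        buckets[i].append(opp)
    profile = []
    for bucket in buckets:
        if not bucket:
            profile.append(None)
            continue
        # Sample opponents from the bucket and average the player's score.
        scores = [play(random.choice(bucket)) for _ in range(games)]
        profile.append(sum(scores) / len(scores))
    return profile
```

A profile that is high in the low-strength buckets but drops in the high-strength ones would correspond to the CEL-like behavior reported in the abstract, while the reverse shape would match the TDL-like behavior.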
Keywords: Reinforcement learning · Coevolutionary algorithm · Reversi · Othello · Board evaluation function · Weighted piece counter · Interactive domain