Abalearn: A Risk-Sensitive Approach to Self-play Learning in Abalone
This paper presents Abalearn, a self-teaching Abalone program capable of automatically reaching an intermediate level of play without needing expert-labeled training examples, deep searches, or exposure to competent play.
Our approach is based on a risk-seeking reinforcement learning algorithm, since defensive play in Abalone tends to produce games that never end.
We show that it is this risk-sensitivity that enables successful self-play training. We also propose a set of features that appear relevant to achieving a good level of play.
We evaluate our approach by benchmarking against a fixed heuristic opponent, by pitting our agents against human players online, and by comparing snapshots of our agents taken at different stages of training.
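The risk-sensitive learning rule the abstract refers to can be illustrated with a transformed temporal-difference update in the style of Mihatsch and Neuneier, where positive and negative TD errors are weighted asymmetrically. This is a hedged sketch of the general technique, not Abalearn's exact update rule; the function name, parameters, and the tabular value representation are illustrative assumptions.

```python
def risk_sensitive_td_update(v, s, s_next, reward,
                             alpha=0.1, gamma=0.95, kappa=-0.5):
    """Risk-sensitive TD(0) update on a tabular value function.

    v       : dict mapping state -> estimated value (updated in place)
    kappa   : risk parameter in (-1, 1). kappa < 0 amplifies positive
              surprises (risk-seeking, as suggested for Abalone, where
              defensive play stalls); kappa > 0 is risk-averse;
              kappa = 0 recovers plain TD(0).
    Returns the (untransformed) TD error.
    """
    # Standard TD error: one-step bootstrapped prediction error.
    delta = reward + gamma * v.get(s_next, 0.0) - v.get(s, 0.0)
    # Asymmetric weighting of the error by its sign.
    weight = (1.0 - kappa) if delta > 0 else (1.0 + kappa)
    v[s] = v.get(s, 0.0) + alpha * weight * delta
    return delta
```

With `kappa = -0.5`, a positive surprise is weighted by 1.5 and a negative one by 0.5, so the learned values favor aggressive lines of play over stalling defensive ones.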
Keywords: Reinforcement Learning Algorithm, Search Depth, Greedy Policy, Training Game, Chess Program