Abalearn: A Risk-Sensitive Approach to Self-play Learning in Abalone

  • Pedro Campos
  • Thibault Langlois
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2837)

Abstract

This paper presents Abalearn, a self-teaching Abalone program capable of automatically reaching an intermediate level of play without needing expert-labeled training examples, deep searches or exposure to competent play.

Our approach is based on a reinforcement learning algorithm that is risk-seeking, since defensive play in Abalone tends to produce games that never end.

We show that it is this risk-sensitivity that makes successful self-play training possible. We also propose a set of features that appear relevant for achieving a good level of play.

We evaluate our approach by using a fixed heuristic opponent as a benchmark, by pitting our agents against human players online, and by comparing snapshots of our agents taken at different stages of training.
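To make the risk-seeking update concrete, the sketch below illustrates one risk-sensitive TD(0) step in the style of Mihatsch and Neuneier's risk-sensitive reinforcement learning: positive and negative TD errors are weighted asymmetrically, and a negative kappa amplifies positive surprises, favouring the attacking play needed to actually finish self-play games. The linear value function, feature encoding, and all parameter values are illustrative assumptions for this example, not the authors' implementation.

```python
import numpy as np

def risk_weight(td_error: float, kappa: float) -> float:
    """Asymmetric weighting of the TD error, with kappa in (-1, 1).

    kappa > 0 overweights negative surprises (risk-averse);
    kappa < 0 overweights positive surprises (risk-seeking).
    """
    return (1.0 - kappa) * td_error if td_error > 0 else (1.0 + kappa) * td_error

def td0_update(w, phi_s, phi_s_next, reward, alpha=0.01, gamma=1.0, kappa=-0.5):
    """One risk-sensitive TD(0) step for a linear value function V(s) = w . phi(s)."""
    td_error = reward + gamma * float(np.dot(w, phi_s_next)) - float(np.dot(w, phi_s))
    return w + alpha * risk_weight(td_error, kappa) * phi_s

# Hypothetical 4-feature board encoding (e.g. material difference, centrality,
# cohesion, marbles in danger); reward 1.0 stands for pushing a marble off the board.
w = np.full(4, 0.1)
w = td0_update(w, np.array([0.2, 0.5, 0.1, 0.0]), np.array([0.3, 0.6, 0.1, 0.0]), reward=1.0)
```

With kappa = -0.5, a positive TD error is scaled by 1.5 and a negative one by 0.5; standard TD(0) is recovered at kappa = 0.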

Keywords

Reinforcement learning algorithm, search depth, greedy policy, training game, chess program

Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Pedro Campos (1)
  • Thibault Langlois (1, 2)
  1. INESC-ID, Neural Networks and Signal Processing Group, Lisbon, Portugal
  2. Departamento de Informática, Faculdade de Ciências da Universidade de Lisboa, Lisbon, Portugal
