Learning to Evaluate Go Positions via Temporal Difference Methods

  • N. N. Schraudolph
  • P. Dayan
  • T. J. Sejnowski
Part of the Studies in Fuzziness and Soft Computing book series (STUDFUZZ, volume 62)

Abstract

The game of Go has a high branching factor that defeats the tree search approach used in computer chess, and long-range spatiotemporal interactions that make position evaluation extremely difficult. Development of conventional Go programs is hampered by their knowledge-intensive nature. We demonstrate a viable alternative by training neural networks to evaluate Go positions via temporal difference (TD) learning.
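As a rough illustration of the temporal difference idea named in the abstract, the sketch below applies a single TD(0) update to a linear value function. It is not the authors' network or training setup: the linear evaluator, the feature vectors, and the names `value`, `td0_update`, `alpha`, and `gamma` are all assumptions made for this example.

```python
from typing import List, Sequence


def value(w: Sequence[float], x: Sequence[float]) -> float:
    """Linear position evaluation: dot product of weights and features."""
    return sum(wi * xi for wi, xi in zip(w, x))


def td0_update(
    w: Sequence[float],
    x: Sequence[float],
    x_next: Sequence[float],
    reward: float,
    alpha: float = 0.1,      # learning rate (hypothetical default)
    gamma: float = 1.0,      # discount factor (hypothetical default)
    terminal: bool = False,  # True if x_next ends the game
) -> List[float]:
    """Return new weights after one TD(0) step.

    The TD error compares the current estimate value(w, x) with the
    one-step target reward + gamma * value(w, x_next); a terminal
    successor contributes no future value.
    """
    v_next = 0.0 if terminal else value(w, x_next)
    delta = reward + gamma * v_next - value(w, x)
    # For a linear evaluator, the gradient with respect to w is x itself.
    return [wi + alpha * delta * xi for wi, xi in zip(w, x)]


# Usage: a final move that yields reward 1 pulls the estimate for x upward.
w = td0_update([0.0, 0.0], [1.0, 0.0], [0.0, 1.0], reward=1.0,
               alpha=0.5, terminal=True)
```

A neural network evaluator would replace the feature vector in the weight update with the gradient of the network output, obtained by backpropagation.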

Keywords

Legal Move; Move Generator; Board Size; Board Position; Temporal Difference Learning
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Physica-Verlag Heidelberg 2001
