Learning to Evaluate Go Positions via Temporal Difference Methods
The game of Go has a high branching factor that defeats the tree search approach used in computer chess, and long-range spatiotemporal interactions that make position evaluation extremely difficult. Development of conventional Go programs is hampered by their knowledge-intensive nature. We demonstrate a viable alternative by training neural networks to evaluate Go positions via temporal difference (TD) learning.
Keywords: Legal Move, Move Generator, Board Size, Board Position, Temporal Difference Learning
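As background for the approach named in the abstract, here is a minimal sketch of the TD(0) update rule for a parameterized position evaluator. This is illustrative only: the linear evaluator, feature encoding, and toy training loop below are placeholders, not the network architecture or training regime used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: positions are encoded as binary feature vectors, and a
# linear evaluator V(s) = w . phi(s) predicts the final game outcome.
N_FEATURES = 16
w = np.zeros(N_FEATURES)

ALPHA = 0.1   # learning rate
GAMMA = 1.0   # no discounting within a game

def value(phi):
    """Predicted outcome for a position with feature vector phi."""
    return float(w @ phi)

def td0_update(phi, phi_next, reward):
    """One TD(0) step: move V(phi) toward reward + GAMMA * V(phi_next)."""
    delta = reward + GAMMA * value(phi_next) - value(phi)  # TD error
    w[:] += ALPHA * delta * phi                            # gradient of a linear model

# Train on synthetic "games": a sequence of positions with reward only
# at the end, as in Go, where the outcome is known only when the game ends.
for _ in range(2000):
    states = rng.integers(0, 2, size=(10, N_FEATURES)).astype(float)
    outcome = float(states[:, 0].sum() > 5)  # hidden rule the evaluator must pick up
    for t in range(len(states) - 1):
        td0_update(states[t], states[t + 1], reward=0.0)
    # Terminal transition: successor value is zero, reward is the game result.
    td0_update(states[-1], np.zeros(N_FEATURES), reward=outcome)
```

The key point the sketch shows is that no position is ever labeled directly: each state's evaluation is pulled toward the (bootstrapped) evaluation of its successor, and the true outcome enters only through the terminal transition. The paper applies this same idea with a neural network in place of the linear evaluator.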