Abstract
This chapter describes TD-Gammon, a neural network that teaches itself to play backgammon solely by playing against itself and learning from the results. TD-Gammon uses a recently proposed reinforcement learning algorithm called TD(λ) (Sutton, 1988), and is apparently the first application of this algorithm to a complex, nontrivial task. Despite starting from random initial weights (and hence a random initial strategy), TD-Gammon achieves a surprisingly strong level of play. With zero knowledge built in at the start of learning (i.e., given only a “raw” description of the board state), the network learns to play the entire game at a strong intermediate level that surpasses not only conventional commercial programs, but also comparable networks trained via supervised learning on a large corpus of human expert games. The network’s hidden units appear to have discovered useful features, a longstanding goal of computer games research.
Furthermore, when a set of hand-crafted features is added to the network’s input representation, the result is a truly staggering level of performance: TD-Gammon is now estimated to play at a strong master level that is extremely close to the world’s best human players. We discuss possible principles underlying the success of TD-Gammon, and the prospects for successful real-world applications of TD learning in other domains.
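The learning method the abstract refers to can be sketched in a few lines. The following is a minimal illustration of TD(λ) with eligibility traces (Sutton, 1988) on a *linear* value function; the function names, step size, and trace parameter are illustrative assumptions, and TD-Gammon itself trained a multilayer network by backpropagating the TD error rather than using a linear model.

```python
# Minimal TD(lambda) sketch: linear value function V(s) = w . x(s) with
# eligibility traces.  Hypothetical illustration only -- TD-Gammon used a
# multilayer network, not a linear model.

def td_lambda_episode(features, rewards, w, alpha=0.1, gamma=1.0, lam=0.7):
    """Run one episode of TD(lambda) updates and return the new weights.

    features: feature vectors x_0 .. x_{T-1} for the nonterminal states
    rewards:  rewards r_1 .. r_T received on each transition
              (the terminal state's value is taken to be zero)
    """
    e = [0.0] * len(w)                          # eligibility trace
    for t, (x, r) in enumerate(zip(features, rewards)):
        v = sum(wi * xi for wi, xi in zip(w, x))
        if t + 1 < len(features):               # bootstrap from the next state
            v_next = sum(wi * xi for wi, xi in zip(w, features[t + 1]))
        else:                                   # final transition of the game
            v_next = 0.0
        delta = r + gamma * v_next - v          # TD error
        e = [gamma * lam * ei + xi for ei, xi in zip(e, x)]   # decay + add x_t
        w = [wi + alpha * delta * ei for wi, ei in zip(w, e)] # TD(lambda) update
    return w
```

In the backgammon setting, γ = 1 and the only nonzero reward is the win/loss signal at the end of the game, so repeated self-play episodes push the value estimates toward the probability of winning from each position.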
References
H. Berliner, “Computer backgammon.” Scientific American 243:1, 64–72 (1980).
D. P. Bertsekas, Dynamic Programming: Deterministic and Stochastic Models. Englewood Cliffs, NJ: Prentice Hall (1987).
J. Christensen and R. Korf, “A unified theory of heuristic evaluation functions and its application to learning.” Proc. of AAAI-86, 148–152 (1986).
P. Dayan, “The convergence of TD(λ) for general λ.” Machine Learning 8, 341–362 (1992).
P. W. Frey, “Algorithmic strategies for improving the performance of game playing programs.” In: D. Farmer et al. (Eds.), Evolution, Games and Learning. Amsterdam: North Holland (1986).
A. K. Griffith, “A comparison and evaluation of three machine learning procedures as applied to the game of checkers.” Artificial Intelligence 5, 137–148 (1974).
K. Hornik, M. Stinchcombe and H. White, “Multilayer feedforward networks are universal approximators.” Neural Networks 2, 359–366 (1989).
K.-F. Lee and S. Mahajan, “A pattern classification approach to evaluation function learning.” Artificial Intelligence 36, 1–25 (1988).
P. Magriel, Backgammon. New York: Times Books (1976).
M. L. Minsky and S. A. Papert, Perceptrons. Cambridge MA: MIT Press (1969). (Republished as an expanded edition in 1988).
D. H. Mitchell, “Using features to evaluate positions in experts’ and novices’ Othello games.” Master’s Thesis, Northwestern Univ., Evanston IL (1984).
J. R. Quinlan, “Learning efficient classification procedures and their application to chess end games.” In: R. S. Michalski, J. G. Carbonell and T. M. Mitchell (Eds.), Machine Learning. Palo Alto CA: Tioga (1983).
B. Robertie, Advanced Backgammon. Arlington MA: Gammon Press (1991).
B. Robertie, “Carbon versus silicon: matching wits with TD-Gammon.” Inside Backgammon 2:2, 14–22 (1992).
D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning internal representations by error propagation.” In: D. Rumelhart and J. McClelland (Eds.), Parallel Distributed Processing, Vol. 1. Cambridge MA: MIT Press (1986).
A. Samuel, “Some studies in machine learning using the game of checkers.” IBM J. of Research and Development 3, 210–229 (1959).
A. Samuel, “Some studies in machine learning using the game of checkers, II — recent progress.” IBM J. of Research and Development 11, 601–617 (1967).
R. S. Sutton, “Temporal credit assignment in reinforcement learning.” Ph. D. Thesis, Univ. of Massachusetts, Amherst MA (1984).
R. S. Sutton, “Learning to predict by the methods of temporal differences.” Machine Learning 3, 9–44 (1988).
G. Tesauro and T. J. Sejnowski, “A parallel network that learns to play backgammon.” Artificial Intelligence 39, 357–390 (1989).
G. Tesauro, “Connectionist learning of expert preferences by comparison training.” In: D. Touretzky (Ed.), Advances in Neural Information Processing Systems 1, 99–106. San Mateo, CA: Morgan Kaufmann (1989).
G. Tesauro, “Neurogammon: a neural network backgammon program.” IJCNN Proceedings III, 33–39 (1990).
G. Tesauro, “Practical issues in temporal difference learning.” Machine Learning 8, 257–277 (1992).
N. Zadeh and G. Kobliska, “On optimal doubling in backgammon.” Management Science 23, 853–858 (1977).
© 1995 Springer Science+Business Media New York
Cite this chapter
Tesauro, G. (1995). TD-Gammon: A Self-Teaching Backgammon Program. In: Murray, A.F. (eds) Applications of Neural Networks. Springer, Boston, MA. https://doi.org/10.1007/978-1-4757-2379-3_11
Print ISBN: 978-1-4419-5140-3
Online ISBN: 978-1-4757-2379-3