
TD-Gammon: A Self-Teaching Backgammon Program

Chapter in: Applications of Neural Networks

Abstract

This chapter describes TD-Gammon, a neural network that is able to teach itself to play backgammon solely by playing against itself and learning from the results. TD-Gammon uses a recently proposed reinforcement learning algorithm called TD(λ) (Sutton, 1988), and is apparently the first application of this algorithm to a complex nontrivial task. Despite starting from random initial weights (and hence random initial strategy), TD-Gammon achieves a surprisingly strong level of play. With zero knowledge built in at the start of learning (i.e. given only a “raw” description of the board state), the network learns to play the entire game at a strong intermediate level that surpasses not only conventional commercial programs, but also comparable networks trained via supervised learning on a large corpus of human expert games. The hidden units in the network have apparently discovered useful features, a longstanding goal of computer games research.

Furthermore, when a set of hand-crafted features is added to the network’s input representation, the result is a truly staggering level of performance: TD-Gammon is now estimated to play at a strong master level that is extremely close to the world’s best human players. We discuss possible principles underlying the success of TD-Gammon, and the prospects for successful real-world applications of TD learning in other domains.
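The training procedure the abstract describes, self-play with TD(λ) prediction updates, can be sketched in miniature. The linear value function, feature size, and parameter values below are illustrative stand-ins invented for this example, not taken from the chapter; TD-Gammon itself uses a multilayer network over a raw board encoding.

```python
import numpy as np

# Illustrative sketch of the TD(lambda) update (Sutton, 1988) that drives
# TD-Gammon's self-play training. A linear value function over a toy
# feature vector stands in for the actual neural network.
rng = np.random.default_rng(0)

n_features = 8
w = rng.normal(scale=0.1, size=n_features)   # random initial weights
alpha, lam, gamma = 0.01, 0.7, 1.0           # step size, trace decay, discount

def value(x, w):
    """Predicted game outcome for feature vector x; gradient w.r.t. w is x."""
    return float(w @ x)

def td_lambda_episode(states, final_reward, w):
    """Update w over one self-play game from successive prediction differences."""
    e = np.zeros_like(w)                      # eligibility trace
    for t, x in enumerate(states):
        e = gamma * lam * e + x               # decay trace, add current gradient
        if t + 1 < len(states):
            target = gamma * value(states[t + 1], w)
        else:
            target = final_reward             # game over: use the actual outcome
        delta = target - value(x, w)          # TD error
        w = w + alpha * delta * e
    return w

# One toy "game": a short sequence of feature vectors ending in a win (reward 1).
episode = [rng.normal(size=n_features) for _ in range(5)]
w = td_lambda_episode(episode, 1.0, w)
```

In TD-Gammon the gradient is obtained by backpropagation through the network rather than the trivial linear gradient used here, and each move during self-play is chosen by evaluating all legal successor positions with the current value function.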


References

  • H. Berliner, “Computer backgammon.” Scientific American 243:1, 64–72 (1980).

  • D. P. Bertsekas, Dynamic Programming: Deterministic and Stochastic Models. Englewood Cliffs, NJ: Prentice Hall (1987).

  • J. Christensen and R. Korf, “A unified theory of heuristic evaluation functions and its application to learning.” Proc. of AAAI-86, 148–152 (1986).

  • P. Dayan, “The convergence of TD(λ) for general λ.” Machine Learning 8, 341–362 (1992).

  • P. W. Frey, “Algorithmic strategies for improving the performance of game playing programs.” In: D. Farmer et al. (Eds.), Evolution, Games and Learning. Amsterdam: North-Holland (1986).

  • A. K. Griffith, “A comparison and evaluation of three machine learning procedures as applied to the game of checkers.” Artificial Intelligence 5, 137–148 (1974).

  • K. Hornik, M. Stinchcombe and H. White, “Multilayer feedforward networks are universal approximators.” Neural Networks 2, 359–366 (1989).

  • K.-F. Lee and S. Mahajan, “A pattern classification approach to evaluation function learning.” Artificial Intelligence 36, 1–25 (1988).

  • P. Magriel, Backgammon. New York: Times Books (1976).

  • M. L. Minsky and S. A. Papert, Perceptrons. Cambridge, MA: MIT Press (1969). (Republished as an expanded edition in 1988.)

  • D. H. Mitchell, “Using features to evaluate positions in experts’ and novices’ Othello games.” Master’s Thesis, Northwestern Univ., Evanston, IL (1984).

  • J. R. Quinlan, “Learning efficient classification procedures and their application to chess end games.” In: R. S. Michalski, J. G. Carbonell and T. M. Mitchell (Eds.), Machine Learning. Palo Alto, CA: Tioga (1983).

  • B. Robertie, Advanced Backgammon. Arlington, MA: Gammon Press (1991).

  • B. Robertie, “Carbon versus silicon: matching wits with TD-Gammon.” Inside Backgammon 2:2, 14–22 (1992).

  • D. E. Rumelhart, G. E. Hinton and R. J. Williams, “Learning internal representations by error propagation.” In: D. Rumelhart and J. McClelland (Eds.), Parallel Distributed Processing, Vol. 1. Cambridge, MA: MIT Press (1986).

  • A. Samuel, “Some studies in machine learning using the game of checkers.” IBM J. of Research and Development 3, 210–229 (1959).

  • A. Samuel, “Some studies in machine learning using the game of checkers, II — recent progress.” IBM J. of Research and Development 11, 601–617 (1967).

  • R. S. Sutton, “Temporal credit assignment in reinforcement learning.” Ph.D. Thesis, Univ. of Massachusetts, Amherst, MA (1984).

  • R. S. Sutton, “Learning to predict by the methods of temporal differences.” Machine Learning 3, 9–44 (1988).

  • G. Tesauro and T. J. Sejnowski, “A parallel network that learns to play backgammon.” Artificial Intelligence 39, 357–390 (1989).

  • G. Tesauro, “Connectionist learning of expert preferences by comparison training.” In: D. Touretzky (Ed.), Advances in Neural Information Processing Systems 1, 99–106. San Mateo, CA: Morgan Kaufmann (1989).

  • G. Tesauro, “Neurogammon: a neural network backgammon program.” IJCNN Proceedings III, 33–39 (1990).

  • G. Tesauro, “Practical issues in temporal difference learning.” Machine Learning 8, 257–277 (1992).

  • N. Zadeh and G. Kobliska, “On optimal doubling in backgammon.” Management Science 23, 853–858 (1977).



Copyright information

© 1995 Springer Science+Business Media New York

About this chapter

Cite this chapter

Tesauro, G. (1995). TD-Gammon: A Self-Teaching Backgammon Program. In: Murray, A.F. (ed.) Applications of Neural Networks. Springer, Boston, MA. https://doi.org/10.1007/978-1-4757-2379-3_11


  • DOI: https://doi.org/10.1007/978-1-4757-2379-3_11

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4419-5140-3

  • Online ISBN: 978-1-4757-2379-3

  • eBook Packages: Springer Book Archive
