Abstract
We present a computational case study of neuro-dynamic programming, a recent class of reinforcement learning methods. We cast the problem of play selection in American football as a stochastic shortest path Markov Decision Problem (MDP). In particular, we consider the problem faced by a quarterback in attempting to maximize the net score of an offensive drive. The resulting optimization problem serves as a medium-scale testbed for numerical algorithms based on policy iteration.
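To make the stochastic shortest path structure concrete, the following toy sketch simulates one offensive drive. The state space (yards to the end zone), play outcomes, and turnover probability are our own illustrative assumptions, not the model of the chapter; the point is only that a drive is an episode that continues until an absorbing outcome (score or turnover) is reached.

```python
import random

# Hypothetical toy model: the state is the number of yards to the end
# zone; each play either advances the ball, scores, or turns it over.
# All numbers here are assumptions for illustration.
def step(yards_to_go, action, rng):
    """Simulate one play; returns (next_state, reward, done)."""
    if action == "run":
        gain = rng.choice([-2, 0, 1, 2, 3, 4, 5, 6])
    else:  # "pass"
        gain = rng.choice([0, 0, 0, 7, 10, 15, -5])
    nxt = yards_to_go - gain
    if nxt <= 0:
        return None, 7.0, True       # touchdown plus extra point
    if rng.random() < 0.02:          # assumed turnover probability
        return None, 0.0, True
    return nxt, 0.0, False

def simulate_drive(policy, start=80, seed=0):
    """Run one drive under a fixed policy; return the score obtained."""
    rng = random.Random(seed)
    state, total, done = start, 0.0, False
    while not done:
        state, r, done = step(state, policy(state), rng)
        total += r
    return total

# A simple fixed policy: pass when far from the end zone, run when close.
score = simulate_drive(lambda s: "pass" if s > 20 else "run")
```

The drive terminates with probability one because every play carries a positive turnover probability, which is the defining feature of a proper stochastic shortest path policy.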
The algorithms we consider evolve as a sequence of approximate policy evaluations and policy updates. An (exact) evaluation amounts to the computation of the reward-to-go function associated with the policy in question. Approximations of reward-to-go are obtained either as the solution of, or as a step toward the solution of, a training problem involving simulated state/reward data pairs. Within this methodological framework there is a great deal of flexibility. In specifying a particular algorithm, one must select a parametric form for estimating the reward-to-go function as well as a training algorithm for tuning the approximation. One example we consider, among many others, is a multilayer perceptron (i.e., a neural network) trained by backpropagation.
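A minimal instance of the training step described above might look as follows. We fit a quadratic parametric form to simulated state/reward-to-go pairs by linear least squares; the synthetic chain, the noise model, and the feature choice are assumptions made only for illustration (among the chapter's alternatives is a multilayer perceptron trained by backpropagation, of which this is the simplest linear analogue).

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy data: states 0..N under some fixed policy, where each
# simulated reward-to-go sample equals the true value J(s) = N - s
# plus noise.  In the chapter, such pairs come from simulated drives.
N = 50
states = rng.integers(0, N + 1, size=2000)
rewards = (N - states) + rng.normal(0.0, 1.0, size=states.shape)

# Parametric form: J_tilde(s) = w0 + w1*s + w2*s^2, tuned by
# linear least squares on the simulated state/reward pairs.
Phi = np.stack([np.ones_like(states), states, states**2], axis=1).astype(float)
w, *_ = np.linalg.lstsq(Phi, rewards.astype(float), rcond=None)

def J_tilde(s):
    """Approximate reward-to-go at state s."""
    return w[0] + w[1] * s + w[2] * s**2
```

A policy update would then replace the current policy by one that is greedy with respect to `J_tilde`, and the evaluate/update cycle repeats.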
The objective of this paper is to illustrate the application of neuro-dynamic programming methods to a well-defined optimization problem. We will compare and contrast various algorithms mainly in terms of performance, although we will also consider complexity of implementation. Because our version of football leads to a medium-scale Markov decision problem, it is possible to compute the optimal solution numerically, providing a yardstick for meaningful comparison of the approximate methods.
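For a small enough MDP, the optimal reward-to-go can indeed be computed exactly. A sketch of value iteration on an assumed toy stochastic shortest path problem (the states, plays, and success probabilities are ours, chosen so the answer is easy to verify by hand) illustrates how such a numerical yardstick is obtained:

```python
import numpy as np

n = 10  # states 0..n-1, with n the absorbing terminal state

def value_iteration(tol=1e-9):
    """Compute the minimal expected number of plays to reach state n."""
    J = np.zeros(n + 1)            # J[n] = 0 at the terminal state
    while True:
        Jn = J.copy()
        for s in range(n):
            q_vals = []
            # Two hypothetical "plays": a safe one advancing 1 state
            # with prob 0.9, and a risky one advancing 2 with prob 0.5;
            # a failed play leaves the state unchanged; each play costs 1.
            for advance, p in ((1, 0.9), (2, 0.5)):
                nxt = min(s + advance, n)
                q_vals.append(1.0 + p * J[nxt] + (1 - p) * J[s])
            Jn[s] = min(q_vals)
        if np.max(np.abs(Jn - J)) < tol:
            return Jn
        J = Jn

J_opt = value_iteration()
```

Here the risky play advances one state per expected unit cost versus 10/9 for the safe play, so from state 0 the optimal expected cost is 10; an approximate method's value estimates can then be measured directly against `J_opt`.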
Copyright information
© 1998 Springer Science+Business Media New York
Cite this chapter
Patek, S.D., Bertsekas, D.P. (1998). Play Selection in American Football: A Case Study in Neuro-Dynamic Programming. In: Woodruff, D.L. (eds) Advances in Computational and Stochastic Optimization, Logic Programming, and Heuristic Search. Operations Research/Computer Science Interfaces Series, vol 9. Springer, Boston, MA. https://doi.org/10.1007/978-1-4757-2807-1_7
Print ISBN: 978-1-4419-5023-9
Online ISBN: 978-1-4757-2807-1