Play Selection in American Football: A Case Study in Neuro-Dynamic Programming
We present a computational case study of neuro-dynamic programming, a recent class of reinforcement learning methods. We cast the problem of play selection in American football as a stochastic shortest path Markov Decision Problem (MDP). In particular, we consider the problem faced by a quarterback in attempting to maximize the net score of an offensive drive. The resulting optimization problem serves as a medium-scale testbed for numerical algorithms based on policy iteration.
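The stochastic shortest path formulation can be made concrete with a toy model. The following sketch is purely illustrative and is not the paper's actual model: the state is the number of yards to the goal line, each play either gains yards or ends the drive in a turnover, and value iteration computes the expected net score from each state. All transition probabilities and rewards here are invented for illustration.

```python
# Toy stochastic shortest path MDP for a simplified offensive drive.
# State = yards to the goal line; the drive terminates in a touchdown
# (reaching 0) or a turnover. All numbers are illustrative assumptions.

GOAL, TURNOVER = "TD", "TO"          # terminal outcomes
STATES = list(range(1, 81))          # 1..80 yards to go

# action -> (gain distribution {yards: prob}, turnover probability)
ACTIONS = {
    "run":  ({0: 0.3, 3: 0.5, 8: 0.2}, 0.02),
    "pass": ({0: 0.5, 12: 0.4, 25: 0.1}, 0.08),
}

REWARD = {GOAL: 7.0, TURNOVER: -2.0}  # net score at termination

def step_distribution(s, action):
    """Return {next_state_or_terminal: probability} from state s."""
    gains, p_to = ACTIONS[action]
    dist = {TURNOVER: p_to}
    for yards, p in gains.items():
        nxt = GOAL if s - yards <= 0 else s - yards
        dist[nxt] = dist.get(nxt, 0.0) + (1 - p_to) * p
    return dist

def value_iteration(tol=1e-9):
    """Compute the optimal expected net score V(s) for every state."""
    V = {s: 0.0 for s in STATES}
    while True:
        delta = 0.0
        for s in STATES:
            best = max(
                sum(p * REWARD.get(nxt, V.get(nxt, 0.0))
                    for nxt, p in step_distribution(s, a).items())
                for a in ACTIONS
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V
```

Because every action carries a positive termination probability, all policies are proper and value iteration converges; this is the structural property that makes the problem a well-posed stochastic shortest path MDP.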
The algorithms we consider evolve as a sequence of approximate policy evaluations and policy updates. An (exact) evaluation amounts to the computation of the reward-to-go function associated with the policy in question. Approximations of reward-to-go are obtained either as the solution or as a step toward the solution of a training problem involving simulated state/reward data pairs. Within this methodological framework there is a great deal of flexibility. In specifying a particular algorithm, one must select a parametric form for estimating the reward-to-go function as well as a training algorithm for tuning the approximation. One example we consider, among many others, is the use of a multilayer perceptron (i.e., a neural network) trained by backpropagation.
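The evaluate/fit/update loop described above can be sketched in miniature. This is a hedged illustration, not the paper's implementation: a toy drive model (invented numbers), Monte Carlo simulation of state/reward-to-go pairs under the current policy, a simple linear architecture V(s) = w0 + w1*s in place of a multilayer perceptron, and a greedy policy update against the fitted values.

```python
import random

random.seed(0)

# Approximate policy iteration on a toy drive model (all numbers are
# illustrative assumptions). State = yards to the goal line.
ACTIONS = {
    "run":  ([(0, 0.3), (3, 0.5), (8, 0.2)], 0.02),   # (gain dist, turnover prob)
    "pass": ([(0, 0.5), (12, 0.4), (25, 0.1)], 0.08),
}
TD, TO = 7.0, -2.0

def simulate(policy, s0=60):
    """Run one drive under `policy`; return (state, reward-to-go) pairs."""
    states, s = [], s0
    while True:
        states.append(s)
        gains, p_to = ACTIONS[policy(s)]
        if random.random() < p_to:
            reward = TO
            break
        s -= random.choices([g for g, _ in gains],
                            [p for _, p in gains])[0]
        if s <= 0:
            reward = TD
            break
    # Undiscounted problem: every visited state shares the terminal reward.
    return [(x, reward) for x in states]

def fit_linear(pairs):
    """Least-squares fit of the reward-to-go samples: V(s) = w0 + w1*s."""
    n = len(pairs)
    sx = sum(s for s, _ in pairs); sy = sum(r for _, r in pairs)
    sxx = sum(s * s for s, _ in pairs); sxy = sum(s * r for s, r in pairs)
    w1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    return (sy - w1 * sx) / n, w1

def greedy_policy(w0, w1):
    """Policy update: act greedily with respect to the fitted values."""
    def V(s):
        return TD if s <= 0 else w0 + w1 * s
    def policy(s):
        def q(a):
            gains, p_to = ACTIONS[a]
            return p_to * TO + (1 - p_to) * sum(p * V(s - g) for g, p in gains)
        return max(ACTIONS, key=q)
    return policy

# Two rounds of approximate policy iteration, starting from always-run.
policy = lambda s: "run"
for _ in range(2):
    data = [pair for _ in range(2000) for pair in simulate(policy)]
    w0, w1 = fit_linear(data)
    policy = greedy_policy(w0, w1)
```

Swapping `fit_linear` for a multilayer perceptron trained by backpropagation, or replacing the full Monte Carlo fit with a single gradient step per batch, yields other instances of the same methodological framework.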
The objective of this paper is to illustrate the application of neuro-dynamic programming methods in solving a well-defined optimization problem. We compare and contrast various algorithms mainly in terms of performance, although we also consider complexity of implementation. Because our version of football leads to a medium-scale Markov decision problem, it is possible to compute the optimal solution numerically, providing a yardstick for meaningful comparison of the approximate methods.
Keywords: American football · Policy iteration · Poisson random variable · Sample trajectory · Opposing team