Abstract
We present a computational case study of neuro-dynamic programming, a recent class of reinforcement learning methods. We cast the problem of play selection in American football as a stochastic shortest path Markov Decision Problem (MDP). In particular, we consider the problem faced by a quarterback in attempting to maximize the net score of an offensive drive. The resulting optimization problem serves as a medium-scale testbed for numerical algorithms based on policy iteration.
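To make the stochastic shortest path structure concrete, the following toy sketch simulates one offensive drive. The state space (yards to the end zone), play outcomes, and turnover probability are our own illustrative assumptions, not the model of the chapter; the point is only that a drive is an episode that continues until an absorbing outcome (score or turnover) is reached.

```python
import random

# Hypothetical toy model: the state is the number of yards to the end
# zone; each play either advances the ball, scores, or turns it over.
# All numbers here are assumptions for illustration.
def step(yards_to_go, action, rng):
    """Simulate one play; returns (next_state, reward, done)."""
    if action == "run":
        gain = rng.choice([-2, 0, 1, 2, 3, 4, 5, 6])
    else:  # "pass"
        gain = rng.choice([0, 0, 0, 7, 10, 15, -5])
    nxt = yards_to_go - gain
    if nxt <= 0:
        return None, 7.0, True       # touchdown plus extra point
    if rng.random() < 0.02:          # assumed turnover probability
        return None, 0.0, True
    return nxt, 0.0, False

def simulate_drive(policy, start=80, seed=0):
    """Run one drive under a fixed policy; return the score obtained."""
    rng = random.Random(seed)
    state, total, done = start, 0.0, False
    while not done:
        state, r, done = step(state, policy(state), rng)
        total += r
    return total

# A simple fixed policy: pass when far from the end zone, run when close.
score = simulate_drive(lambda s: "pass" if s > 20 else "run")
```

The drive terminates with probability one because every play carries a positive turnover probability, which is the defining feature of a proper stochastic shortest path policy.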
The algorithms we consider evolve as a sequence of approximate policy evaluations and policy updates. An (exact) evaluation amounts to the computation of the reward-to-go function associated with the policy in question. Approximations of reward-to-go are obtained either as the solution of, or as a step toward the solution of, a training problem involving simulated state/reward data pairs. Within this methodological framework there is a great deal of flexibility. In specifying a particular algorithm, one must select a parametric form for estimating the reward-to-go function as well as a training algorithm for tuning the approximation. One example we consider, among many others, is a multilayer perceptron (i.e., a neural network) trained by backpropagation.
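A minimal instance of the training step described above might look as follows. We fit a quadratic parametric form to simulated state/reward-to-go pairs by linear least squares; the synthetic chain, the noise model, and the feature choice are assumptions made only for illustration (among the chapter's alternatives is a multilayer perceptron trained by backpropagation, of which this is the simplest linear analogue).

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy data: states 0..N under some fixed policy, where each
# simulated reward-to-go sample equals the true value J(s) = N - s
# plus noise.  In the chapter, such pairs come from simulated drives.
N = 50
states = rng.integers(0, N + 1, size=2000)
rewards = (N - states) + rng.normal(0.0, 1.0, size=states.shape)

# Parametric form: J_tilde(s) = w0 + w1*s + w2*s^2, tuned by
# linear least squares on the simulated state/reward pairs.
Phi = np.stack([np.ones_like(states), states, states**2], axis=1).astype(float)
w, *_ = np.linalg.lstsq(Phi, rewards.astype(float), rcond=None)

def J_tilde(s):
    """Approximate reward-to-go at state s."""
    return w[0] + w[1] * s + w[2] * s**2
```

A policy update would then replace the current policy by one that is greedy with respect to `J_tilde`, and the evaluate/update cycle repeats.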
The objective of this paper is to illustrate the application of neuro-dynamic programming methods to a well-defined optimization problem. We will compare and contrast various algorithms mainly in terms of performance, although we will also consider complexity of implementation. Because our version of football leads to a medium-scale Markov decision problem, it is possible to compute the optimal solution numerically, providing a yardstick for meaningful comparison of the approximate methods.
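For a small enough MDP, the optimal reward-to-go can indeed be computed exactly. A sketch of value iteration on an assumed toy stochastic shortest path problem (the states, plays, and success probabilities are ours, chosen so the answer is easy to verify by hand) illustrates how such a numerical yardstick is obtained:

```python
import numpy as np

n = 10  # states 0..n-1, with n the absorbing terminal state

def value_iteration(tol=1e-9):
    """Compute the minimal expected number of plays to reach state n."""
    J = np.zeros(n + 1)            # J[n] = 0 at the terminal state
    while True:
        Jn = J.copy()
        for s in range(n):
            q_vals = []
            # Two hypothetical "plays": a safe one advancing 1 state
            # with prob 0.9, and a risky one advancing 2 with prob 0.5;
            # a failed play leaves the state unchanged; each play costs 1.
            for advance, p in ((1, 0.9), (2, 0.5)):
                nxt = min(s + advance, n)
                q_vals.append(1.0 + p * J[nxt] + (1 - p) * J[s])
            Jn[s] = min(q_vals)
        if np.max(np.abs(Jn - J)) < tol:
            return Jn
        J = Jn

J_opt = value_iteration()
```

Here the risky play advances one state per expected unit cost versus 10/9 for the safe play, so from state 0 the optimal expected cost is 10; an approximate method's value estimates can then be measured directly against `J_opt`.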
Copyright information
© 1998 Springer Science+Business Media New York
Cite this chapter
Patek, S.D., Bertsekas, D.P. (1998). Play Selection in American Football: A Case Study in Neuro-Dynamic Programming. In: Woodruff, D.L. (eds) Advances in Computational and Stochastic Optimization, Logic Programming, and Heuristic Search. Operations Research/Computer Science Interfaces Series, vol 9. Springer, Boston, MA. https://doi.org/10.1007/978-1-4757-2807-1_7
Print ISBN: 978-1-4419-5023-9
Online ISBN: 978-1-4757-2807-1