Play Selection in American Football: A Case Study in Neuro-Dynamic Programming

  • Chapter

Part of the book series: Operations Research/Computer Science Interfaces Series ((ORCS,volume 9))

Abstract

We present a computational case study of neuro-dynamic programming, a recent class of reinforcement learning methods. We cast the problem of play selection in American football as a stochastic shortest path Markov Decision Problem (MDP). In particular, we consider the problem faced by a quarterback in attempting to maximize the net score of an offensive drive. The resulting optimization problem serves as a medium-scale testbed for numerical algorithms based on policy iteration.
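
The abstract does not spell out the state and action encoding, so the following Python sketch is only a plausible illustration of the formulation: the drive is modeled as a stochastic shortest path MDP whose state might record field position, down, and distance, whose actions are play calls, and whose rewards are the points accumulated before the drive ends. The names and numbers below (State, PLAYS, simulate_play, the gain distributions) are assumptions made for illustration, not the chapter's model.

```python
import random
from dataclasses import dataclass

# Hypothetical encoding of one offensive drive as a stochastic shortest
# path MDP.  State variables, action set, and transition model are all
# illustrative assumptions.
@dataclass(frozen=True)
class State:
    yardline: int      # yards from the opponent's goal line (1..99)
    down: int          # 1..4
    yards_to_go: int   # yards needed for a first down

PLAYS = ["run", "pass", "punt", "field_goal"]   # assumed action set

TERMINAL = None  # absorbing state: the drive has ended


def simulate_play(state, play, rng):
    """Toy transition model: returns (next_state, reward).

    Rewards are the points scored on the transition, so the total reward
    accumulated before reaching TERMINAL is the score of the drive."""
    if play == "field_goal":
        made = rng.random() < max(0.1, 1.0 - state.yardline / 60.0)
        return TERMINAL, (3.0 if made else 0.0)
    if play == "punt":
        return TERMINAL, 0.0
    gain = rng.gauss(4.0 if play == "run" else 6.0, 5.0)
    new_yardline = min(99, int(state.yardline - gain))
    if new_yardline <= 0:
        return TERMINAL, 7.0                         # touchdown plus extra point
    if gain >= state.yards_to_go:
        return State(new_yardline, 1, 10), 0.0       # first down
    if state.down == 4:
        return TERMINAL, 0.0                         # turnover on downs
    return State(new_yardline, state.down + 1,
                 max(1, int(state.yards_to_go - gain))), 0.0
```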

The algorithms we consider evolve as a sequence of approximate policy evaluations and policy updates. An (exact) evaluation amounts to the computation of the reward-to-go function associated with the policy in question. Approximations of reward-to-go are obtained either as the solution of, or as a step toward the solution of, a training problem involving simulated state/reward data pairs. Within this methodological framework there is a great deal of flexibility. In specifying a particular algorithm, one must select a parametric form for estimating the reward-to-go function as well as a training algorithm for tuning the approximation. One example we consider, among many others, is a multilayer perceptron (i.e., a neural network) trained by backpropagation.
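
As one concrete instance of this framework, the sketch below (continuing the toy model above) evaluates a fixed policy by fitting a parametric approximation of reward-to-go to simulated state/reward pairs, then performs a policy update by one-step lookahead. A linear least-squares architecture is used here purely to keep the example short and self-contained; the multilayer perceptron trained by backpropagation mentioned in the abstract would replace the fitting step. The feature map, sample sizes, and baseline policy are assumptions.

```python
import numpy as np

def features(state):
    """Hypothetical feature vector for a linear reward-to-go architecture."""
    return np.array([1.0, state.yardline / 100.0, state.down / 4.0,
                     state.yards_to_go / 10.0])

def rollout(state, policy, rng, max_plays=40):
    """Simulate one drive under `policy`; return the sampled reward-to-go."""
    total = 0.0
    for _ in range(max_plays):
        state, reward = simulate_play(state, policy(state), rng)
        total += reward
        if state is TERMINAL:
            break
    return total

def evaluate_policy(policy, rng, n_samples=5000):
    """Approximate policy evaluation: fit reward-to-go to simulated
    state/reward pairs (least squares here; an MLP trained by
    backpropagation is one of the alternatives the chapter considers)."""
    X, y = [], []
    for _ in range(n_samples):
        s = State(rng.randrange(1, 100), rng.randrange(1, 5),
                  rng.randrange(1, 11))
        X.append(features(s))
        y.append(rollout(s, policy, rng))
    w, *_ = np.linalg.lstsq(np.array(X), np.array(y), rcond=None)
    return w

def improved_policy(w, rng, n_rollouts=30):
    """Policy update: pick the play with the best sampled one-step
    lookahead value under the fitted reward-to-go approximation."""
    def policy(state):
        def q(play):
            total = 0.0
            for _ in range(n_rollouts):
                nxt, r = simulate_play(state, play, rng)
                total += r + (0.0 if nxt is TERMINAL else float(features(nxt) @ w))
            return total / n_rollouts
        return max(PLAYS, key=q)
    return policy

# One approximate policy iteration step (illustrative usage):
# rng = random.Random(0)
# w = evaluate_policy(lambda s: "run", rng)   # evaluate a baseline policy
# policy = improved_policy(w, rng)            # greedy policy update
```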

The objective of this paper is to illustrate the application of neuro-dynamic programming methods in solving a well-defined optimization problem. We will contrast and compare various algorithms mainly in terms of performance, although we will also consider complexity of implementation. Because our version of football leads to a medium-scale Markov decision problem, it is possible to compute the optimal solution numerically, providing a yardstick for meaningful comparison of the approximate methods.
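
Because the problem is of medium scale, the optimal reward-to-go can be computed exactly once the transition probabilities are enumerated. The chapter does not fix the numerical procedure here, so the following is only a generic value-iteration yardstick; `transition_distribution` is an assumed interface returning the exact transition law of the model.

```python
def optimal_reward_to_go(states, transition_distribution, tol=1e-9):
    """Exact value iteration over an enumerated state space.

    `transition_distribution(state, play)` is assumed to return a list of
    (probability, next_state, reward) triples, with next_state = TERMINAL
    when the drive ends.  The fixed point is the optimal reward-to-go,
    a yardstick against which the approximate methods can be measured."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(
                sum(p * (r + (0.0 if nxt is TERMINAL else V[nxt]))
                    for p, nxt, r in transition_distribution(s, play))
                for play in PLAYS
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V
```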

Copyright information

© 1998 Springer Science+Business Media New York

About this chapter

Cite this chapter

Patek, S.D., Bertsekas, D.P. (1998). Play Selection in American Football: A Case Study in Neuro-Dynamic Programming. In: Woodruff, D.L. (eds) Advances in Computational and Stochastic Optimization, Logic Programming, and Heuristic Search. Operations Research/Computer Science Interfaces Series, vol 9. Springer, Boston, MA. https://doi.org/10.1007/978-1-4757-2807-1_7

  • DOI: https://doi.org/10.1007/978-1-4757-2807-1_7

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4419-5023-9

  • Online ISBN: 978-1-4757-2807-1

  • eBook Packages: Springer Book Archive
