Approximate Dynamic Programming by Practical Examples

  • Martijn R. K. Mes
  • Arturo Pérez Rivera
Chapter
Part of the International Series in Operations Research & Management Science book series (ISOR, volume 248)

Abstract

Computing the exact solution of an MDP model is generally difficult, and possibly intractable, for realistically sized problem instances. A powerful technique for solving such large-scale, discrete-time, multistage stochastic control problems is Approximate Dynamic Programming (ADP). Although ADP is used as an umbrella term for a broad spectrum of methods to approximate the optimal solution of an MDP, the common denominator is typically to combine optimization with simulation, to use approximations of the optimal values of Bellman's equations, and to use approximate policies. This chapter presents and illustrates the basics of these steps through a number of practical and instructive examples. We use three examples (1) to explain the basics of ADP, relying on value iteration with an approximation of the value functions, (2) to provide insight into implementation issues, and (3) to provide test cases with which readers can validate their own ADP implementations.
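To make the recipe concrete, the following is a minimal sketch of approximate value iteration with a lookup-table value function approximation on a hypothetical single-product inventory problem. It is an illustration of the general approach, not code from the chapter; all names, cost parameters, and the demand distribution are assumptions chosen for the example.

import random

# A toy single-product inventory problem (illustrative assumptions throughout).
MAX_INV = 20        # storage capacity
MAX_DEMAND = 10     # demand is uniform on {0, ..., MAX_DEMAND} (assumption)
ORDER_COST = 2.0    # cost per unit ordered
HOLD_COST = 0.1     # cost per unit carried to the next period
PRICE = 4.0         # revenue per unit sold
GAMMA = 0.95        # discount factor
N_ITER = 5000       # number of ADP iterations

# Lookup-table value function approximation: one estimate per inventory level.
v_bar = [0.0] * (MAX_INV + 1)


def bellman_sample(s: int) -> float:
    """One Monte Carlo sample of the Bellman right-hand side for state s:
    simulate the demand, then optimize the order quantity against the
    current value function approximation."""
    demand = random.randint(0, MAX_DEMAND)     # simulation step
    best = float("-inf")
    for x in range(MAX_INV - s + 1):           # optimization step
        post = s + x                           # inventory after ordering
        sold = min(post, demand)
        nxt = post - sold                      # next pre-decision state
        contrib = PRICE * sold - ORDER_COST * x - HOLD_COST * nxt
        best = max(best, contrib + GAMMA * v_bar[nxt])
    return best


random.seed(42)
counts = [0] * (MAX_INV + 1)                   # visit counts per state
for n in range(N_ITER):
    s = random.randint(0, MAX_INV)             # sample a state to update
    v_hat = bellman_sample(s)                  # sampled Bellman estimate
    counts[s] += 1
    alpha = 1.0 / counts[s]                    # harmonic stepsize
    v_bar[s] = (1 - alpha) * v_bar[s] + alpha * v_hat   # smoothing update

print([round(v, 1) for v in v_bar])            # approximate values per state

Each iteration combines simulation (sampling the demand) with optimization (maximizing over order quantities against the current approximation), then smooths the sampled estimate into the stored value using a harmonic stepsize; this exchange of the exact expectation for Monte Carlo samples is what distinguishes the sketch from exact value iteration.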

Key words

Dynamic programming · Approximate dynamic programming · Stochastic optimization · Monte Carlo simulation · Curse of dimensionality

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Department of Industrial Engineering and Business Information Systems, University of Twente, Enschede, The Netherlands
