One-Step Improvement Ideas and Computational Aspects

  • Henk Tijms
Part of the International Series in Operations Research & Management Science book series (ISOR, volume 248)


In this contribution we give a down-to-earth discussion of basic ideas for solving practical Markov decision problems. The emphasis is on the concept of the policy-improvement step for average-cost optimization. This concept provides a flexible method for improving a given policy. By appropriately designing the policy-improvement step in specific applications, tailor-made algorithms can be developed that generate the best control rule within a class of control rules characterized by a few parameters. Moreover, in decision problems with an intractable multi-dimensional state space, decomposition combined with a once-only application of the policy-improvement step may lead to a good heuristic rule. These useful features of the policy-improvement concept are illustrated by a queueing-control problem with a variable service rate and by the dynamic routing of arrivals to parallel queues. In the final section, we deal with the concept of the one-stage-look-ahead rule in optimal stopping and give several applications.
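To make the central concept concrete, the following is a minimal sketch of a single policy-improvement step for an average-cost Markov decision problem, applied to a toy two-state, two-action example. The states, actions, costs, and transition probabilities below are invented for illustration and are not taken from the chapter; the evaluation step assumes the unichain case so that the relative values can be normalized by fixing h(0) = 0.

```python
import numpy as np

# Toy data (hypothetical): states 0,1 and actions 0,1.
# c[s, a] is the one-step cost; P[a, s, s'] the transition probability.
c = np.array([[1.0, 4.0],
              [6.0, 2.0]])
P = np.array([[[0.8, 0.2],   # action 0
               [0.3, 0.7]],
              [[0.5, 0.5],   # action 1
               [0.9, 0.1]]])

def evaluate(policy):
    """Solve the average-cost evaluation equations
       g + h(s) = c(s, policy(s)) + sum_s' p(s' | s, policy(s)) h(s'),
    with the normalization h(0) = 0 (unichain assumption)."""
    n = len(policy)
    Ppi = np.array([P[policy[s], s] for s in range(n)])
    cpi = np.array([c[s, policy[s]] for s in range(n)])
    # Unknown vector: (g, h(1), ..., h(n-1)); h(0) is fixed at 0.
    A = np.zeros((n, n))
    A[:, 0] = 1.0                                # coefficient of g
    A[:, 1:] = np.eye(n)[:, 1:] - Ppi[:, 1:]     # coefficients of h(1..)
    sol = np.linalg.solve(A, cpi)
    g, h = sol[0], np.concatenate(([0.0], sol[1:]))
    return g, h

def improve(policy):
    """One policy-improvement step: in each state pick the action minimizing
       c(s, a) + sum_s' p(s' | s, a) h(s'),
    where h comes from evaluating the current policy."""
    g, h = evaluate(policy)
    new_policy = np.array([np.argmin(c[s] + P[:, s] @ h)
                           for s in range(len(policy))])
    return new_policy, g

pol = np.array([1, 0])          # an arbitrary starting rule
new_pol, g = improve(pol)
print("average cost of starting rule:", g)   # 5.25 for this toy data
print("improved rule:", new_pol)             # [0 1]
```

Iterating `improve` until the policy no longer changes yields Howard's policy-iteration algorithm; the chapter's point is that a single, well-designed improvement step applied to a good initial rule can already produce an excellent heuristic.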



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Vrije Universiteit Amsterdam, Amsterdam, The Netherlands