Modeling Medical Treatment Using Markov Decision Processes

  • Andrew J. Schaefer
  • Matthew D. Bailey
  • Steven M. Shechter
  • Mark S. Roberts
Part of the International Series in Operations Research & Management Science book series (ISOR, volume 70)


Medical treatment decisions are often sequential and uncertain. Markov decision processes (MDPs) are an appropriate technique for modeling and solving such stochastic and dynamic decisions. This chapter gives an overview of MDP models and solution techniques. We describe MDP modeling in the context of medical treatment and discuss when MDPs are an appropriate technique. We review selected successful applications of MDPs to treatment decisions in the literature. We conclude with a discussion of the challenges and opportunities for applying MDPs to medical treatment decisions.

Key words

Markov decision processes; Stochastic dynamic programs; Optimal medical treatment; Stochastic optimal control; Medical decision making





Copyright information

© Springer Science + Business Media, Inc. 2005

Authors and Affiliations

  • Andrew J. Schaefer (1, 2, 3)
  • Matthew D. Bailey (1)
  • Steven M. Shechter (1)
  • Mark S. Roberts (2, 3)
  1. Department of Industrial Engineering, University of Pittsburgh, Pittsburgh
  2. Department of Medicine, University of Pittsburgh, Pittsburgh
  3. Center for Research on Health Care, University of Pittsburgh, Pittsburgh
