
A survey of algorithms for some restricted classes of Markov decision problems

  • Conference paper

Part of the book series: Proceedings in Operations Research 8 (ORP, volume 1978)

Abstract

This paper gives a survey of some standard infinite-stage Markov decision process equations and of algorithms for solving them. The paper restricts itself to finite state sets, finite action sets, and bounded rewards. For a more general, though now out-of-date, reference list on Markov decision processes the reader may consult Schweitzer [80]. Although we restrict ourselves to conventional Markov decision processes, we refer the reader to more general formulations, such as those of Koehler [48], [49], who considers Leontief and essentially-Leontief extensions and the method of successive approximations, and that of Rothblum [75].
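The setting the survey fixes (finite state set, finite action set, bounded rewards) is the one in which the method of successive approximations mentioned above takes its simplest form. The following sketch shows value iteration for a discounted problem of this type; the two-state, two-action data are hypothetical, chosen purely for illustration and not taken from the paper.

```python
def value_iteration(P, r, beta, tol=1e-8, max_iter=10_000):
    """Successive approximations for a finite discounted MDP.

    P[a][i][j] : probability of moving from state i to state j under action a
    r[a][i]    : one-step reward in state i under action a (bounded)
    beta       : discount factor in (0, 1)
    """
    n = len(r[0])
    v = [0.0] * n
    for _ in range(max_iter):
        # One application of the dynamic programming operator:
        # (Tv)(i) = max_a [ r(a,i) + beta * sum_j P(a,i,j) v(j) ]
        v_new = [
            max(
                r[a][i] + beta * sum(P[a][i][j] * v[j] for j in range(n))
                for a in range(len(r))
            )
            for i in range(n)
        ]
        # T is a contraction with modulus beta, so a small sup-norm change
        # between iterates guarantees closeness to the fixed point.
        if max(abs(v_new[i] - v[i]) for i in range(n)) < tol:
            return v_new
        v = v_new
    return v

# Hypothetical two-state, two-action example.
P = [
    [[0.8, 0.2], [0.3, 0.7]],   # transitions under action 0
    [[0.5, 0.5], [0.1, 0.9]],   # transitions under action 1
]
r = [[5.0, -1.0], [10.0, 2.0]]  # bounded rewards r[a][i]
v = value_iteration(P, r, beta=0.9)
```

Because the operator is a contraction, the iterates converge geometrically at rate beta; this contraction property underlies the error bounds and action-elimination tests treated in many of the references below.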


References

  1. J. Anthonisse, H.C. Tijms, On White's Condition in Dynamic Programming, Mathematisch Centrum, Report B.W. 46/75, Amsterdam, 1975.

  2. J. Bather, Optimal Decision Procedures for Finite Markov Chains, Pt. II: Communicating Systems, Adv. Appl. Prob. 5, 1973, pp. 521–540.

  3. R. Bellman, A Markovian Decision Process, J. Math. Mech. 6, 1957, pp. 679–684.

  4. P.F. Bestwick, D. Sadjadi, An Action Elimination Algorithm for the Discounted Semi-Markov Problem, University of Bradford Management Centre, 1978.

  5. D. Blackwell, Discrete Dynamic Programming, Ann. Math. Stat. 33, 1962, pp. 719–726.

  6. B.W. Brown, On the Iterative Method of Dynamic Programming on a Finite Space Discrete Time Markov Process, Ann. Math. Stat. 36, 1965, pp. 1279–1285.

  7. G. Dantzig, P. Wolfe, Linear Programming in a Markov Chain, Opns. Res. 10, 1962, pp. 702–710.

  8. J. De Cani, A Dynamic Programming Algorithm for Imbedded Markov Chains When the Planning Horizon Is at Infinity, Man. Sci. 10, 1964, pp. 716–733.

  9. G.T. De Ghellinck, G.D. Eppen, Linear Programming Solutions for Separable Markovian Decision Problems, Man. Sci. 13, 1967, pp. 371–394.

  10. E.V. Denardo, B. Fox, Multichain Markov Renewal Programs, SIAM J. Appl. Math. 16, 1968, pp. 468–487.

  11. E.V. Denardo, Markov Renewal Programs with Small Interest Rates, Ann. Math. Stat. 42, 1971, pp. 477–496.

  12. E.V. Denardo, Separable Markov Decision Problems, Man. Sci. 14, 1968, pp. 451–462.

  13. E.V. Denardo, On Linear Programming in a Markov Decision Problem, Man. Sci. 16, 1970, pp. 281–288.

  14. E.V. Denardo, Contraction Mappings in the Theory Underlying Dynamic Programming, SIAM Rev. 9, 1967, pp. 165–177.

  15. E.V. Denardo, A Markov Decision Problem, in: T.C. Hu, S.M. Robinson (Eds.), Mathematical Programming, Academic Press, New York, 1973.

  16. C. Derman, On Sequential Decisions and Markov Chains, Man. Sci. 9, 1962, pp. 16–24.

  17. C. Derman, Markovian Decision Processes, Ann. Math. Stat. 37, 1966, pp. 1545–1553.

  18. B. Curtis Eaves, Complementary Pivot Theory and Markov Decision Chains, in: S. Karamardian, C.B. Garcia (Eds.), Fixed Points: Algorithms and Applications, First International Conference on Computing Fixed Points with Applications, Clemson University, South Carolina, June 1974.

  19. A. Federgruen, P.J. Schweitzer, Discounted and Undiscounted Value Iteration in Markov Decision Problems: A Survey, Math. Centrum Report B.W. 78/77, Amsterdam, August 1978.

  20. B.L. Fox, Numerical Computation: Transient Behaviour of a Markov Renewal Process, Paper No. 119, Département d'Informatique, University of Montreal, 1973.

  21. B.L. Fox, Markov Renewal Programming by Linear Fractional Programming, SIAM J. Appl. Math. 14, 1966, pp. 1418–1432.

  22. R. Grinold, Elimination of Suboptimal Actions in Markov Decision Problems, Opns. Res. 21, 1973, pp. 848–851.

  23. N.A.J. Hastings, J.M.C. Mello, Tests for Suboptimal Actions in Discounted Markov Programming, Man. Sci. 19, 1973, pp. 1019–1022.

  24. N.A.J. Hastings, A Test for Non-Optimal Actions in Undiscounted Markov Decision Chains, Man. Sci. 23, 1976, pp. 87–92.

  25. N.A.J. Hastings, Bounds on the Gain of a Markov Decision Process, Opns. Res. 19, 1971, pp. 240–243.

  26. N.A.J. Hastings, J.A.E.E. van Nunen, The Action Elimination Algorithms for Markov Decision Processes, Memorandum COSOR 76-20, Department of Mathematics, Eindhoven University of Technology, 1976.

  27. N.A.J. Hastings, Optimisation of Discounted Markov Decision Problems, Opl. Res. Q. 20, 1969, pp. 499–500.

  28. N.A.J. Hastings, Some Notes on Dynamic Programming and Replacement, Opl. Res. Q. 19, 1968, pp. 453–457.

  29. N.A.J. Hastings, J.M.C. Mello, Decision Networks, Wiley, 1978.

  30. K. Hinderer, Estimates for Finite-Stage Dynamic Programs, J.M.A.A. 55, 1976, pp. 205–238.

  31. K. Hinderer, On Approximate Solutions of Finite-Stage Dynamic Programs, International Conference on Dynamic Programming, University of British Columbia, Vancouver, 1977.

  32. K. Hinderer, G. Hübner, On Exact and Approximate Solutions of Unstructured Finite-Stage Dynamic Programs, Proceedings of the Advanced Seminar on Markov Decision Theory, Mathematical Centre, Amsterdam, 1976.

  33. K. Hinderer, G. Hübner, Recent Results on Finite-Stage Stochastic Dynamic Programs, Proceedings of the 41st Session of the International Statistical Institute, New Delhi, December 1977.

  34. K. Hinderer, W. Whitt, On Approximate Solutions of Finite-Stage Dynamic Programs, Proceedings of the International Conference on Dynamic Programming, University of British Columbia, Vancouver, 1977.

  35. H. Hinomoto, Linear Programming of Markovian Decisions, Man. Sci. 18, 1971, pp. 88–96.

  36. D. Hitchcock, J. MacQueen, On Computing the Expected Discounted Return in a Markov Chain, Naval Res. Log. Quart. 17, 1970, pp. 237–241.

  37. A. Hordijk, H. Tijms, A Modified Form of the Iterative Method of Dynamic Programming, Ann. Stat. 3, 1975, pp. 203–208.

  38. A. Hordijk, H. Tijms, The Method of Successive Approximations and Markovian Decision Problems, Opns. Res. 22, 1974, pp. 519–521.

  39. A. Hordijk, P.J. Schweitzer, H. Tijms, The Asymptotic Behaviour of the Minimal Total Expected Cost for the Denumerable State Markov Decision Model, J. Appl. Prob. 12, 1975, pp. 298–305.

  40. A. Hordijk, L.C.M. Kallenberg, On Solving Markov Decision Problems by Linear Programming, International Conference on Markov Decision Processes, Manchester, July 1978.

  41. R.A. Howard, Semi-Markovian Control Systems, in: Semi-Markovian Decision Processes, Proc. 34th Session Bull. Internat. Stat. Inst. 40, Book 2, 1964, pp. 625–652.

  42. R.A. Howard, Dynamic Programming and Markov Processes, M.I.T. Press, 1960.

  43. R.A. Howard, Research in Semi-Markovian Decision Structures, J. Opns. Res. Soc. of Japan 6, 1964, pp. 163–199.

  44. G. Hübner, Extrapolation and Exclusion of Suboptimal Actions in Finite-Stage Markov Decision Models, Doctoral Dissertation, University of Hamburg, Germany, 1977.

  45. G. Hübner, Improved Procedures for Eliminating Suboptimal Actions in Markov Programming by Use of Contraction Properties, in: Information Theory, Statistical Decision Functions, Random Processes, Transactions of the Seventh Prague Conference and the European Meeting of Statisticians, D. Reidel, 1977.

  46. W. Jewell, Markov Renewal Programming, Opns. Res. 11, 1963, pp. 938–971.

  47. A.J. Kleinman, H.J. Kushner, Accelerated Procedures for the Solution of Discrete Markov Control Problems, I.E.E.E. Trans. Aut. Cont. AC-16, 1971, pp. 147–152.

  48. G. Koehler, Generalised Markov Decision Processes, Dept. of Management, University of Florida, Feb. 1978.

  49. G. Koehler, Value Convergence in a Generalised Markov Decision Process, Dept. of Management, University of Florida, Jan. 1978.

  50. M. Krasnosel'skii, G. Vainikko, P. Zabreiko, Ya. Rutitskii, V. Stetsenko, Approximate Solution of Operator Equations, Wolters-Noordhoff, Groningen, 1972.

  51. H. Kushner, Introduction to Stochastic Control, Holt, Rinehart and Winston, 1971.

  52. E. Lanery, Étude asymptotique des systèmes markoviens à commande, Revue d'Informatique et de Recherche Opérationnelle 1, 1967, pp. 3–56.

  53. S. Lefschetz, Introduction to Topology, Princeton University Press, 1949.

  54. J. MacQueen, A Modified Dynamic Programming Method for Markovian Decision Problems, J.M.A.A. 14, 1966, pp. 38–43.

  55. J. MacQueen, A Test for Sub-Optimal Actions in Markovian Decision Problems, Opns. Res. 15, 1967, pp. 559–561.

  56. A. Manne, Linear Programming and Sequential Decisions, Man. Sci. 6, 1960, pp. 259–267.

  57. B. Miller, A. Veinott, Dynamic Programming with a Small Interest Rate, Ann. Math. Stat. 40, 1969, pp. 366–370.

  58. H. Mine, S. Osaki, Linear Programming Algorithms for Semi-Markovian Decision Processes, J. Math. Anal. Appl. 22, 1968, pp. 356–381.

  59. H. Mine, Y. Tabata, On the Direct Sums of Markovian Decision Processes, J.M.A.A. 28, 1969, pp. 284–293.

  60. H. Mine, S. Osaki, Some Remarks on a Markov Decision Process with an Absorbing State, J.M.A.A. 23, 1968, pp. 327–334.

  61. H. Mine, S. Osaki, Linear Programming Considerations on Markovian Decision Processes with No Discounting, J. Math. Anal. Appl. 26, 1969, pp. 221–230.

  62. H. Mine, S. Osaki, Markovian Decision Processes, Elsevier, 1970.

  63. T.E. Morton, Using Strong Convergence to Accelerate Value Iteration, Graduate School of Industrial Administration, Carnegie-Mellon University, 1976.

  64. T.E. Morton, Undiscounted Markov Renewal Programming via Modified Successive Approximations, Opns. Res. 19, 1971, pp. 1081–1089.

  65. J.M. Norman, D.J. White, A Method for Approximate Solutions to Stochastic Dynamic Programming Problems Using Expectations, Opns. Res. 16, 1968, pp. 296–306.

  66. A.R. Odoni, On Finding the Maximal Gain for Markov Decision Processes, Opns. Res. 17, 1969, pp. 857–860.

  67. E. Porteus, Some Bounds for Discounted Sequential Decision Processes, Man. Sci. 18, 1971, pp. 7–11.

  68. E. Porteus, Bounds and Transformations for Discounted Finite Markov Decision Chains, Opns. Res. 23, 1975, pp. 761–765.

  69. E.L. Porteus, J.C. Totten, Accelerated Computation of the Expected Discounted Return in a Markov Chain, Opns. Res. 26, 1978, pp. 350–357.

  70. E.L. Porteus, Overview of Iterative Methods for Discounted Markov Decision Chains, International Conference on Markov Decision Processes, Manchester, July 1978.

  71. D. Reetz, Approximate Solutions of Discounted Markovian Decision Processes, Institut für Gesellschafts- und Wirtschaftswissenschaften, Wirtschaftstheoretische Abteilung, University of Bonn, 1971.

  72. U. Rieder, Estimates for Dynamic Programs with Lower and Upper Bounding Functions, Institut für Mathematische Statistik, University of Karlsruhe, Germany, 1977.

  73. I.V. Romanovskii, On the Solvability of Bellman's Functional Equation for a Markovian Decision Process, J.M.A.A. 42, 1973, pp. 485–498.

  74. S.M. Ross, Non-Discounted Denumerable Markovian Decision Models, Ann. Math. Stat. 39, 1968, pp. 412–423.

  75. U.G. Rothblum, Normalised Markov Decision Chains, Opns. Res. 23, 1975, pp. 785–795.

  76. H.E. Scarf, The Approximation of Fixed Points of a Continuous Mapping, SIAM J. Appl. Math. 15, 1967, pp. 1328–1343.

  77. P.J. Schweitzer, Iterative Solution of the Functional Equations of Undiscounted Markov Renewal Programming, J.M.A.A. 34, 1971, pp. 495–501.

  78. P.J. Schweitzer, Multiple Policy Improvements in Undiscounted Markov Renewal Programming, Opns. Res. 19, 1971, pp. 784–793.

  79. P.J. Schweitzer, Perturbation Theory and Undiscounted Markov Renewal Programming, Opns. Res. 17, 1969, pp. 716–727.

  80. P.J. Schweitzer, Annotated Bibliography on Markov Decision Processes, IBM Watson Research Center, Yorktown Heights, New York.

  81. P.J. Schweitzer, An Overview of Undiscounted Markovian Decision Processes, International Conference on Markov Decision Processes, Manchester, July 1978.

  82. J.F. Shapiro, Brouwer's Fixed Point Theorem and Finite State Space Markovian Decision Theory, J. Math. Anal. Appl. 49, 1975, pp. 710–712.

  83. J.F. Shapiro, Turnpike Planning Horizons for a Markovian Decision Model, Man. Sci. 14, 1968, pp. 292–300.

  84. J.C. Totten, Computational Methods for Finite State Finite Valued Markovian Decision Problems, Report ORC 71-9, Operations Research Center, University of California, Berkeley, 1971.

  85. J.A.E.E. Van Nunen, A Set of Successive Approximation Methods for Discounted Markovian Decision Problems, Zeitschr. f. Opns. Res. 20, 1976, pp. 203–209.

  86. J.A.E.E. Van Nunen, Improved Successive Approximation Methods for Discounted Markov Decision Processes, in: A. Prekopa (Ed.), Colloquia Mathematica Societatis Janos Bolyai 12, North-Holland, 1976, pp. 667–682.

  87. J.A.E.E. Van Nunen, J. Wessels, A Principle for Generating Optimisation Procedures for Discounted Markov Decision Processes, in: A. Prekopa (Ed.), Colloquia Mathematica Societatis Janos Bolyai 12, North-Holland, 1976, pp. 683–695.

  88. H. Wagner, Principles of Operations Research, Prentice-Hall, 1969.

  89. D.J. White, Dynamic Programming, Oliver and Boyd, 1969.

  90. D.J. White, Elimination of Non-Optimal Actions in Markov Decision Processes, Notes in Decision Theory, No. 31, Department of Decision Theory, Manchester University, April 1977.

  91. D.J. White, Finite Dynamic Programming, Wiley, forthcoming.

  92. D.J. White, Dynamic Programming, Markov Chains and the Method of Successive Approximations, J.M.A.A. 6, 1963, pp. 373–376.

  93. D.J. White, Dynamic Programming and Systems of Uncertain Duration, J.M.A.A. 29, 1970, pp. 419–423.

  94. D.J. White, Dynamic Programming and Systems of Uncertain Duration, Man. Sci. 12, 1965, pp. 37–67.

  95. D.J. White, Approximating Bounds and Policies in Markov Decision Processes, Notes in Decision Theory, No. 54, Department of Decision Theory, Manchester University, June 1978.

  96. D. Yamada, Duality Theorem in Markov Decision Problems, J.M.A.A. 50, 1975, pp. 579–595.

Additional References

  1. L.C. Thomas, Connectedness Conditions Used in Finite State Markov Decision Processes, to appear in J.M.A.A.

  2. L.C. Thomas, Connectedness Conditions Used in Denumerable State Markov Decision Processes, International Conference on Markov Decision Processes, Manchester, July 1978.

  3. M.L. Puterman, S.L. Brumelle, On the Convergence of Policy Iteration in Stationary Dynamic Programming, Working Paper No. 392, Faculty of Commerce, University of British Columbia, September 1977.

  4. M.L. Puterman, M.C. Shin, Modified Policy Iteration Algorithms for Discounted Markov Decision Problems, Working Paper No. 481, Faculty of Commerce, University of British Columbia, November 1977.

  5. H.C. Tijms, An Overview of Non-Finite State Semi-Markov Decision Problems with the Average Cost Criterion, International Conference on Markov Decision Processes, Manchester, July 1978.

  6. H.W. Kuhn, Simplicial Approximation of Fixed Points, Proc. Nat. Acad. Sci. U.S.A. 61, 1968, pp. 1238–1242.

Editor information

K.-W. Gaede, D.B. Pressmar, Ch. Schneeweiß, K.-P. Schuster, O. Seifert

Copyright information

© 1979 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

White, D.J. (1979). A survey of algorithms for some restricted classes of Markov decision problems. In: Gaede, KW., Pressmar, D.B., Schneeweiß, C., Schuster, KP., Seifert, O. (eds) Papers of the 8th DGOR Annual Meeting / Vorträge der 8. DGOR Jahrestagung. Proceedings in Operations Research 8, vol 1978. Physica, Heidelberg. https://doi.org/10.1007/978-3-642-99749-5_13

  • DOI: https://doi.org/10.1007/978-3-642-99749-5_13

  • Publisher Name: Physica, Heidelberg

  • Print ISBN: 978-3-7908-0212-2

  • Online ISBN: 978-3-642-99749-5

  • eBook Packages: Springer Book Archive
