Abstract
This paper surveys some standard infinite-stage Markov decision process equations and algorithms for solving them. We restrict attention to finite state sets, finite action sets, and bounded rewards. For a more general, though now out-of-date, reference list on Markov decision processes, the reader may consult Schweitzer [80]. Although we confine ourselves to conventional Markov decision processes, more general formulations exist; we refer the reader to those of Koehler [48], [49], who considers Leontief and essentially-Leontief extensions and the method of successive approximations, and to that of Rothblum [75].
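As a minimal illustration of the setting the survey treats (finite state set, finite action set, bounded rewards, discounting), the following sketch applies the method of successive approximations (value iteration). It is a hypothetical example, not code from the paper; the problem data and function names are purely illustrative.

```python
# Hypothetical sketch: the method of successive approximations
# (value iteration) for a discounted MDP with finite state and
# action sets and bounded rewards. All data below are illustrative.

def value_iteration(P, r, beta=0.9, tol=1e-8):
    """P[a][s][t]: transition probabilities; r[a][s]: rewards; 0 <= beta < 1."""
    n, A = len(r[0]), len(r)
    v = [0.0] * n
    while True:
        # One approximation step: q_a(s) = r_a(s) + beta * sum_t P_a(s,t) v(t)
        q = [[r[a][s] + beta * sum(P[a][s][t] * v[t] for t in range(n))
              for s in range(n)] for a in range(A)]
        v_new = [max(q[a][s] for a in range(A)) for s in range(n)]
        if max(abs(v_new[s] - v[s]) for s in range(n)) < tol:
            # Return the value vector and a maximising stationary policy
            return v_new, [max(range(A), key=lambda a: q[a][s]) for s in range(n)]
        v = v_new

# Illustrative two-state, two-action problem
P = [[[0.8, 0.2], [0.3, 0.7]],   # action 0
     [[0.5, 0.5], [0.1, 0.9]]]   # action 1
r = [[1.0, 0.0], [0.5, 2.0]]
v, policy = value_iteration(P, r)
```

Because the iteration map is a contraction with modulus beta, the approximations converge geometrically to the unique fixed point, which is the essential property exploited by the successive-approximation algorithms the survey discusses.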
References
J. Anthonisse, H.C. Tijms, On White’s Condition in Dynamic Programming, Mathematisch Centrum, Report B.W. 46/75, Amsterdam, 1975.
J. Bather, Optimal Decision Procedures for Finite Markov Chains, Pt. II: Communicating Systems, Adv. Appl. Prob. 5, 1973, pp. 521–540.
R. Bellman, A Markovian Decision Process, J. Math. Mech., 6, 1957, pp. 679–684.
P.F. Bestwick, D. Sadjadi, An Action Elimination Algorithm for the Discounted Semi-Markov Problem, University of Bradford Management Centre, 1978.
D. Blackwell, Discrete Dynamic Programming, Ann. Math. Stat. 33, 1962, pp. 719–726.
B.W. Brown, On The Iterative Method of Dynamic Programming on a Finite Space Discrete Time Markov Process, Ann. Math. Stat. 36, 1965, pp. 1279–1285.
G. Dantzig, P. Wolfe, Linear Programming in a Markov Chain, Opns. Res. 10, 1962, pp. 702–710.
J. De Cani, A Dynamic Programming Algorithm for Imbedded Markov Chains When the Planning Horizon is at Infinity, Man. Sci. 10, 1964, pp. 716–733.
G.T. De Ghellinck, G.D. Eppen, Linear Programming Solutions for Separable Markovian Decision Problems, Man. Sci. 13, 1967, pp. 371–394.
E.V. Denardo, B. Fox, Multichain Markov Renewal Programs, SIAM J. Appl. Math. 16, 1968, pp. 468–487.
E.V. Denardo, Markov Renewal Programs with Small Interest Rates, Ann. Math. Stat. 42, 1971, pp. 477–496.
E.V. Denardo, Separable Markov Decision Problems, Man. Sci. 14, 1968, pp. 451–462.
E.V. Denardo, On Linear Programming in a Markov Decision Problem, Man. Sci. 16, 1970A, pp. 281–288.
E.V. Denardo, Contraction Mappings in the Theory Underlying Dynamic Programming, SIAM Rev. 9, 1967, pp. 165–177.
E.V. Denardo, A Markov Decision Problem, in T.C. Hu, S.M. Robinson (Eds.), Mathematical Programming, Academic Press, New York, 1973.
C. Derman, On Sequential Decisions and Markov Chains, Man. Sci. 9, 1962, pp. 16–24.
C. Derman, Markovian Decision Processes, Ann. Math. Stat., 37, 1966, pp. 1545–1553.
B. Curtis Eaves, Complementary Pivot Theory and Markov Decision Chains, in S. Karamardian, C.B. Garcia (Eds.), Fixed Points, Algorithms and Applications, First International Conference on Computing Fixed Points with Applications, Clemson University, South Carolina, June 1974.
A. Federgruen, P.J. Schweitzer, Discounted and Undiscounted Value Iteration in Markov Decision Problems: A Survey, Math. Centrum Report B.W. 78/77, August 1978, Amsterdam.
B.L. Fox, Numerical Computation; Transient Behaviour of a Markov Renewal Process, Paper No. 119, Département d’Informatique, University of Montreal, 1973.
B.L. Fox, Markov Renewal Programming by Linear Fractional Programming, SIAM J. Appl. Math. 14, 1966, pp. 1418–1432.
R. Grinold, Elimination of Suboptimal Actions in Markov Decision Problems, Opns. Res. 21, 1973, pp. 848–851.
N.A.J. Hastings, J.M.C. Mello, Tests for Suboptimal Actions in Discounted Markov Programming, Man. Sci. 19, 1973, pp. 1019–1022.
N.A.J. Hastings, A Test for Non-Optimal Actions in Undiscounted Markov Decision Chains, Man. Sci. 23, 1976, pp. 87–92.
N.A.J. Hastings, Bounds on the Gain of a Markov Decision Process, Opns. Res. 19, 1971, pp. 240–243.
N.A.J. Hastings, J.A.E.E. van Nunen, The Action Elimination Algorithms for Markov Decision Processes, Memorandum COSOR 76–20, Department of Mathematics, Eindhoven University of Technology, 1976.
N.A.J. Hastings, Optimisation of Discounted Markov Decision Problems, Opl. Res. Q. 20, 1969, pp. 499–500.
N.A.J. Hastings, Some Notes on Dynamic Programming and Replacement, Opl. Res. Q. 19, 1968, pp. 453–457.
N.A.J. Hastings, J.M.C. Mello, Decision Networks, Wiley, 1978.
K. Hinderer, Estimates for Finite-Stage Dynamic Programs, J.M.A.A., 55, 1976, pp. 205–238.
K. Hinderer, On Approximate Solutions of Finite-Stage Dynamic Programs, International Conference on Dynamic Programming, University of British Columbia, Vancouver, 1977.
K. Hinderer, G. Hübner, On Exact and Approximate Solutions of Unstructured Finite Stage Dynamic Programs, Proceedings of the Advanced Seminar on Markov Decision Theory, Mathematical Centre, Amsterdam, 1976.
K. Hinderer, G. Hübner, Recent Results on Finite Stage Stochastic Dynamic Programs, Proceedings of the 41st Session of the International Statistical Institute, New Delhi, December 1977.
K. Hinderer, W. Whitt, On Approximate Solutions of Finite-Stage Dynamic Programs, Proceedings of International Conference on Dynamic Programming, University of British Columbia, Vancouver, 1977.
H. Hinomoto, Linear Programming of Markovian Decisions, Man. Sci. Th., 18, 1971, pp. 88–96.
D. Hitchcock, J. MacQueen, On Computing the Expected Discounted Return in a Markov Chain, Naval Res. Log. Quart. 17, 1970, pp. 237–241.
A. Hordijk, H. Tijms, A Modified Form of the Iterative Method of Dynamic Programming, Ann. Stat. 3, 1975, pp. 203–208.
A. Hordijk, H. Tijms, The Method of Successive Approximations and Markovian Decision Problems, Opns. Res. 22, 1974, pp. 519–521.
A. Hordijk, P.J. Schweitzer, H. Tijms, The Asymptotic Behaviour of the Minimal Total Expected Cost for the Denumerable State Markov Decision Model, J. Appl. Prob. 12, 1975, pp. 298–305.
A. Hordijk, L.C.M. Kallenberg, On Solving Markov Decision Problems by Linear Programming, International Conference on Markov Decision Processes, Manchester, July 1978.
R.A. Howard, Semi-Markovian Control Systems, in: Semi-Markovian Decision Processes, Proc. 34 Session Bull. Internat. Stat. Inst. 40, Book 2, 1964, pp. 625–652.
R.A. Howard, Dynamic Programming and Markov Processes, M.I.T. Press, Cambridge, Mass., 1960.
R.A. Howard, Research in Semi-Markovian Decision Structures, J. Opns. Res. Soc. of Japan, 6, 1964, pp. 163–199.
G. Hübner, Extrapolation and Exclusion of Suboptimal Actions in Finite-Stage Markov Decision Models, Doctoral Dissertation, University of Hamburg, Germany, 1977.
G. Hübner, Improved Procedures for Eliminating Suboptimal Actions in Markov Programming by Use of Contraction Properties, in: Information Theory, Statistical Decision Functions, Random Processes, Transactions of the Seventh Prague Conference and the European Meeting of Statisticians, D. Reidel, 1977.
W. Jewell, Markov Renewal Programming, Opns. Res. 11, 1963, pp. 938–971.
A.J. Kleinman, H.J. Kushner, Accelerated Procedures for the Solution of Discrete Markov Control Problems, I.E.E.E. Trans. Aut. Cont. A.C.-16 1971, pp. 147–152.
G. Koehler, Generalised Markov Decision Processes, Dept. of Management, University of Florida, Feb. 1978.
G. Koehler, Value Convergence in a Generalised Markov Decision Process, Dept. of Management, University of Florida, Jan. 1978.
M. Krasnosel’skii, G. Vainikko, P. Zabreiko, Ya. Rutitskii, and V. Stetsenko, Approximate Solution of Operator Equations, Wolters-Noordhoff Pub., Groningen, 1972.
H. Kushner, Introduction to Stochastic Control, Holt, Rinehart and Winston, 1971.
E. Lanery, Etude asymptotique des systèmes Markoviens à commande, Revue d’Informatique et Recherche Opérationnelle, 1, 1967, pp. 3–56.
S. Lefschetz, Introduction to Topology, Princeton University Press, 1949.
J. MacQueen, A Modified Dynamic Programming Method for Markovian Decision Problems, J.M.A.A. 14, 1966, pp. 38–43.
J. MacQueen, A Test for Sub-Optimal Actions in Markovian Decision Problems, Opns. Res. 15, 1967, pp. 559–561.
A. Manne, Linear Programming and Sequential Decisions, Man. Sci. 6, 1960, pp. 259–267.
B. Miller, A. Veinott, Dynamic Programming with a Small Interest Rate, Ann. Math. Stat. 40, 1969, pp. 366–370.
H. Mine, S. Osaki, Linear Programming Algorithms for Semi-Markovian Decision Processes, J. Math. Anal. Appl. 22, 1968, pp. 356–381.
H. Mine, Y. Tabata, On the Direct Sums of Markovian Decision Processes, J.M.A.A., 28, 1969, pp. 284–293.
H. Mine, S. Osaki, Some Remarks on a Markov Decision Process with an Absorbing State, J.M.A.A. 23, 1968, pp. 327–334.
H. Mine, S. Osaki, Linear Programming Considerations on Markovian Decision Processes with No Discounting, J. Math. Anal. Appl. 26, 1969, pp. 221–230.
H. Mine, S. Osaki, Markovian Decision Processes, Elsevier, 1970.
T.E. Morton, Using Strong Convergence to Accelerate Value Iteration, Graduate School of Industrial Administration, Carnegie-Mellon University, 1976.
T.E. Morton, Undiscounted Markov Renewal Programming via Modified Successive Approximations, Opns. Res. 19, 1971, pp. 1081–1089.
J.M. Norman, D.J. White, A Method for Approximate Solutions to Stochastic Dynamic Programming Problems Using Expectations, Opns. Res. 16, 1968, pp. 296–306.
A.R. Odoni, On Finding The Maximal Gain For Markov Decision Processes, Opns. Res. 17, 1969, pp. 857–860.
E. Porteus, Some Bounds for Discounted Sequential Decision Processes, Man. Sci. 18, 1971, pp. 7–11.
E. Porteus, Bounds and Transformations for Discounted Finite Markov Decision Chains, Opns. Res. 23, 1975, pp. 761–765.
E.L. Porteus, J.C. Totten, Accelerated Computations of the Expected Discounted Return in a Markov Chain, Opns. Res. 26, 1978, pp. 350–357.
E.L. Porteus, Overview of Iterative Methods for Discounted Markov Decision Chains, International Conference on Markov Decision Processes, Manchester, July 1978.
D. Reetz, Approximate Solutions of Discounted Markovian Decision Processes, Institut für Gesellschafts- und Wirtschaftswissenschaften, Wirtschaftstheoretische Abteilung, University of Bonn, 1971.
U. Rieder, Estimates for Dynamic Programs with Lower and Upper Bounding Functions, Institut für Mathematische Statistik, University of Karlsruhe, Germany, 1977.
I.V. Romanovskii, On the Solvability of Bellman’s Functional Equations for a Markovian Decision Process, J.M.A.A. 42, 1973, pp. 485–498.
S.M. Ross, Non-Discounted Denumerable Markovian Decision Models, Ann. Math. Stat. 39, 1968, pp. 412–423.
U.G. Rothblum, Normalised Markov Decision Chains, Opns. Res. 23, 1975, pp. 785–795.
H.E. Scarf, The Approximation of Fixed Points of a Continuous Mapping, SIAM J. Appl. Maths. 15, 1967, pp. 1328–1343.
P.J. Schweitzer, Iterative Solution of the Functional Equations of Un-discounted Markov Renewal Programming, J.M.A.A. 34, 1971, pp. 495–501.
P.J. Schweitzer, Multiple Policy Improvements in Undiscounted Markov Renewal Programming, Opns. Res. 19, 1971, pp. 784–793.
P.J. Schweitzer, Perturbation Theory and Undiscounted Markov Renewal Programming, Opns. Res. 17, 1969, pp. 716–727.
P.J. Schweitzer, Annotated Bibliography on Markov Decision Processes, IBM Watson Research Center, P.O. Box 218, Yorktown Heights, New York.
P.J. Schweitzer, An Overview of Undiscounted Markovian Decision Processes, International Conference on Markov Decision Processes, Manchester, July, 1978.
J.F. Shapiro, Brouwer’s Fixed Point Theorem and Finite State Space Markovian Decision Theory, J. Math. Anal. Appl. 49, 1975, pp. 710–712.
J.F. Shapiro, Turnpike Planning Horizons for a Markovian Decision Model, Man. Sci. 14, 1968, pp. 292–300.
J.C. Totten, Computational Methods for Finite State Finite Valued Markovian Decision Problems, Report ORC 71–9, Operations Research Center, University of California, Berkeley, 1971.
J.A.E.E. Van Nunen, A Set of Successive Approximation Methods for Discounted Markovian Decision Problems, Zeitschr. f. Ops. Res. 20, 1976, pp. 203–209.
J.A.E.E. Van Nunen, Improved Successive Approximation Methods for Discounted Markov Decision Processes, in: Colloquia Mathematica Societatis Janos Bolyai 12, pp. 667–682 (A. Prekopa, Ed.), North-Holland, 1976.
J.A.E.E. Van Nunen, J. Wessels, A Principle for Generating Optimisation Procedures for Discounted Markov Decision Processes, in: Colloquia Mathematica Societatis Janos Bolyai 12, pp. 683–695 (A. Prekopa, Ed.), North-Holland, 1976.
H. Wagner, Principles of Operations Research, Prentice Hall, 1969.
D.J. White, Dynamic Programming, Oliver and Boyd, 1969.
D.J. White, Elimination of Non-Optimal Actions in Markov Decision Processes, Notes in Decision Theory, No. 31, Department of Decision Theory, Manchester University, April 1977.
D.J. White, Finite Dynamic Programming, Wiley, forthcoming.
D.J. White, Dynamic Programming, Markov Chains and the Method of Successive Approximations, J.M.A.A. 6, 1963, pp. 373–376.
D.J. White, Dynamic Programming and Systems of Uncertain Duration, J.M.A.A 29, 1970, pp. 419–423.
D.J. White, Dynamic Programming and Systems of Uncertain Duration, Man. Sci. 12, 1965, pp. 37–67.
D.J. White, Approximating Bounds and Policies in Markov Decision Processes, Notes in Decision Theory, No. 54, Department of Decision Theory, Manchester University, June 1978.
D. Yamada, Duality Theorem in Markov Decision Problems, J.M.A.A. 50, 1975, pp. 579–595.
Additional References
L.C. Thomas, Connectedness Conditions Used in Finite State Markov Decision Processes, to appear in J.M.A.A.
L.C. Thomas, Connectedness Conditions Used in Denumerable State Markov Decision Processes, International Conference on Markov Decision Processes, Manchester, July, 1978.
M.L. Puterman, S.L. Brumelle, On the Convergence of Policy Iteration in Stationary Dynamic Programming, Working Paper No. 392, September 1977, Faculty of Commerce, University of British Columbia.
M.L. Puterman, M.C. Shin, Modified Policy Iteration Algorithms for Discounted Markov Decision Problems, Working Paper No. 481, November 1977, Faculty of Commerce, University of British Columbia.
H.C. Tijms, An Overview of Non-Finite State Semi-Markov Decision Problems with the Average Cost Criterion, International Conference on Markov Decision Processes, Manchester, July, 1978.
H.W. Kuhn, A Simplicial Approximation of Fixed Points, Proc. Nat. Acad. Sci., U.S.A., 61, 1968, 1238–1242.
© 1979 Springer-Verlag Berlin Heidelberg
White, D.J. (1979). A survey of algorithms for some restricted classes of Markov decision problems. In: Gaede, K.W., Pressmar, D.B., Schneeweiß, C., Schuster, K.P., Seifert, O. (Eds.), Papers of the 8th DGOR Annual Meeting / Vorträge der 8. DGOR Jahrestagung, Proceedings in Operations Research 8. Physica, Heidelberg. https://doi.org/10.1007/978-3-642-99749-5_13