Abstract
We consider manufacturing problems that can be modelled as finite-horizon Markov decision processes for which the effective reward function is either a strictly concave or a strictly convex functional of the distribution of the final state. Such reward structures often arise when penalty factors are incorporated into the usual expected-reward objective function. For convex problems there is an optimal Markov deterministic policy, but for concave problems we usually have to consider the larger class of Markov randomised policies. In their natural formulation, these problems cannot be solved directly by dynamic programming. We outline alternative iterative schemes for their solution and show how they can be applied to a specific manufacturing example.
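The distinction between the convex and concave cases can be illustrated with a small sketch. It is not taken from the chapter: the two-state, two-action, one-step process, the target distribution, the reward values, the penalty weight `theta`, and all function names below are hypothetical choices made for illustration only. The code propagates the state distribution forward under a Markov (possibly randomised) policy and then evaluates two functionals of the final distribution: a mean-minus-variance criterion, which is convex in the distribution, and a negative squared distance to a target distribution, which is strictly concave.

```python
def final_distribution(initial, transitions, policy):
    """Propagate a state distribution through a finite-horizon MDP.

    initial:      list of probabilities over states
    transitions:  transitions[a][s][s2] = P(next = s2 | state = s, action = a)
    policy:       policy[t][s][a] = probability of action a in state s at time t
                  (a deterministic rule puts probability 1 on one action)
    """
    dist = list(initial)
    n = len(dist)
    for decision_rule in policy:
        nxt = [0.0] * n
        for s, mass in enumerate(dist):
            for a, p_action in enumerate(decision_rule[s]):
                for s2 in range(n):
                    nxt[s2] += mass * p_action * transitions[a][s][s2]
        dist = nxt
    return dist


def mean_minus_variance(dist, rewards, theta):
    """Convex functional of dist: expected terminal reward minus theta * variance."""
    mean = sum(p * r for p, r in zip(dist, rewards))
    second_moment = sum(p * r * r for p, r in zip(dist, rewards))
    return mean - theta * (second_moment - mean ** 2)


def neg_squared_distance(dist, target):
    """Strictly concave functional of dist: closeness to a target distribution."""
    return -sum((p - q) ** 2 for p, q in zip(dist, target))


# Hypothetical toy process: action 0 keeps the state, action 1 moves to state 1.
T = [
    [[1.0, 0.0], [0.0, 1.0]],  # action 0
    [[0.0, 1.0], [0.0, 1.0]],  # action 1
]
start = [1.0, 0.0]

# One-step policies: two deterministic rules and one randomised rule.
stay = [[[1.0, 0.0], [1.0, 0.0]]]
move = [[[0.0, 1.0], [1.0, 0.0]]]
mix = [[[0.5, 0.5], [1.0, 0.0]]]

policies = {"stay": stay, "move": move, "mix": mix}
finals = {name: final_distribution(start, T, pol) for name, pol in policies.items()}

# Evaluate both functionals on each final distribution.
convex_values = {k: mean_minus_variance(d, [0.0, 1.0], 1.0) for k, d in finals.items()}
concave_values = {k: neg_squared_distance(d, [0.5, 0.5]) for k, d in finals.items()}

best_convex = max(convex_values, key=convex_values.get)
best_concave = max(concave_values, key=concave_values.get)
```

In this toy instance the convex criterion is maximised by a deterministic rule, whereas the concave criterion is maximised only by the randomised rule, because neither deterministic rule can reach the target distribution. The sketch simply enumerates three candidate policies; it does not implement the iterative schemes the chapter outlines.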
© 1997 Springer-Verlag Berlin Heidelberg
Cite this paper
Collins, E.J. (1997). Using Markov decision processes to optimize a nonlinear functional of the final distribution, with manufacturing applications. In: Christer, A.H., Osaki, S., Thomas, L.C. (eds) Stochastic Modelling in Innovative Manufacturing. Lecture Notes in Economics and Mathematical Systems, vol 445. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-59105-1_3
DOI: https://doi.org/10.1007/978-3-642-59105-1_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-61768-6
Online ISBN: 978-3-642-59105-1
eBook Packages: Springer Book Archive