Abstract
We consider an adaptive finite-state controlled Markov chain with partial state information, motivated by a class of replacement problems. We present parameter estimation techniques based on the information available after actions that reset the state to a known value are taken. We prove that the parameter estimates converge w.p.1 to the true (unknown) parameter, under the feedback structure induced by a certainty-equivalent adaptive policy. We also show that the adaptive policy is self-optimizing, in a long-run average sense, for any (measurable) sequence of parameter estimates converging w.p.1 to the true parameter.
This work was supported in part by the Texas Advanced Technology Program under Grant No. 003658-093, in part by the Air Force Office of Scientific Research under Grants AFOSR-91-0033, F49620-92-J-0045, and F49620-92-J-0083, and in part by the National Science Foundation under Grant CDR-8803012.
Copyright information
© 1992 Springer-Verlag
Cite this paper
Fernández-Gaucherand, E., Arapostathis, A., Marcus, S.I. (1992). Adaptive control of a partially observed controlled Markov chain. In: Duncan, T.E., Pasik-Duncan, B. (eds) Stochastic Theory and Adaptive Control. Lecture Notes in Control and Information Sciences, vol 184. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0113238
DOI: https://doi.org/10.1007/BFb0113238
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-55962-7
Online ISBN: 978-3-540-47327-5