Adaptive control of Markov chains
Consider a controlled Markov chain whose transition probabilities depend on an unknown parameter α taking values in a finite set A. Each α is associated with a prespecified stationary control law φ(α). At each time t, the adaptive control law selects the control action indicated by φ(α_t), where α_t is the maximum likelihood estimate of α. It is shown that α_t converges to a parameter α* such that the transition probabilities corresponding to α* and φ(α*) are the same as those corresponding to α_0 and φ(α*), where α_0 is the true parameter.
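The scheme described in the abstract can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the paper's own construction: the names `adaptive_control` and `closed_loop_equivalent`, the two-state example model, the initial estimate, and the tie-breaking rule (keep the first maximizer in dictionary order) are all hypothetical choices made for illustration. The example is built so that the estimate settles on a parameter α* different from the true one, yet with identical transition probabilities under the control law for α*.

```python
import math
import random


def adaptive_control(P, phi, true_alpha, initial, x0, steps, seed=0):
    """Simulate the adaptive law and return the sequence of ML estimates.

    P[a][u][x] is the next-state distribution under parameter a, action u,
    state x. phi[a][x] is the action chosen in state x by the stationary
    law associated with parameter a. (Hypothetical data layout.)
    """
    rng = random.Random(seed)
    params = list(P)
    loglik = {a: 0.0 for a in params}  # running log-likelihood per parameter
    estimate = initial                 # assumed starting estimate
    x = x0
    history = []
    for _ in range(steps):
        # Act as if the current estimate were the true parameter.
        u = phi[estimate][x]
        # The chain actually evolves under the true parameter.
        dist = P[true_alpha][u][x]
        x_next = rng.choices(range(len(dist)), weights=dist)[0]
        # Update every candidate's log-likelihood with the observed transition.
        for a in params:
            p = P[a][u][x][x_next]
            loglik[a] += math.log(p) if p > 0 else -math.inf
        # Maximum likelihood estimate; ties go to the first parameter listed.
        estimate = max(params, key=lambda a: loglik[a])
        history.append(estimate)
        x = x_next
    return history


def closed_loop_equivalent(P, phi, a_star, a_true):
    """Check that a_star and a_true give the same transition probabilities
    in every state when the law phi[a_star] is applied."""
    n_states = len(phi[a_star])
    return all(
        P[a_star][phi[a_star][x]][x] == P[a_true][phi[a_star][x]][x]
        for x in range(n_states)
    )


# Two states, two actions, two candidate parameters (illustrative numbers).
# The models agree under action 0 and differ under action 1.
P = {
    "b": {0: [[0.5, 0.5], [0.5, 0.5]], 1: [[0.9, 0.1], [0.9, 0.1]]},
    "a": {0: [[0.5, 0.5], [0.5, 0.5]], 1: [[0.1, 0.9], [0.1, 0.9]]},
}
phi = {"b": [0, 0], "a": [1, 1]}  # law for "b" always plays 0, for "a" always 1

history = adaptive_control(P, phi, true_alpha="a", initial="b", x0=0, steps=50)
print(history[-1], closed_loop_equivalent(P, phi, history[-1], "a"))
```

In this example the law for "b" only ever plays action 0, under which the two models are indistinguishable, so the likelihoods stay tied and the estimate remains at "b" even though the true parameter is "a". This is the situation the abstract allows for: the limit need not be the true parameter, only one that induces the same transition probabilities under its own control law.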