Lazy Fully Probabilistic Design of Decision Strategies

  • Miroslav Kárný
  • Karel Macek
  • Tatiana V. Guy
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8866)

Abstract

Fully probabilistic design of decision strategies (FPD) extends Bayesian dynamic decision making. FPD specifies the decision aim via the so-called ideal: a probability density that assigns high probability to desirable closed-loop behaviours and low probability to undesirable ones. The optimal decision strategy minimises the Kullback-Leibler divergence of the probability density describing the closed-loop behaviour to this ideal. Although the corresponding dynamic programming admits explicit minimisers, it suffers from the curse of dimensionality caused by the complexity of the value function. The recently proposed lazy FPD tailors lazy learning, which builds a local model around the current behaviour, to the estimation of the closed-loop model under the optimal strategy. This paper adds theoretical support to the lazy FPD and outlines its further improvement.
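
As an informal illustration of the criterion above, the sketch below discretises a hypothetical single-step problem in Python: it assumes a toy environment model p(y | a), an ideal density over behaviours (a, y), and a grid of randomised decision rules r(a), and it keeps the rule whose closed-loop density f(a, y) = r(a) p(y | a) has the smallest Kullback-Leibler divergence to the ideal. All names and numbers are illustrative assumptions, not the paper's algorithm, which treats dynamic multi-step strategies via dynamic programming and lazy learning.

    import numpy as np

    def kl(f, g, eps=1e-12):
        """Kullback-Leibler divergence D(f || g) of discrete distributions."""
        f, g = np.asarray(f, float), np.asarray(g, float)
        return float(np.sum(f * np.log((f + eps) / (g + eps))))

    # Hypothetical environment model p(y | a): 2 actions, 3 outcomes.
    p_y_given_a = np.array([[0.7, 0.2, 0.1],   # outcomes under action 0
                            [0.1, 0.3, 0.6]])  # outcomes under action 1

    # Ideal closed-loop density: high mass on desirable behaviours
    # (here, outcome 2 reached via action 1), low mass elsewhere.
    f_ideal = np.array([[0.02, 0.02, 0.06],
                        [0.05, 0.15, 0.70]])

    best_rule, best_div = None, np.inf
    for r0 in np.linspace(0.0, 1.0, 101):         # r0 = probability of action 0
        r = np.array([r0, 1.0 - r0])
        f_closed_loop = r[:, None] * p_y_given_a  # f(a, y) = r(a) p(y | a)
        d = kl(f_closed_loop, f_ideal)
        if d < best_div:
            best_rule, best_div = r, d

    print("best randomised rule r(a):", best_rule, " KL to ideal:", round(best_div, 4))

In the paper's setting the same criterion is applied to whole closed-loop trajectories, and the lazy-learning step replaces a global closed-loop model by one estimated locally around the currently observed behaviour.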

Keywords

Decision making · Lazy learning · Bayesian learning · Local model

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Miroslav Kárný (1)
  • Karel Macek (1)
  • Tatiana V. Guy (1)

  1. Institute of Information Theory and Automation, Academy of Sciences of the Czech Republic, Prague 8, Czech Republic
