
Markov Decision Processes

  • Luis Enrique Sucar
Chapter
Part of the Advances in Computer Vision and Pattern Recognition book series (ACVPR)

Abstract

This chapter introduces sequential decision problems, in particular Markov decision processes (MDPs). A formal definition of an MDP is given, and the two most common solution techniques are described: value iteration and policy iteration. Factored MDPs, which use representations based on graphical models to compactly specify and solve very large MDPs, are then described, followed by an introduction to partially observable MDPs (POMDPs). The chapter concludes with two applications of MDPs: power plant control and service robot task coordination.
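
As a rough illustration of the value iteration technique mentioned in the abstract, the following is a minimal sketch for a small discrete MDP. It is not code from the chapter; the two-state transition model, the rewards, and the discount factor are illustrative assumptions.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, eps=1e-6):
    """Minimal value iteration sketch for a discrete MDP.

    P[a][s, s'] is the probability of reaching s' from s under action a,
    R[s, a] is the immediate reward, and gamma is the discount factor.
    """
    n_states, n_actions = R.shape
    V = np.zeros(n_states)
    while True:
        # Bellman backup: Q(s, a) = R(s, a) + gamma * sum_s' P(s'|s, a) V(s')
        Q = R + gamma * np.stack([P[a] @ V for a in range(n_actions)], axis=1)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < eps:
            return V_new, Q.argmax(axis=1)  # optimal values and greedy policy
        V = V_new

# Toy 2-state, 2-action MDP (all numbers are illustrative assumptions).
P = np.array([[[0.8, 0.2], [0.1, 0.9]],   # transitions under action 0
              [[0.5, 0.5], [0.3, 0.7]]])  # transitions under action 1
R = np.array([[1.0, 0.0],                 # rewards in state 0
              [0.0, 2.0]])                # rewards in state 1
V, policy = value_iteration(P, R)
print(V, policy)
```

Policy iteration, the other technique the chapter covers, instead alternates full evaluation of the current policy with greedy improvement, and typically converges in fewer (but more expensive) iterations.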

Keywords

Markov decision process, Reward function, Service robot, Policy iteration, Belief space


Copyright information

© Springer-Verlag London 2015

Authors and Affiliations

  1. Instituto Nacional de Astrofísica, Óptica y Electrónica (INAOE), Santa María Tonantzintla, Mexico
