Computing and Using Lower and Upper Bounds for Action Elimination in MDP Planning

  • Ugur Kuter
  • Jiaqiao Hu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4612)

Abstract

We describe a way to improve the performance of MDP planners by modifying them to use lower and upper bounds to eliminate non-optimal actions during their search. First, we discuss a particular state-abstraction formulation of MDP planning problems and how to use that formulation to compute bounds on the Q-functions of those planning problems. Then, we describe how to incorporate those bounds into a large class of MDP planning algorithms to control their search during planning. We provide theorems establishing the correctness of this technique and an experimental evaluation demonstrating its effectiveness. We incorporated our ideas into two MDP planners: the Real-Time Dynamic Programming (RTDP) algorithm [1] and the Adaptive Multi-stage (AMS) sampling algorithm [2], taken respectively from the automated planning and operations research communities. Our experiments on an Unmanned Aerial Vehicle (UAV) path-planning problem demonstrate that our action-elimination technique provides significant speed-ups for both RTDP and AMS.
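
To make the idea concrete, the following is a minimal Python sketch of bound-based action elimination; it is illustrative only, not the authors' implementation. The MDP interface (mdp.actions, mdp.expected_backup, mdp.sample_next_state, mdp.is_goal) and the bound tables q_lower and q_upper are hypothetical placeholders, with the bounds assumed to be precomputed from a state-abstracted version of the problem, as the abstract suggests. For a reward-maximization formulation, an action can be discarded in a state whenever its upper bound falls below another action's lower bound; a trial-based planner such as RTDP then only needs to back up and sample the surviving actions.

    # Illustrative sketch only (assumed interface, not the paper's code).
    # q_lower[s][a] and q_upper[s][a] are precomputed bounds on the optimal
    # Q-function, e.g. derived from an abstract (aggregated) MDP.

    def eliminate_actions(state, actions, q_lower, q_upper):
        """Keep only the actions that could still be optimal in `state`."""
        best_lower = max(q_lower[state][a] for a in actions)
        # Action a is provably non-optimal if its upper bound is beaten by
        # the best lower bound of some other action.
        return [a for a in actions if q_upper[state][a] >= best_lower]

    def rtdp_like_trial(start_state, mdp, value, q_lower, q_upper, rng):
        """One greedy trial that restricts backups to surviving actions."""
        state = start_state
        while not mdp.is_goal(state):
            candidates = eliminate_actions(state, mdp.actions(state),
                                           q_lower, q_upper)
            # Bellman backup over the non-eliminated actions only.
            q = {a: mdp.expected_backup(state, a, value) for a in candidates}
            best = max(q, key=q.get)
            value[state] = q[best]
            state = mdp.sample_next_state(state, best, rng)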

Keywords

Equivalence Class · Planning Problem · Unmanned Aerial Vehicle · Goal State · Markov Decision Process

References

  1. Bonet, B., Geffner, H.: Labeled RTDP: Improving the Convergence of Real-Time Dynamic Programming. In: ICAPS 2003, pp. 12–21 (2003)
  2. Chang, H.S., Fu, M.C., Hu, J., Marcus, S.I.: An adaptive sampling algorithm for solving Markov decision processes. Operations Research 53(1), 126–139 (2005)
  3. Bertsekas, D.P., Castañon, D.A.: Adaptive aggregation methods for infinite horizon dynamic programming. IEEE Trans. on Automatic Control 34(6), 589–598 (1989)
  4. Dearden, R., Boutilier, C.: Abstraction and approximate decision-theoretic planning. Artificial Intelligence 89(1-2), 219–283 (1997)
  5. Dean, T., Kaelbling, L.P., Kirman, J., Nicholson, A.: Planning under time constraints in stochastic domains. Artificial Intelligence 76(1-2), 35–74 (1995)
  6. Boutilier, C., Dearden, R., Goldszmidt, M.: Stochastic dynamic programming with factored representations. Artificial Intelligence 121(1-2), 49–107 (2000)
  7. Givan, R., Dean, T., Greig, M.: Equivalence notions and model minimization in Markov decision processes. Artificial Intelligence 147(1-2), 163–233 (2003)
  8. Tsitsiklis, J.N., Van Roy, B.: Feature-based methods for large-scale dynamic programming. Machine Learning 22, 59–94 (1996)
  9. de Farias, D.P., Van Roy, B.: The linear programming approach to approximate dynamic programming. Operations Research 51(6), 850–865 (2003)
  10. Trick, M., Zin, S.: Spline approximations to value functions: A linear programming approach. Macroeconomic Dynamics 1, 255–277 (1997)
  11. Hansen, E.A., Zilberstein, S.: LAO*: A Heuristic Search Algorithm that Finds Solutions With Loops. Artificial Intelligence 129, 35–62 (2001)
  12. Sutton, R., Barto, A.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)
  13. MacQueen, J.: A modified dynamic programming method for Markovian decision problems. J. Math. Anal. Appl. 14, 38–43 (1966)
  14. Even-Dar, E., Mannor, S., Mansour, Y.: Action elimination and stopping conditions for reinforcement learning. In: ICML 2003 (2003)
  15. Boutilier, C., Dean, T., Hanks, S.: Decision theoretic planning: Structural assumptions and computational leverage. JAIR 11, 1–94 (1999)
  16. Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley & Sons, Inc., New York (1994)
  17. Jun, M., D'Andrea, R.: Path Planning for Unmanned Aerial Vehicles in Uncertain and Adversarial Environments. In: Cooperative Control: Models, Applications and Algorithms. Kluwer, Dordrecht (2002)
  18. Hanks, S., McDermott, D.: Modeling a dynamic and uncertain world I: Symbolic and probabilistic reasoning about change. Technical Report TR-93-06-10, U. of Washington, Dept. of Computer Science and Engineering (1993)
  19. Boutilier, C., Dean, T.L., Hanks, S.: Planning under uncertainty: Structural assumptions and computational leverage. In: Ghallab, Milani (eds.) New Directions in AI Planning, pp. 157–171. IOS Press, Amsterdam (1996)
  20. Bertsekas, D.: Dynamic Programming and Optimal Control. Athena Scientific (1995)
  21. Watkins, C.J.C.H.: Learning from Delayed Rewards. Ph.D. thesis, University of Cambridge (1989)
  22. Korf, R.E.: Optimal Path Finding Algorithms. In: Search in Artificial Intelligence, pp. 223–267 (1988)
  23. Giunchiglia, F., Walsh, T.: A theory of abstraction. Artificial Intelligence 57(2-3), 323–390 (1992)
  24. Ravindran, B., Barto, A.: SMDP homomorphisms: An algebraic approach to abstraction in semi-Markov decision processes. In: IJCAI 2003, pp. 1011–1016 (2003)
  25. Dean, T., Givan, R., Leach, S.: Model reduction techniques for computing approximately optimal solutions for Markov decision processes. In: UAI 1997, pp. 124–131 (1997)
  26. Dietterich, T.G.: Hierarchical reinforcement learning with the MAXQ value function decomposition. JAIR 13, 227–303 (2000)
  27. Li, L., Walsh, T., Littman, M.: Towards a unified theory of state abstraction for MDPs. In: AI and Math 2006 (2006)
  28. Culberson, J.C., Schaeffer, J.: Efficiently searching the 15-puzzle. Technical report, Department of Computer Science, University of Alberta (1994)
  29. Korf, R.E.: Finding optimal solutions to Rubik's Cube using pattern databases. In: AAAI 1997, pp. 700–705 (1997)
  30. Edelkamp, S.: Planning with pattern databases. In: ECP 2001: Proceedings of the European Conference on Planning, pp. 13–24 (2001)

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Ugur Kuter (1)
  • Jiaqiao Hu (2)
  1. University of Maryland Institute for Advanced Computer Studies, University of Maryland at College Park, College Park, MD 20742, USA
  2. Department of Applied Mathematics and Statistics, State University of New York at Stony Brook, Stony Brook, NY 11794, USA
