Autonomous Agents and Multi-Agent Systems

Volume 33, Issue 5, pp 481–517

Comparative criteria for partially observable contingent planning

  • Dorin Shmaryahu
  • Guy Shani
  • Jörg Hoffmann

Abstract

In contingent planning under partial observability with sensing actions, agents actively use sensing to discover meaningful facts about the world. The solution can be represented as a plan tree or graph, branching on various possible observations. Typically in contingent planning one seeks a satisfying plan leading to a goal state at each leaf. In many applications, however, one may prefer some satisfying plans to others, such as plans that lead to the goal with a lower average cost. However, methods such as average cost make an implicit assumption concerning the probabilities of outcomes, which may not apply when the stochastic dynamics of the environment are unknown. We focus on the problem of providing valid comparative criteria for contingent plan trees and graphs, allowing us to compare two plans and decide which one is preferable. We suggest a set of such comparison criteria: plan simplicity, dominance, and best and worst plan costs. We also argue that in some cases certain branches of the plan correspond to an unlikely combination of mishaps, and can be ignored, and provide methods for pruning such unlikely branches before comparing the plan graphs. We explain these criteria, and discuss their validity, correlations, and application to real-world problems. We also suggest efficient algorithms for computing the comparative criteria where needed. We provide experimental results, showing that existing contingent planners provide diverse plans that can be compared using these criteria.
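
As a rough illustration of the best and worst plan cost criteria, the following minimal sketch computes them on a small plan tree. It is not the authors' algorithm: the `PlanNode` class, the function names, the action names, and the costs are all hypothetical, and plan graphs with shared nodes would additionally need memoization. The uniform average is included only to show the kind of probability assumption that average cost relies on.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PlanNode:
    """Hypothetical plan-tree node: one action, children keyed by observation."""
    action: str
    cost: float = 1.0
    children: Dict[str, "PlanNode"] = field(default_factory=dict)


def best_cost(node: PlanNode) -> float:
    """Cost of the cheapest root-to-leaf branch."""
    if not node.children:
        return node.cost
    return node.cost + min(best_cost(c) for c in node.children.values())


def worst_cost(node: PlanNode) -> float:
    """Cost of the most expensive root-to-leaf branch."""
    if not node.children:
        return node.cost
    return node.cost + max(worst_cost(c) for c in node.children.values())


def uniform_average_cost(node: PlanNode) -> float:
    """Average cost ASSUMING all observations at a branch are equally likely."""
    if not node.children:
        return node.cost
    kids = list(node.children.values())
    return node.cost + sum(uniform_average_cost(c) for c in kids) / len(kids)


# Illustrative plan (made-up domain): sense a door, then act on the observation.
plan = PlanNode("sense-door", 1, {
    "open": PlanNode("walk-through", 2),
    "closed": PlanNode("find-key", 4, {"found": PlanNode("unlock", 1)}),
})

print(best_cost(plan), worst_cost(plan), uniform_average_cost(plan))
```

Best and worst costs depend only on the structure of the tree, whereas the average changes as soon as the "open" and "closed" observations are assigned different probabilities.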

Keywords

Planning; Contingent planning; Comparative Criteria; Plan tree; Partial observability

Notes

Acknowledgements

This work was supported by ISF Grant 933/13, and by the Israeli Cyber Center.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Ben Gurion University of the Negev, Beersheba, Israel
  2. Saarland University, Saarbrücken, Germany
