Comparative criteria for partially observable contingent planning

Autonomous Agents and Multi-Agent Systems

Abstract

In contingent planning under partial observability with sensing actions, agents actively use sensing to discover meaningful facts about the world. The solution can be represented as a plan tree or graph that branches on the possible observations. Typically, contingent planning seeks a satisfying plan, one that reaches a goal state at each leaf. In many applications, however, some satisfying plans are preferable to others, such as plans that reach the goal with a lower average cost. Yet criteria such as average cost implicitly assume known outcome probabilities, an assumption that may not hold when the stochastic dynamics of the environment are unknown. We focus on the problem of providing valid comparative criteria for contingent plan trees and graphs, allowing us to compare two plans and decide which one is preferable. We suggest a set of such comparison criteria: plan simplicity, dominance, and best and worst plan costs. We also argue that in some cases certain branches of the plan correspond to an unlikely combination of mishaps and can be ignored, and we provide methods for pruning such unlikely branches before comparing the plan graphs. We explain these criteria and discuss their validity, correlations, and application to real-world problems. We also suggest efficient algorithms for computing the comparative criteria where needed. We provide experimental results showing that existing contingent planners produce diverse plans that can be compared using these criteria.
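To make the best and worst plan cost criteria concrete, the following is a minimal Python sketch, not the authors' algorithm. The `PlanNode` structure, the action names, and the costs are all hypothetical, and every leaf is assumed to be a goal state. The sketch computes the cheapest and the costliest root-to-leaf cost of a plan tree, which requires no outcome probabilities.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class PlanNode:
    """One action in a contingent plan tree (hypothetical structure)."""
    action: str
    cost: float = 1.0
    # Maps each possible observation to the subtree executed after it.
    # An empty dict marks a leaf, assumed here to be a goal state.
    children: Dict[str, "PlanNode"] = field(default_factory=dict)


def best_worst_cost(node: PlanNode) -> Tuple[float, float]:
    """Return (cheapest, costliest) root-to-leaf cost below this node."""
    if not node.children:
        return node.cost, node.cost
    bounds = [best_worst_cost(child) for child in node.children.values()]
    best = node.cost + min(b for b, _ in bounds)
    worst = node.cost + max(w for _, w in bounds)
    return best, worst


# Plan A senses first, then acts on the observation (illustrative only).
plan_a = PlanNode("sense-door", 1.0, {
    "open": PlanNode("walk-through", 1.0),
    "closed": PlanNode("unlock", 2.0, {"done": PlanNode("walk-through", 1.0)}),
})
# Plan B always takes a longer detour that needs no sensing.
plan_b = PlanNode("take-detour", 3.5)

print(best_worst_cost(plan_a))  # (2.0, 4.0)
print(best_worst_cost(plan_b))  # (3.5, 3.5)
```

In this toy case plan A has the better best-case cost and plan B the better worst-case cost, so the two criteria disagree and neither plan is preferable on both. For plan graphs with shared subplans, the same recursion would need memoization to avoid revisiting nodes.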

Notes

  1. Two obvious omissions are PO-PRP [35] and perhaps POND [13], neither of which is currently available for use.

References

  1. Albore, A., Palacios, H., & Geffner, H. (2009). A translation-based approach to contingent planning. In Proceedings of the twenty-first international joint conference on artificial intelligence (pp. 1623–1628).

  2. Ashkenazi, M., Bar-Sinai, M., & Brafman, R. (2016). Planning and monitoring with performance level profiles. In Planning and robotics workshop (PlanRob), ICAPS 2016.

  3. Baxter, I. D., Yahin, A., Moura, L., Sant’Anna, M., & Bier, L. (1998). Clone detection using abstract syntax trees. In Proceedings, international conference on software maintenance, 1998 (pp. 368–377). IEEE.

  4. Bonet, B., & Geffner, H. (2000). Planning with incomplete information as heuristic search in belief space. In Proceedings of the Fifth international conference on artificial intelligence planning systems, Breckenridge, CO, USA, April 14–17, 2000 (pp. 52–61).

  5. Bonet, B., & Geffner, H. (2009). Solving POMDPs: RTDP-Bel versus point-based algorithms. In IJCAI (pp. 1641–1646).

  6. Bonet, B., & Geffner, H. (2011). Planning under partial observability by classical replanning: Theory and experiments. In IJCAI (pp. 1936–1941).

  7. Bonet, B., & Geffner, H. (2014). Belief tracking for planning with sensing: Width, complexity and approximations. Journal of Artificial Intelligence Research, 50, 923–970.

  8. Brafman, R. I., & Shani, G. (2012). A multi-path compilation approach to contingent planning. In Proceedings of the twenty-sixth AAAI conference on artificial intelligence.

  9. Brafman, R., & Shani, G. (2014). On the properties of belief tracking for online contingent planning using regression. In ECAI 2014–21st European conference on artificial intelligence (pp. 147–152).

  10. Brafman, R. I., & Shani, G. (2012). Replanning in domains with partial information and sensing actions. Journal of Artificial Intelligence Research (JAIR), 45, 565–600.

  11. Braziunas, D., & Boutilier, C. (2010). Assessing regret-based preference elicitation with the utpref recommendation system. In Proceedings of the 11th ACM conference on electronic commerce (pp. 219–228). ACM.

  12. Bryce, D., Kambhampati, S., & Smith, D. E. (2006). Planning graph heuristics for belief space search. Journal of Artificial Intelligence Research, 26, 35–99.

  13. Bryce, D., Kambhampati, S., & Smith, D. E. (2006). Planning graph heuristics for belief space search. Journal of Artificial Intelligence Research, 26, 35–99.

  14. Domshlak, C. (2013). Fault tolerant planning: Complexity and compilation. In ICAPS.

  15. Finzi, A., & Orlandini, A. (2005). Human-robot interaction through mixed-initiative planning for rescue and search rovers. In AI*IA 2005 (pp. 483–494).

  16. Garbarino, E. C., & Edell, J. A. (1997). Cognitive effort, affect, and choice. Journal of Consumer Research, 24(2), 147–158.

  17. Ghallab, M., Nau, D., & Traverso, P. (2016). Automated planning and acting. Cambridge: Cambridge University Press.

  18. Helmert, M. (2006). The fast downward planning system. Journal of Artificial Intelligence Research, 26, 191–246.

  19. Hoffmann, J. (2015). Simulated penetration testing: From “Dijkstra” to “Turing Test++”. In Proceedings of the 25th international conference on automated planning and scheduling, ICAPS (pp. 364–372).

  20. Hoffmann, J., & Brafman, R. (2005). Contingent planning via heuristic forward search with implicit belief states. In Proc. ICAPS, Vol. 2005.

  21. Hoffmann, J., & Nebel, B. (2001). The FF planning system: Fast plan generation through heuristic search. JAIR, 14, 253–302.

  22. International planning competition 2014. https://helios.hud.ac.uk/scommv/IPC-14/domains_sequential.html.

  23. Komarnitsky, R., & Shani, G. (2014). Computing contingent plans using online replanning. In Proceedings of the Twenty-Eighth AAAI conference on artificial intelligence, July 27–31, 2014, Québec City, Québec, Canada (pp. 2322–2329).

  24. Komarnitsky, R., & Shani, G. (2016). Computing contingent plans using online replanning. In Proceedings of the thirtieth AAAI conference on artificial intelligence, February 12–17, 2016, Phoenix, Arizona, USA (pp. 3159–3165).

  25. Kupcsik, A., Deisenroth, M. P., Peters, J., Loh, A. P., Vadakkepat, P., & Neumann, G. (2017). Model-based contextual policy search for data-efficient generalization of robot skills. Artificial Intelligence, 247, 415–439.

  26. Likhachev, M., & Stentz, A. (2009). Probabilistic planning with clear preferences on missing information. Artificial Intelligence, 173(5–6), 696–721.

  27. Louridas, P. (2006). Static code analysis. IEEE Software, 23(4), 58–61.

  28. Mahadevan, S. (1996). Average reward reinforcement learning: Foundations, algorithms, and empirical results. Machine Learning, 22(1–3), 159–195.

  29. Mahler, J., & Goldberg, K. (2017). Learning deep policies for robot bin picking by simulating robust grasping sequences. In Conference on robot learning (pp. 515–524).

  30. Maliah, S., Brafman, R. I., Karpas, E., & Shani, G. (2014). Partially observable online contingent planning using landmark heuristics. In Proceedings of the twenty-fourth international conference on automated planning and scheduling, ICAPS 2014, Portsmouth, New Hampshire, USA, June 21–26, 2014.

  31. Mastrogiovanni, F., Sgorbissa, A., & Zaccaria, R. (2009). Robust navigation in an unknown environment with minimal sensing and representation. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 39(1), 212–229.

  32. Meuleau, N., Peshkin, L., Kim, K.-E., & Kaelbling, L. P. (1999). Learning finite-state controllers for partially observable environments. In Proceedings of the Fifteenth conference on Uncertainty in artificial intelligence (pp. 427–436). Morgan Kaufmann Publishers Inc.

  33. Michalski, R. S., Carbonell, J. G., & Mitchell, T. M. (2013). Machine learning: An artificial intelligence approach. Berlin: Springer Science & Business Media.

  34. Mirsky, R., Gal, Y. K., Stern, R., & Kalech, M. (2016). Sequential plan recognition. In Proceedings of the 2016 international conference on autonomous agents & multiagent systems (pp. 1347–1348). International Foundation for Autonomous Agents and Multiagent Systems.

  35. Muise, C. J., Belle, V., & McIlraith, S. A. (2014). Computing contingent plans via fully observable non-deterministic planning. In Proceedings of the twenty-eighth AAAI conference on artificial intelligence, July 27–31, 2014, Québec City, Québec, Canada (pp. 2322–2329).

  36. Muise, C. J., Belle, V., & McIlraith, S. A. (2014). Computing contingent plans via fully observable non-deterministic planning. In Proceedings of the twenty-eighth AAAI conference on artificial intelligence.

  37. Muise, C. J., McIlraith, S. A., & Beck, J. C. (2012). Improved non-deterministic planning by exploiting state relevance. In Proceedings of the twenty-second international conference on automated planning and scheduling, ICAPS.

  38. O’Kane, J. M., & LaValle, S. M. (2008). Comparing the power of robots. The International Journal of Robotics Research, 27(1), 5–23.

  39. Palacios, H., & Geffner, H. (2007). From conformant into classical planning: Efficient translations that may be complete too. In ICAPS (pp. 264–271).

  40. Poupart, P., & Boutilier, C. (2004). Bounded finite state controllers. In Advances in neural information processing systems (pp. 823–830).

  41. Poupart, P., Boutilier, C., Schuurmans, D., & Patrascu, R. (2002). Piecewise linear value function approximation for factored MDPs. In Proceedings of the eighteenth national conference on artificial intelligence (AAAI02), Edmonton.

  42. Raghavan, S., Rohana, R., Leon, D., Podgurski, A., & Augustine, V. (2004). Dex: A semantic-graph differencing tool for studying changes in large code bases. In Proceedings 20th IEEE international conference on software maintenance, 2004 (pp. 188–197). IEEE.

  43. Shani, G., & Brafman, R. I. (2011). Replanning in domains with partial information and sensing actions. In IJCAI (pp. 2021–2026).

  44. Shani, G., & Meek, C. (2009). Improving existing fault recovery policies. In Advances in neural information processing systems (NIPS) (pp. 1642–1650).

  45. Shani, G., Heckerman, D., & Brafman, R. I. (2005). An MDP-based recommender system. Journal of Machine Learning Research, 6(Sep), 1265–1295.

  46. Shani, G., Pineau, J., & Kaplow, R. (2013). A survey of point-based POMDP solvers. Autonomous Agents and Multi-Agent Systems, 27(1), 1–51.

  47. Shmaryahu, D., Hoffmann, J., Shani, G., & Steinmetz, M. (2016). Constructing plan trees for simulated penetration testing. In Proceedings of the scheduling and planning applications woRKshop (SPARK), ICAPS 2016.

  48. Shmaryahu, D., Shani, G., Hoffmann, J., & Steinmetz, M. (2016). Constructing plan trees for simulated penetration testing.

  49. Siepmann, F., Ziegler, L., Kortkamp, M., & Wachsmuth, S. (2014). Deploying a modeling framework for reusable robot behavior to enable informed strategies for domestic service robots. Robotics and Autonomous Systems, 62(5), 619–631.

  50. Smith, T., & Simmons, R. (2004). Heuristic search value iteration for POMDPs. In Proceedings of the 20th conference on uncertainty in artificial intelligence (pp. 520–527). AUAI Press.

  51. Son, J.-W., Park, S.-B., & Park, S.-Y. (2006). Program plagiarism detection using parse tree kernels. In Pacific Rim international conference on artificial intelligence (pp. 1000–1004). Springer.

  52. Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction. Cambridge: MIT Press.

  53. Thrun, S., Burgard, W., & Fox, D. (2005). Probabilistic robotics. Cambridge: MIT Press.

  54. Vidal, V., & Geffner, H. (2006). Branching and pruning: An optimal temporal POCL planner based on constraint programming. Artificial Intelligence, 170(3), 298–335.

  55. Yang, J., Zhihua, Q., Wang, J., & Conrad, K. (2010). Comparison of optimal solutions to real-time path planning for a mobile vehicle. IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans, 40(4), 721–731.

  56. Yoon, S. W., Fern, A., & Givan, R. (2007). FF-Replan: A baseline for probabilistic planning. In ICAPS.

Acknowledgements

This work was supported by ISF Grant 933/13, and by the Israeli Cyber Center.

Author information

Corresponding author

Correspondence to Dorin Shmaryahu.

Additional information

Parts of this paper appeared in [48].

About this article

Cite this article

Shmaryahu, D., Shani, G. & Hoffmann, J. Comparative criteria for partially observable contingent planning. Auton Agent Multi-Agent Syst 33, 481–517 (2019). https://doi.org/10.1007/s10458-019-09406-0

  • DOI: https://doi.org/10.1007/s10458-019-09406-0
