Comparative criteria for partially observable contingent planning

Shmaryahu, Dorin; Shani, Guy; Hoffmann, Jörg

doi:10.1007/s10458-019-09406-0

Published: 09 May 2019

Autonomous Agents and Multi-Agent Systems, volume 33, pages 481–517 (2019)

Abstract

In contingent planning under partial observability with sensing actions, agents actively use sensing to discover meaningful facts about the world. The solution can be represented as a plan tree or graph, branching on the possible observations. Typically in contingent planning one seeks a satisfying plan, that is, one leading to a goal state at each leaf. In many applications, however, one may prefer some satisfying plans to others, such as plans that reach the goal with a lower average cost. Yet criteria such as average cost implicitly assume known outcome probabilities, an assumption that may not hold when the stochastic dynamics of the environment are unknown. We focus on the problem of providing valid comparative criteria for contingent plan trees and graphs, allowing us to compare two plans and decide which one is preferable. We suggest a set of such comparison criteria: plan simplicity, dominance, and best and worst plan costs. We also argue that in some cases certain branches of the plan correspond to an unlikely combination of mishaps and can be ignored, and we provide methods for pruning such unlikely branches before comparing the plan graphs. We explain these criteria and discuss their validity, correlations, and application to real-world problems. We also suggest efficient algorithms for computing the comparative criteria where needed. We provide experimental results showing that existing contingent planners produce diverse plans that can be compared using these criteria.
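As a rough illustration of the best and worst plan cost criteria named in the abstract, the sketch below computes both over a small plan tree. It is not the authors' algorithm: the `PlanNode` structure, the per-action `cost` field, the observation labels, and the reading of best and worst cost as the cheapest and most expensive root-to-leaf branch are all assumptions made here for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PlanNode:
    """One action in a contingent plan tree (hypothetical structure)."""
    action: str
    cost: float = 1.0
    # Maps each possible observation to the subtree executed after it.
    # A non-sensing action has a single child; a leaf has none.
    children: Dict[str, "PlanNode"] = field(default_factory=dict)


def best_cost(node: PlanNode) -> float:
    """Cost of the cheapest root-to-leaf branch (assumed definition)."""
    if not node.children:
        return node.cost
    return node.cost + min(best_cost(c) for c in node.children.values())


def worst_cost(node: PlanNode) -> float:
    """Cost of the most expensive root-to-leaf branch (assumed definition)."""
    if not node.children:
        return node.cost
    return node.cost + max(worst_cost(c) for c in node.children.values())


# Toy plan: sense the corridor, then either move directly or take a detour.
plan = PlanNode("sense-corridor", 1.0, {
    "clear": PlanNode("move-to-goal", 2.0),
    "blocked": PlanNode("detour", 3.0, {
        "done": PlanNode("move-to-goal", 2.0),
    }),
})

print(best_cost(plan), worst_cost(plan))
```

Neither quantity depends on outcome probabilities, which is the property the abstract contrasts with average cost. For plan graphs with shared subplans, the same recursion would need memoization to avoid revisiting nodes.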

Notes

Two obvious omissions are PO-PRP [35] and perhaps POND [13], neither of which is currently available for use.

References

Albore, A., Palacios, H., & Geffner, H. (2009). A translation-based approach to contingent planning. In Proceedings of the twenty-first international joint conference on artificial intelligence (pp. 1623–1628).
Ashkenazi, M., Bar-Sinai, M., & Brafman, R. (2016). Planning and monitoring with performance level profiles. In Planning and robotics workshop (PlanRob), ICAPS 2016.
Baxter, I. D., Yahin, A., Moura, L., Sant’Anna, M., & Bier, L. (1998). Clone detection using abstract syntax trees. In Proceedings, international conference on Software Maintenance, 1998 (pp. 368–377). IEEE.
Bonet, B., & Geffner, H. (2000). Planning with incomplete information as heuristic search in belief space. In Proceedings of the Fifth international conference on artificial intelligence planning systems, Breckenridge, CO, USA, April 14–17, 2000 (pp. 52–61).
Bonet, B., & Geffner, H. (2009). Solving POMDPs: RTDP-Bel versus point-based algorithms. In IJCAI (pp. 1641–1646).
Bonet, B., & Geffner, H. (2011). Planning under partial observability by classical replanning: Theory and experiments. In IJCAI (pp. 1936–1941).
Bonet, B., & Geffner, H. (2014). Belief tracking for planning with sensing: Width, complexity and approximations. Journal of Artificial Intelligence Research, 50, 923–970.
Brafman, R. I., & Shani, G. (2012). A multi-path compilation approach to contingent planning. In Proceedings of the twenty-sixth AAAI conference on artificial intelligence.
Brafman, R., & Shani, G. (2014). On the properties of belief tracking for online contingent planning using regression. In ECAI 2014–21st European conference on artificial intelligence (pp. 147–152).
Brafman, R. I., & Shani, G. (2012). Replanning in domains with partial information and sensing actions. Journal of Artificial Intelligence Research (JAIR), 45, 565–600.
Braziunas, D., & Boutilier, C. (2010). Assessing regret-based preference elicitation with the utpref recommendation system. In Proceedings of the 11th ACM conference on electronic commerce (pp. 219–228). ACM.
Bryce, D., Kambhampati, S., & Smith, D. E. (2006). Planning graph heuristics for belief space search. Journal of Artificial Intelligence Research, 26, 35–99.
Bryce, D., Kambhampati, S., & Smith, D. E. (2006). Planning graph heuristics for belief space search. Journal of Artificial Intelligence Research, 26, 35–99.
Domshlak, C. (2013). Fault tolerant planning: Complexity and compilation. In ICAPS.
Finzi, A., & Orlandini, A. (2005). Human-robot interaction through mixed-initiative planning for rescue and search rovers. In AI*IA 2005 (pp. 483–494).
Garbarino, E. C., & Edell, J. A. (1997). Cognitive effort, affect, and choice. Journal of Consumer Research, 24(2), 147–158.
Ghallab, M., Nau, D., & Traverso, P. (2016). Automated planning and acting. Cambridge: Cambridge University Press.
Helmert, M. (2006). The fast downward planning system. Journal of Artificial Intelligence Research, 26, 191–246.
Hoffmann, J. (2015). Simulated penetration testing: From “Dijkstra” to “Turing Test++”. In Proceedings of the 25th international conference on automated planning and scheduling, ICAPS (pp. 364–372).
Hoffmann, J., & Brafman, R. (2005). Contingent planning via heuristic forward search with implicit belief states. In Proc. ICAPS, Vol. 2005.
Hoffmann, J., & Nebel, B. (2001). The FF planning system: Fast plan generation through heuristic search. JAIR, 14, 253–302.
International planning competition 2014. https://helios.hud.ac.uk/scommv/IPC-14/domains_sequential.html.
Komarnitsky, R., & Shani, G. (2014). Computing contingent plans using online replanning. In Proceedings of the Twenty-Eighth AAAI conference on artificial intelligence, July 27–31, 2014, Québec City, Québec, Canada (pp. 2322–2329).
Komarnitsky, R., & Shani, G. (2016). Computing contingent plans using online replanning. In Proceedings of the thirtieth AAAI conference on artificial intelligence, February 12–17, 2016, Phoenix, Arizona, USA (pp. 3159–3165).
Kupcsik, A., Deisenroth, M. P., Peters, J., Loh, A. P., Vadakkepat, P., & Neumann, G. (2017). Model-based contextual policy search for data-efficient generalization of robot skills. Artificial Intelligence, 247, 415–439.
Likhachev, M., & Stentz, A. (2009). Probabilistic planning with clear preferences on missing information. Artificial Intelligence, 173(5–6), 696–721.
Louridas, P. (2006). Static code analysis. IEEE Software, 23(4), 58–61.
Mahadevan, S. (1996). Average reward reinforcement learning: Foundations, algorithms, and empirical results. Machine learning, 22(1–3), 159–195.
Mahler, J., & Goldberg, K. (2017). Learning deep policies for robot bin picking by simulating robust grasping sequences. In Conference on robot learning (pp. 515–524).
Maliah, S., Brafman, R. I., Karpas, E., & Shani, G. (2014). Partially observable online contingent planning using landmark heuristics. In Proceedings of the twenty-fourth international conference on automated planning and scheduling, ICAPS 2014, Portsmouth, New Hampshire, USA, June 21–26, 2014.
Mastrogiovanni, F., Sgorbissa, A., & Zaccaria, R. (2009). Robust navigation in an unknown environment with minimal sensing and representation. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 39(1), 212–229.
Meuleau, N., Peshkin, L., Kim, K.-E., & Kaelbling, L. P. (1999). Learning finite-state controllers for partially observable environments. In Proceedings of the Fifteenth conference on Uncertainty in artificial intelligence (pp. 427–436). Morgan Kaufmann Publishers Inc.
Michalski, R. S., Carbonell, J. G., & Mitchell, T. M. (2013). Machine learning: An artificial intelligence approach. Berlin: Springer Science & Business Media.
Mirsky, R., Gal, Y. K., Stern, R., & Kalech, M. (2016). Sequential plan recognition. In Proceedings of the 2016 international conference on autonomous agents & multiagent systems (pp. 1347–1348). International Foundation for Autonomous Agents and Multiagent Systems.
Muise, C. J., Belle, V., & McIlraith, S. A. (2014). Computing contingent plans via fully observable non-deterministic planning. In Proceedings of the twenty-eighth AAAI conference on artificial intelligence, July 27–31, 2014, Québec City, Québec, Canada (pp. 2322–2329).
Muise, C. J., Belle, V., & McIlraith, S. A. (2014). Computing contingent plans via fully observable non-deterministic planning. In Proceedings of the twenty-eighth AAAI conference on artificial intelligence.
Muise, C. J., McIlraith, S. A., & Christopher Beck, J. (2012). Improved non-deterministic planning by exploiting state relevance. In Proceedings of the twenty-second international conference on automated planning and scheduling, ICAPS.
O’Kane, J. M., & LaValle, S. M. (2008). Comparing the power of robots. The International Journal of Robotics Research, 27(1), 5–23.
Palacios, H., & Geffner, H. (2007). From conformant into classical planning: Efficient translations that may be complete too. In ICAPS (pp. 264–271).
Poupart, P., & Boutilier, C. (2004). Bounded finite state controllers. In Advances in neural information processing systems (pp. 823–830).
Poupart, P., Boutilier, C., Schuurmans, D., & Patrascu, R. (2002). Piecewise linear value function approximation for factored mdps. In Proceedings of the eighteenth national conference on artificial intelligence (AAAI02), Edmonton.
Raghavan, S., Rohana, R., Leon, D., Podgurski, A., & Augustine, V. (2004). Dex: A semantic-graph differencing tool for studying changes in large code bases. In Proceedings 20th IEEE international conference on software maintenance, 2004 (pp. 188–197). IEEE.
Shani, G., & Brafman, R. I. (2011). Replanning in domains with partial information and sensing actions. In IJCAI (pp. 2021–2026).
Shani, G., & Meek, C. (2009). Improving existing fault recovery policies. In Advances in neural information processing systems (NIPS) (pp. 1642–1650).
Shani, G., Heckerman, D., & Brafman, R. I. (2005). An MDP-based recommender system. Journal of Machine Learning Research, 6(Sep), 1265–1295.
Shani, G., Pineau, J., & Kaplow, R. (2013). A survey of point-based POMDP solvers. Autonomous Agents and Multi-Agent Systems, 27(1), 1–51.
Shmaryahu, D., Hoffmann, J., Shani, G., & Steinmetz, M. (2016). Constructing plan trees for simulated penetration testing. In Proceedings of the scheduling and planning applications woRKshop (SPARK), ICAPS 2016.
Shmaryahu, D., Shani, G., Hoffmann, J., & Steinmetz, M. (2016). Constructing plan trees for simulated penetration testing.
Siepmann, F., Ziegler, L., Kortkamp, M., & Wachsmuth, S. (2014). Deploying a modeling framework for reusable robot behavior to enable informed strategies for domestic service robots. Robotics and Autonomous Systems, 62(5), 619–631.
Smith, T., & Simmons, R. (2004). Heuristic search value iteration for pomdps. In Proceedings of the 20th conference on Uncertainty in artificial intelligence (pp. 520–527). AUAI Press.
Son, J.-W., Park, S.-B., & Park, S.-Y. (2006). Program plagiarism detection using parse tree kernels. In Pacific Rim international conference on artificial intelligence (pp. 1000–1004). Springer.
Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction. Cambridge: MIT Press.
Thrun, S., Burgard, W., & Fox, D. (2005). Probabilistic robotics. MIT press.
Vidal, V., & Geffner, H. (2006). Branching and pruning: An optimal temporal pocl planner based on constraint programming. Artificial Intelligence, 170(3), 298–335.
Yang, J., Zhihua, Q., Wang, J., & Conrad, K. (2010). Comparison of optimal solutions to real-time path planning for a mobile vehicle. IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans, 40(4), 721–731.
Yoon, S. W., Fern, A., & Givan, R. (2007). FF-Replan: A baseline for probabilistic planning. In ICAPS.

Acknowledgements

This work was supported by ISF Grant 933/13, and by the Israeli Cyber Center.

Author information

Authors and Affiliations

Ben Gurion University of the Negev, Beersheba, Israel
Dorin Shmaryahu & Guy Shani
Saarland University, Saarbrücken, Germany
Jörg Hoffmann

Corresponding author

Correspondence to Dorin Shmaryahu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Parts of this paper appeared in [48].

About this article

Cite this article

Shmaryahu, D., Shani, G. & Hoffmann, J. Comparative criteria for partially observable contingent planning. Auton Agent Multi-Agent Syst 33, 481–517 (2019). https://doi.org/10.1007/s10458-019-09406-0

Published: 09 May 2019
Issue Date: 01 September 2019
DOI: https://doi.org/10.1007/s10458-019-09406-0
