OnPlan: A Framework for Simulation-Based Online Planning

Belzner, Lenz; Hennicker, Rolf; Wirsing, Martin

doi:10.1007/978-3-319-28934-2_1

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 9539)

Included in the following conference series:

Formal Aspects of Component Software

Abstract

This paper proposes the OnPlan framework for modeling autonomous systems operating in domains with large probabilistic state spaces and high branching factors. The framework defines components for acting and deliberation, and specifies their interactions. It comprises a mathematical specification of requirements for autonomous systems. We discuss the role of such a specification in the context of simulation-based online planning. We also consider two instantiations of the framework: Monte Carlo Tree Search for discrete domains, and Cross Entropy Open Loop Planning for continuous state and action spaces. The framework’s ability to provide system autonomy is illustrated empirically on a robotic rescue example.
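The interplay of acting and deliberation in simulation-based online planning can be illustrated with a minimal sketch. It is not the authors' implementation: the one-dimensional simulator, the reward, and all names and parameter values (`simulate`, `ce_plan`, `run_online`, horizon, sample counts) are assumptions chosen for illustration. The planner is a simple cross-entropy search over open-loop action sequences, in the spirit of the continuous instantiation mentioned above.

```python
import random
import statistics


def simulate(state, actions, rng):
    """Hypothetical noisy 1-D simulator: each action shifts the state.

    The reward of a step is the negative distance to the goal at 0.
    """
    total = 0.0
    for a in actions:
        state = state + a + rng.gauss(0.0, 0.05)
        total += -abs(state)
    return total


def ce_plan(state, horizon=5, samples=60, elites=10, iterations=8, seed=0):
    """Deliberation: cross-entropy search over open-loop action sequences."""
    rng = random.Random(seed)
    means = [0.0] * horizon
    stds = [1.0] * horizon
    for _ in range(iterations):
        # Sample candidate plans from one Gaussian per time step.
        plans = [[rng.gauss(m, s) for m, s in zip(means, stds)]
                 for _ in range(samples)]
        # Score each plan by a single simulation and keep the best ones.
        ranked = sorted(plans, key=lambda p: simulate(state, p, rng),
                        reverse=True)
        elite = ranked[:elites]
        # Refit the sampling distribution to the elite plans.
        for t in range(horizon):
            column = [p[t] for p in elite]
            means[t] = statistics.fmean(column)
            stds[t] = statistics.pstdev(column) + 1e-3
    return means


def run_online(state=3.0, steps=6, seed=0):
    """Acting: execute only the first planned action, then replan."""
    rng = random.Random(seed)
    for step in range(steps):
        plan = ce_plan(state, seed=seed + step)
        state = state + plan[0] + rng.gauss(0.0, 0.05)
    return state


if __name__ == "__main__":
    print("final state:", run_online())
```

Replanning after every executed action lets the agent react to outcomes that differ from what the simulation predicted, which is the motivation for planning online rather than computing a full policy in advance.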

References

Kolobov, A., Dai, P., Mausam, M., Weld, D.S.: Reverse iterative deepening for finite-horizon MDPs with large branching factors. In: Proceedings of the 22nd International Conference on Automated Planning and Scheduling, ICAPS (2012)
Keller, T., Helmert, M.: Trial-based Heuristic Tree Search for Finite Horizon MDPs. In: Proceedings of the 23rd International Conference on Automated Planning and Scheduling (ICAPS 2013), pp. 135–143. AAAI Press, June 2013
Weinstein, A.: Local Planning for Continuous Markov Decision Processes. Ph.D. thesis, Rutgers, The State University of New Jersey (2014)
Kephart, J.: An architectural blueprint for autonomic computing. IBM (2003)
Hastings, W.K.: Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57(1), 97–109 (1970)
Rubinstein, R.Y., Kroese, D.P.: Simulation and the Monte Carlo Method, vol. 707. Wiley, New York (2011)
Rubinstein, R.Y., Kroese, D.P.: The Cross-Entropy Method: A Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation and Machine Learning. Springer Science & Business Media, New York (2013)
Audibert, J.Y., Munos, R., Szepesvári, C.: Exploration-exploitation tradeoff using variance estimates in multi-armed bandits. Theor. Comput. Sci. 410(19), 1876–1902 (2009)
Browne, C.B., Powley, E., Whitehouse, D., Lucas, S.M., Cowling, P.I., Rohlfshagen, P., Tavener, S., Perez, D., Samothrakis, S., Colton, S.: A survey of Monte Carlo tree search methods. IEEE Trans. Comput. Intell. AI Games 4(1), 1–43 (2012)
Kocsis, L., Szepesvári, C.: Bandit based monte-carlo planning. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) ECML 2006. LNCS (LNAI), vol. 4212, pp. 282–293. Springer, Heidelberg (2006)
Weinstein, A., Littman, M.L.: Open-loop planning in large-scale stochastic domains. In: Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence (2013)
Gelly, S., Kocsis, L., Schoenauer, M., Sebag, M., Silver, D., Szepesvári, C., Teytaud, O.: The grand challenge of computer go: Monte carlo tree search and extensions. Commun. ACM 55(3), 106–113 (2012)
Silver, D., Sutton, R.S., Müller, M.: Temporal-difference search in computer go. In: Borrajo, D., Kambhampati, S., Oddi, A., Fratini, S. (eds.) Proceedings of the Twenty-Third International Conference on Automated Planning and Scheduling, ICAPS 2013, Rome, Italy, June 10–14, 2013. AAAI (2013)
Gelly, S., Silver, D.: Monte-carlo tree search and rapid action value estimation in computer go. Artif. Intell. 175(11), 1856–1875 (2011)
Bubeck, S., Cesa-Bianchi, N.: Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Found. Trends Mach. Learn. 5(1), 1–122 (2012)
Auer, P., Cesa-Bianchi, N., Fischer, P.: Finite-time analysis of the multiarmed bandit problem. Mach. Learn. 47(2–3), 235–256 (2002)
Sebastio, S., Vandin, A.: Multivesta: Statistical model checking for discrete event simulators. In: Proceedings of the 7th International Conference on Performance Evaluation Methodologies and Tools, ICST (Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering), pp. 310–315 (2013)
de Boer, P., Kroese, D.P., Mannor, S., Rubinstein, R.Y.: A tutorial on the cross-entropy method. Ann. Oper. Res. 134(1), 19–67 (2005)
Margolin, L.: On the convergence of the cross-entropy method. Ann. Oper. Res. 134(1), 201–214 (2005)
Kobilarov, M.: Cross-entropy motion planning. Int. J. Robot. Res. 31(7), 855–871 (2012)
Livingston, S.C., Wolff, E.M., Murray, R.M.: Cross-entropy temporal logic motion planning. In: Proceedings of the 18th International Conference on Hybrid Systems: Computation and Control, HSCC 2015, pp. 269–278 (2015)
Box, G.E., Muller, M.E.: A note on the generation of random normal deviates. Ann. Math. Stat. 29, 610–611 (1958)
Hester, T., Stone, P.: Texplore: real-time sample-efficient reinforcement learning for robots. Mach. Learn. 90(3), 385–429 (2013)
Bonet, B., Geffner, H.: Labeled RTDP: Improving the convergence of real-time dynamic programming. In: ICAPS, vol. 3, pp. 12–21 (2003)
Karnin, Z., Koren, T., Somekh, O.: Almost optimal exploration in multi-armed bandits. In: Proceedings of the 30th International Conference on Machine Learning (ICML-13), pp. 1238–1246 (2013)
Cazenave, T., Pepels, T., Winands, M.H.M., Lanctot, M.: Minimizing simple and cumulative regret in monte-carlo tree search. In: Cazenave, T., Winands, M.H.M., Björnsson, Y. (eds.) CGW 2014. CCIS, vol. 504, pp. 1–15. Springer, Heidelberg (2014)
Mansley, C.R., Weinstein, A., Littman, M.L.: Sample-based planning for continuous action markov decision processes. In: Proceedings of the 21st International Conference on Automated Planning and Scheduling, ICAPS (2011)
Weinstein, A., Littman, M.L.: Bandit-based planning and learning in continuous-action markov decision processes. In: Proceedings of the 22nd International Conference on Automated Planning and Scheduling, ICAPS (2012)
Baier, C., Katoen, J.P., et al.: Principles of Model Checking. MIT Press, Cambridge (2008)
Wirsing, M., Hölzl, M., Koch, N., Mayer, P. (eds.): Software Engineering for Collective Autonomic Systems: Results of the ASCENS Project. LNCS, vol. 8998. Springer, Heidelberg (2015)
Hölzl, M.M., Gabor, T.: Continuous collaboration: A case study on the development of an adaptive cyber-physical system. In: 1st IEEE/ACM International Workshop on Software Engineering for Smart Cyber-Physical Systems, SEsCPS 2015, pp. 19–25 (2015)

Acknowledgements

The authors thank Andrea Vandin for his help with the MultiVeStA statistical model checker [17].

Author information

Authors and Affiliations

Institut für Informatik, Ludwig-Maximilians-Universität München, Munich, Germany
Lenz Belzner, Rolf Hennicker & Martin Wirsing

Corresponding author

Correspondence to Lenz Belzner.

Editor information

Editors and Affiliations

Universidade Federal Fluminense, Niterói, Brazil
Christiano Braga
University of Oslo, Oslo, Norway
Peter Csaba Ölveczky

Cite this paper

Belzner, L., Hennicker, R., Wirsing, M. (2016). OnPlan: A Framework for Simulation-Based Online Planning. In: Braga, C., Ölveczky, P. (eds) Formal Aspects of Component Software. FACS 2015. Lecture Notes in Computer Science, vol 9539. Springer, Cham. https://doi.org/10.1007/978-3-319-28934-2_1

DOI: https://doi.org/10.1007/978-3-319-28934-2_1
Published: 29 January 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-28933-5
Online ISBN: 978-3-319-28934-2
eBook Packages: Computer Science, Computer Science (R0)
