
\(\mathsf {SafePILCO}\): A Software Tool for Safe and Data-Efficient Policy Synthesis

  • Conference paper
  • Published in: Quantitative Evaluation of Systems (QEST 2020)
  • Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12289)

Abstract

\(\mathsf {SafePILCO}\) is a software tool for safe and data-efficient policy search with reinforcement learning. It extends the well-known \(\mathsf {PILCO}\) algorithm, originally written in MATLAB, to support safe learning. \(\mathsf {SafePILCO}\) is a Python implementation and leverages existing libraries that keep the codebase short and modular, with a view to wider use by the verification, reinforcement learning, and control communities.
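For readers unfamiliar with the approach, the model-based policy-search loop that \(\mathsf {PILCO}\)-style tools implement alternates between interacting with the system, fitting a dynamics model to the collected data, and improving the policy against that model. The sketch below is a deliberately minimal stand-in, not the \(\mathsf {SafePILCO}\) API: a linear least-squares model replaces the Gaussian-process dynamics model, a grid search replaces gradient-based policy optimisation, and all names (`rollout`, `fit_model`, `predicted_cost`) are illustrative.

```python
import random

# True (unknown to the learner) 1-D dynamics: x' = a*x + b*u + noise.
A_TRUE, B_TRUE = 0.9, 0.5

def rollout(theta, horizon, rng):
    """Run the linear policy u = theta*x (plus exploration noise) on the
    real system and record the observed transitions."""
    x, data = 1.0, []
    for _ in range(horizon):
        u = theta * x + rng.gauss(0.0, 0.1)        # exploratory control
        x_next = A_TRUE * x + B_TRUE * u + rng.gauss(0.0, 0.01)
        data.append((x, u, x_next))
        x = x_next
    return data

def fit_model(data):
    """Least-squares fit of x' ~ a*x + b*u; a toy stand-in for the
    Gaussian-process dynamics model that PILCO learns from data."""
    sxx = sum(x * x for x, u, y in data)
    sxu = sum(x * u for x, u, y in data)
    suu = sum(u * u for x, u, y in data)
    sxy = sum(x * y for x, u, y in data)
    suy = sum(u * y for x, u, y in data)
    det = sxx * suu - sxu * sxu                    # normal equations, 2 params
    return ((sxy * suu - suy * sxu) / det,
            (suy * sxx - sxy * sxu) / det)

def predicted_cost(theta, a, b, horizon=20):
    """Cost of driving x towards 0, evaluated under the learned model
    (the model-based policy-evaluation step)."""
    x, cost = 1.0, 0.0
    for _ in range(horizon):
        x = (a + b * theta) * x
        cost += x * x
    return cost

rng = random.Random(0)
theta = 0.0                                        # uninformed initial policy
for episode in range(3):                           # interact -> model -> improve
    data = rollout(theta, horizon=30, rng=rng)
    a, b = fit_model(data)
    theta = min((t / 100.0 for t in range(-300, 301)),
                key=lambda t: predicted_cost(t, a, b))  # policy improvement
```

After a few episodes the fitted coefficients approach the true dynamics and the policy gain settles near the model-optimal value \(-a/b\); the real tool follows the same interact/model/improve pattern with GP models, analytic moment matching, and safety constraints.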


Notes

  1. An extended version of this paper is available at http://arxiv.org/abs/2008.03273.

  2. Main package repository: https://github.com/nrontsis/PILCO.

  3. Repository for reproducing the experiments and figures: https://github.com/kyr-pol/SafePILCO_Tool-Reproducibility.

  4. By way of comparison, all gradient calculations in the \(\mathsf {PILCO}\) MATLAB implementation are hand-coded; extensions are therefore laborious, as any additional user-defined controller or reward function must include these gradient calculations as well.

  5. Code for the BAS simulator: https://gitlab.com/natchi92/BASBenchmarks.
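The maintenance burden described in the note on hand-coded gradients above can be made concrete: without automatic differentiation, every user-defined reward function must ship a matching, manually derived gradient, which is then typically validated against finite differences. The stdlib-only sketch below (the saturating-style reward and all function names are illustrative, not part of either implementation) shows the extra code this entails:

```python
import math

def reward(x, target=0.0, width=1.0):
    """Saturating (negated squared-exponential) reward, in the style of
    PILCO-like setups."""
    return -math.exp(-0.5 * ((x - target) / width) ** 2)

def reward_grad(x, target=0.0, width=1.0):
    """Hand-coded derivative d reward / dx. Without autodiff, a function
    like this must be written and maintained for every new reward."""
    d = (x - target) / width
    return (d / width) * math.exp(-0.5 * d * d)

def finite_diff(f, x, eps=1e-6):
    """Central finite difference used to validate the hand-coded gradient."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

# The hand-coded and numerical gradients must agree before the reward
# can be used inside gradient-based policy optimisation.
print(abs(reward_grad(0.7) - finite_diff(reward, 0.7)) < 1e-6)  # prints True
```

An automatic-differentiation framework makes both `reward_grad` and its finite-difference check unnecessary, which is part of what keeps the Python codebase short and easy to extend.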


Author information

Correspondence to Kyriakos Polymenakos.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF, 242 KB)


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Polymenakos, K., Rontsis, N., Abate, A., Roberts, S. (2020). \(\mathsf {SafePILCO}\): A Software Tool for Safe and Data-Efficient Policy Synthesis. In: Gribaudo, M., Jansen, D.N., Remke, A. (eds.) Quantitative Evaluation of Systems. QEST 2020. Lecture Notes in Computer Science, vol. 12289. Springer, Cham. https://doi.org/10.1007/978-3-030-59854-9_3


  • DOI: https://doi.org/10.1007/978-3-030-59854-9_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-59853-2

  • Online ISBN: 978-3-030-59854-9

  • eBook Packages: Computer Science (R0)
