
AdaPT: Zero-Shot Adaptive Policy Transfer for Stochastic Dynamical Systems

  • Conference paper
  • In: Robotics Research

Abstract

Model-free policy learning has enabled good performance on complex tasks that were previously intractable with traditional control techniques. However, the very high sample complexity of model-free methods makes training directly on a physical target system infeasible, so policies must instead be trained on an approximate model of the system, and model mismatch due to dynamics parameter differences and unmodeled dynamics error may cause suboptimal or unsafe behavior upon direct transfer. We introduce the Adaptive Policy Transfer for Stochastic Dynamics (AdaPT) algorithm, which achieves provably safe and robust, dynamically-feasible zero-shot transfer of RL policies to new domains with dynamics error. AdaPT combines the strengths of offline policy learning in a black-box source simulator with online tube-based MPC to attenuate bounded dynamics mismatch between the source and target dynamics. AdaPT allows online transfer of policies, trained solely in simulation offline, to a family of unknown targets without fine-tuning. We also formally show that (i) AdaPT guarantees bounded state and control deviation through state-action tubes under relatively weak technical assumptions, and (ii) AdaPT results in a bounded loss of reward accumulation relative to a policy trained and evaluated in the source environment. We evaluate AdaPT on two continuous, non-holonomic simulated dynamical systems with four different disturbance models, and find that AdaPT performs between \(50\%\) and \(300\%\) better on mean reward accrual than direct policy transfer.
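
As a rough illustration of the procedure the abstract describes, the Python sketch below separates the two phases of AdaPT: a policy trained offline in a black-box source simulator, and an online correction that keeps the target system close to the nominal trajectory despite dynamics mismatch. It is a minimal sketch under placeholder assumptions: the double-integrator source model, the hand-written stand-in for the learned policy, and the fixed feedback gain used in place of the paper's tube-based MPC optimization are all illustrative and do not come from the paper.

import numpy as np

DT = 0.05

def source_dynamics(x, u):
    # Nominal (simulator) double-integrator: state x = [position, velocity].
    return np.array([x[0] + DT * x[1], x[1] + DT * u])

def target_dynamics(x, u, rng, drag=0.3, noise=0.01):
    # Perturbed target: unmodeled drag plus a bounded disturbance.
    w = rng.uniform(-noise, noise, size=2)
    return np.array([x[0] + DT * x[1], x[1] + DT * (u - drag * x[1])]) + w

def offline_policy(x, goal=1.0):
    # Stand-in for the policy learned offline in the source simulator.
    return 2.0 * (goal - x[0]) - 1.5 * x[1]

def adapt_step(x_nom, x_real, k_fb=np.array([4.0, 2.5])):
    # One step of the transfer loop: the policy acts on the nominal state,
    # and a feedback correction (a crude stand-in for the tube-based MPC
    # subproblem) steers the real state back toward the nominal trajectory.
    u_nom = offline_policy(x_nom)
    u = u_nom + float(k_fb @ (x_nom - x_real))
    x_nom_next = source_dynamics(x_nom, u_nom)  # nominal rollout, no correction
    return u, x_nom_next

rng = np.random.default_rng(0)
x_nom = np.zeros(2)
x_real = np.zeros(2)
for _ in range(100):
    u, x_nom = adapt_step(x_nom, x_real)
    x_real = target_dynamics(x_real, u, rng)
print("target state:", x_real, " nominal state:", x_nom)

The structural point the sketch preserves is that the nominal trajectory is rolled out under the source dynamics with the unmodified policy action, while the target system receives that action plus a correction computed from the deviation; this is how bounded mismatch can be attenuated online without retraining or fine-tuning the policy.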



Acknowledgements

This work was supported by the Office of Naval Research (Grant N00014-15-1-2673) and by the Toyota Research Institute (“TRI”). This article solely reflects the opinions and conclusions of its authors and not ONR, TRI, or any other Toyota entity. James Harrison was supported in part by the Stanford Graduate Fellowship and the Natural Sciences and Engineering Research Council of Canada (NSERC).

Author information

Corresponding author

Correspondence to James Harrison.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Harrison, J. et al. (2020). AdaPT: Zero-Shot Adaptive Policy Transfer for Stochastic Dynamical Systems. In: Amato, N., Hager, G., Thomas, S., Torres-Torriti, M. (eds) Robotics Research. Springer Proceedings in Advanced Robotics, vol 10. Springer, Cham. https://doi.org/10.1007/978-3-030-28619-4_34
