Polytopic Trees for Verification of Learning-Based Controllers

Sadraddini, Sadra; Shen, Shen; Bastani, Osbert

doi:10.1007/978-3-030-28423-7_8

Sadra Sadraddini¹⁰,
Shen Shen¹⁰ &
Osbert Bastani¹¹

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 11652))

Included in the following conference series:

International Workshop on Numerical Software Verification

457 Accesses
1 Citations

Abstract

Reinforcement learning is increasingly used to synthesize controllers for a broad range of applications. However, formal guarantees on the behavior of learning-based controllers are elusive due to the blackbox nature of machine learning models such as neural networks. In this paper, we propose an algorithm for verifying learning-based controllers—in particular, deep neural networks with ReLU activations, and decision trees with linear decisions and leaf values—for deterministic, piecewise affine (PWA) dynamical systems. In this setting, our algorithm computes the safe (resp., unsafe) region of the state space—i.e., the region of the state space on which the learned controller is guaranteed to satisfy (resp., fail to satisfy) a given reach-avoid specification. Knowing the safe and unsafe regions is substantially more informative than the boolean characterization of safety (i.e., safe or unsafe) provided by standard verification algorithms—for example, this knowledge can be used to compose controllers that are safe on different portions of the state space. At a high level, our algorithm uses convex programming to iteratively compute new regions (in the form of polytopes) that are guaranteed to be entirely safe or entirely unsafe. Then, it connects these polytopic regions together in a tree-like fashion. We conclude with an illustrative example on controlling a hybrid model of a contact-based robotics problem.

This research was partially supported by ONR award N00014-17-1-2699.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

Abbeel, P., Coates, A., Quigley, M., Ng, A.Y.: An application of reinforcement learning to aerobatic helicopter flight. In: Advances in Neural Information Processing Systems, pp. 1–8 (2007)
Google Scholar
Althoff, M., Stursberg, O., Buss, M.: Computing reachable sets of hybrid systems using a combination of zonotopes and polytopes. Nonlinear Anal.: Hybrid Syst. 4(2), 233–249 (2010)
MathSciNet MATH Google Scholar
Alvarez-Melis, D., Jaakkola, T.S.: Towards robust interpretability with self-explaining neural networks. arXiv preprint arXiv:1806.07538 (2018)
Andrychowicz, M., et al.: Learning dexterous in-hand manipulation. arXiv preprint arXiv:1808.00177 (2018)
Bastani, O., Ioannou, Y., Lampropoulos, L., Vytiniotis, D., Nori, A., Criminisi, A.: Measuring neural net robustness with constraints. In: Advances in Neural Information Processing Systems, pp. 2613–2621 (2016)
Google Scholar
Bastani, O., Pu, Y., Solar-Lezama, A.: Verifiable reinforcement learning via policy extraction. arXiv preprint arXiv:1805.08328 (2018)
Breiman, L.: Classification and Regression Trees. Routledge, Abingdon (2017)
Book Google Scholar
Chen, X., Abraham, E., Sankaranarayanan, S.: Taylor model flowpipe construction for non-linear hybrid systems. In: 2012 IEEE 33rd Real-Time Systems Symposium (RTSS), pp. 183–192. IEEE (2012)
Google Scholar
Collins, S., Ruina, A., Tedrake, R., Wisse, M.: Efficient bipedal robots based on passive-dynamic walkers. Science 307(5712), 1082–1085 (2005)
Article Google Scholar
Gao, S., Kong, S., Clarke, E.M.: dReal: an SMT solver for nonlinear theories over the reals. In: Bonacina, M.P. (ed.) CADE 2013. LNCS, vol. 7898, pp. 208–214. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-38574-2_14
Chapter Google Scholar
Gehr, T., Mirman, M., Drachsler-Cohen, D., Tsankov, P., Chaudhuri, S., Vechev, M.: AI2: safety and robustness certification of neural networks with abstract interpretation
Google Scholar
Girard, A.: Reachability of uncertain linear systems using zonotopes. In: Morari, M., Thiele, L. (eds.) HSCC 2005. LNCS, vol. 3414, pp. 291–305. Springer, Heidelberg (2005). https://doi.org/10.1007/978-3-540-31954-2_19
Chapter MATH Google Scholar
Gurobi Optimization, Inc.: Gurobi optimizer reference manual (2016). http://www.gurobi.com
Huang, X., Kwiatkowska, M., Wang, S., Wu, M.: Safety verification of deep neural networks. In: Majumdar, R., Kunčak, V. (eds.) CAV 2017. LNCS, vol. 10426, pp. 3–29. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-63387-9_1
Chapter Google Scholar
Ivanov, R., Weimer, J., Alur, R., Pappas, G.J., Lee, I.: Verisig: verifying safety properties of hybrid systems with neural network controllers. arXiv preprint arXiv:1811.01828 (2018)
Katz, G., Barrett, C., Dill, D.L., Julian, K., Kochenderfer, M.J.: Reluplex: an efficient SMT solver for verifying deep neural networks. In: Majumdar, R., Kunčak, V. (eds.) CAV 2017. LNCS, vol. 10426, pp. 97–117. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-63387-9_5
Chapter Google Scholar
Kong, S., Gao, S., Chen, W., Clarke, E.: dReach: \(\delta \)-reachability analysis for hybrid systems. In: Baier, C., Tinelli, C. (eds.) TACAS 2015. LNCS, vol. 9035, pp. 200–205. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-46681-0_15
Chapter Google Scholar
Kopetzki, A.K., Schürmann, B., Althoff, M.: Efficient methods for order reduction of zonotopes. In: Proceedings of the 56th IEEE Conference on Decision and Control (2017)
Google Scholar
Lei, T., Barzilay, R., Jaakkola, T.: Rationalizing neural predictions. arXiv preprint arXiv:1606.04155 (2016)
Levine, S., Finn, C., Darrell, T., Abbeel, P.: End-to-end training of deep visuomotor policies. J. Mach. Learn. Res. 17(1), 1334–1373 (2016)
MathSciNet MATH Google Scholar
Levine, S., Koltun, V.: Guided policy search. In: International Conference on Machine Learning, pp. 1–9 (2013)
Google Scholar
Li, Y., Liang, Y.: Learning overparameterized neural networks via stochastic gradient descent on structured data. In: Advances in Neural Information Processing Systems, pp. 8168–8177 (2018)
Google Scholar
Marcucci, T., Deits, R., Gabiccini, M., Biechi, A., Tedrake, R.: Approximate hybrid model predictive control for multi-contact push recovery in complex environments. In: 2017 IEEE-RAS 17th International Conference on Humanoid Robotics (Humanoids), pp. 31–38. IEEE (2017)
Google Scholar
Pan, Y., et al.: Learning deep neural network control policies for agile off-road autonomous driving. In: The NIPS Deep Reinforcement Learning Symposium (2017)
Google Scholar
Raghunathan, A., Steinhardt, J., Liang, P.S.: Semidefinite relaxations for certifying robustness to adversarial examples. In: Advances in Neural Information Processing Systems, pp. 10900–10910 (2018)
Google Scholar
Ross, S., Gordon, G., Bagnell, D.: A reduction of imitation learning and structured prediction to no-regret online learning. In: Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pp. 627–635 (2011)
Google Scholar
Sadraddini, S., Tedrake, R.: Linear encodings for polytope containment problems. arXiv preprint arXiv:1903.05214 (2019)
Sadraddini, S., Tedrake, R.: Sampling-based polytopic trees for approximate optimal control of piecewise affine systems. In: International Conference on Robotics and Automation (ICRA) (2019)
Google Scholar
Su, J., Vargas, D.V., Sakurai, K.: One pixel attack for fooling deep neural networks. IEEE Trans. Evol. Comput. (2019)
Google Scholar
Sun, X., Khedr, H., Shoukry, Y.: Formal verification of neural network controlled autonomous systems. arXiv preprint arXiv:1810.13072 (2018)
Szegedy, C., et al.: Intriguing properties of neural networks. In: ICLR (2014)
Google Scholar
Tedrake, R., Manchester, I.R., Tobenkin, M., Roberts, J.W.: LQR-trees: feedback motion planning via sums-of-squares verification. Int. J. Robot. Res. 29(8), 1038–1052 (2010)
Article Google Scholar
Tjeng, V., Tedrake, R.: Verifying neural networks with mixed integer programming. arXiv preprint arXiv:1711.07356 (2017)
Tjeng, V., Xiao, K.Y., Tedrake, R.: Evaluating robustness of neural networks with mixed integer programming (2018)
Google Scholar
Tøndel, P., Johansen, T.A., Bemporad, A.: Evaluation of piecewise affine control via binary search tree. Automatica 39(5), 945–950 (2003)
Article MathSciNet Google Scholar
Wong, E., Kolter, Z.: Provable defenses against adversarial examples via the convex outer adversarial polytope. In: International Conference on Machine Learning, pp. 5283–5292 (2018)
Google Scholar
Xiang, W., Lopez, D.M., Musau, P., Johnson, T.T.: Reachable set estimation and verification for neural network models of nonlinear dynamic systems. In: Yu, H., Li, X., Murray, R.M., Ramesh, S., Tomlin, C.J. (eds.) Safe, Autonomous and Intelligent Vehicles. UST, pp. 123–144. Springer, Cham (2019). https://doi.org/10.1007/978-3-319-97301-2_7
Chapter Google Scholar
Xiang, W., Tran, H.D., Johnson, T.T.: Specification-guided safety verification for feedforward neural networks. arXiv preprint arXiv:1812.06161 (2018)

Download references

Author information

Authors and Affiliations

Massachusetts Institute of Technology, Cambridge, USA
Sadra Sadraddini & Shen Shen
University of Pennsylvania, Philadelphia, USA
Osbert Bastani

Authors

Sadra Sadraddini
View author publications
You can also search for this author in PubMed Google Scholar
Shen Shen
View author publications
You can also search for this author in PubMed Google Scholar
Osbert Bastani
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Sadra Sadraddini .

Editor information

Editors and Affiliations

University of Colorado Boulder, Boulder, CO, USA
Majid Zamani
Max Planck Institute for Software Systems, Kaiserslautern, Germany
Damien Zufferey

A Appendix

1.1 A.1 Proof of Theorem 2

Proof

Let \(X^i_{r}=X^{\max }_{s} \setminus X^i_{s}\). We need to show that with probability one. The same argument holds for \(X^{\max }_{f} \setminus X^i_{f}\). We prove by contradiction. Let \(\lim _{i \rightarrow \infty } X_r^i\) be a measurable set. It follows from Assumption 1 that we obtain a sample from \(X_r^{\infty }\) with non-zero probability. Then we simulate the system forward from the sampled state \(x_0\). Once the trajectory is obtained, the polytopic trajectory is computed, and by Assumption 2, we obtain a measurable \(\mathbb {P}_0\), centered at that has non-empty intersection with \(X_r^{\infty }\). Therefore, by non-zero probability, the volume of \(X_r^{i}\) shrinks in one iteration by at least \(\lambda \). Thus we reach a contradiction.

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Sadraddini, S., Shen, S., Bastani, O. (2019). Polytopic Trees for Verification of Learning-Based Controllers. In: Zamani, M., Zufferey, D. (eds) Numerical Software Verification. NSV 2019. Lecture Notes in Computer Science(), vol 11652. Springer, Cham. https://doi.org/10.1007/978-3-030-28423-7_8

Download citation

DOI: https://doi.org/10.1007/978-3-030-28423-7_8
Published: 03 August 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-28422-0
Online ISBN: 978-3-030-28423-7
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Polytopic Trees for Verification of Learning-Based Controllers

Abstract

Access this chapter

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

A Appendix

A Appendix

1.1 A.1 Proof of Theorem 2

Proof

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation