Abstract
Reinforcement learning is increasingly used to synthesize controllers for a broad range of applications. However, formal guarantees on the behavior of learning-based controllers are elusive due to the blackbox nature of machine learning models such as neural networks. In this paper, we propose an algorithm for verifying learning-based controllers—in particular, deep neural networks with ReLU activations, and decision trees with linear decisions and leaf values—for deterministic, piecewise affine (PWA) dynamical systems. In this setting, our algorithm computes the safe (resp., unsafe) region of the state space—i.e., the region of the state space on which the learned controller is guaranteed to satisfy (resp., fail to satisfy) a given reach-avoid specification. Knowing the safe and unsafe regions is substantially more informative than the boolean characterization of safety (i.e., safe or unsafe) provided by standard verification algorithms—for example, this knowledge can be used to compose controllers that are safe on different portions of the state space. At a high level, our algorithm uses convex programming to iteratively compute new regions (in the form of polytopes) that are guaranteed to be entirely safe or entirely unsafe. Then, it connects these polytopic regions together in a tree-like fashion. We conclude with an illustrative example on controlling a hybrid model of a contact-based robotics problem.
This research was partially supported by ONR award N00014-17-1-2699.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Abbeel, P., Coates, A., Quigley, M., Ng, A.Y.: An application of reinforcement learning to aerobatic helicopter flight. In: Advances in Neural Information Processing Systems, pp. 1–8 (2007)
Althoff, M., Stursberg, O., Buss, M.: Computing reachable sets of hybrid systems using a combination of zonotopes and polytopes. Nonlinear Anal.: Hybrid Syst. 4(2), 233–249 (2010)
Alvarez-Melis, D., Jaakkola, T.S.: Towards robust interpretability with self-explaining neural networks. arXiv preprint arXiv:1806.07538 (2018)
Andrychowicz, M., et al.: Learning dexterous in-hand manipulation. arXiv preprint arXiv:1808.00177 (2018)
Bastani, O., Ioannou, Y., Lampropoulos, L., Vytiniotis, D., Nori, A., Criminisi, A.: Measuring neural net robustness with constraints. In: Advances in Neural Information Processing Systems, pp. 2613–2621 (2016)
Bastani, O., Pu, Y., Solar-Lezama, A.: Verifiable reinforcement learning via policy extraction. arXiv preprint arXiv:1805.08328 (2018)
Breiman, L.: Classification and Regression Trees. Routledge, Abingdon (2017)
Chen, X., Abraham, E., Sankaranarayanan, S.: Taylor model flowpipe construction for non-linear hybrid systems. In: 2012 IEEE 33rd Real-Time Systems Symposium (RTSS), pp. 183–192. IEEE (2012)
Collins, S., Ruina, A., Tedrake, R., Wisse, M.: Efficient bipedal robots based on passive-dynamic walkers. Science 307(5712), 1082–1085 (2005)
Gao, S., Kong, S., Clarke, E.M.: dReal: an SMT solver for nonlinear theories over the reals. In: Bonacina, M.P. (ed.) CADE 2013. LNCS, vol. 7898, pp. 208–214. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-38574-2_14
Gehr, T., Mirman, M., Drachsler-Cohen, D., Tsankov, P., Chaudhuri, S., Vechev, M.: AI2: safety and robustness certification of neural networks with abstract interpretation
Girard, A.: Reachability of uncertain linear systems using zonotopes. In: Morari, M., Thiele, L. (eds.) HSCC 2005. LNCS, vol. 3414, pp. 291–305. Springer, Heidelberg (2005). https://doi.org/10.1007/978-3-540-31954-2_19
Gurobi Optimization, Inc.: Gurobi optimizer reference manual (2016). http://www.gurobi.com
Huang, X., Kwiatkowska, M., Wang, S., Wu, M.: Safety verification of deep neural networks. In: Majumdar, R., Kunčak, V. (eds.) CAV 2017. LNCS, vol. 10426, pp. 3–29. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-63387-9_1
Ivanov, R., Weimer, J., Alur, R., Pappas, G.J., Lee, I.: Verisig: verifying safety properties of hybrid systems with neural network controllers. arXiv preprint arXiv:1811.01828 (2018)
Katz, G., Barrett, C., Dill, D.L., Julian, K., Kochenderfer, M.J.: Reluplex: an efficient SMT solver for verifying deep neural networks. In: Majumdar, R., Kunčak, V. (eds.) CAV 2017. LNCS, vol. 10426, pp. 97–117. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-63387-9_5
Kong, S., Gao, S., Chen, W., Clarke, E.: dReach: \(\delta \)-reachability analysis for hybrid systems. In: Baier, C., Tinelli, C. (eds.) TACAS 2015. LNCS, vol. 9035, pp. 200–205. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-46681-0_15
Kopetzki, A.K., Schürmann, B., Althoff, M.: Efficient methods for order reduction of zonotopes. In: Proceedings of the 56th IEEE Conference on Decision and Control (2017)
Lei, T., Barzilay, R., Jaakkola, T.: Rationalizing neural predictions. arXiv preprint arXiv:1606.04155 (2016)
Levine, S., Finn, C., Darrell, T., Abbeel, P.: End-to-end training of deep visuomotor policies. J. Mach. Learn. Res. 17(1), 1334–1373 (2016)
Levine, S., Koltun, V.: Guided policy search. In: International Conference on Machine Learning, pp. 1–9 (2013)
Li, Y., Liang, Y.: Learning overparameterized neural networks via stochastic gradient descent on structured data. In: Advances in Neural Information Processing Systems, pp. 8168–8177 (2018)
Marcucci, T., Deits, R., Gabiccini, M., Biechi, A., Tedrake, R.: Approximate hybrid model predictive control for multi-contact push recovery in complex environments. In: 2017 IEEE-RAS 17th International Conference on Humanoid Robotics (Humanoids), pp. 31–38. IEEE (2017)
Pan, Y., et al.: Learning deep neural network control policies for agile off-road autonomous driving. In: The NIPS Deep Reinforcement Learning Symposium (2017)
Raghunathan, A., Steinhardt, J., Liang, P.S.: Semidefinite relaxations for certifying robustness to adversarial examples. In: Advances in Neural Information Processing Systems, pp. 10900–10910 (2018)
Ross, S., Gordon, G., Bagnell, D.: A reduction of imitation learning and structured prediction to no-regret online learning. In: Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pp. 627–635 (2011)
Sadraddini, S., Tedrake, R.: Linear encodings for polytope containment problems. arXiv preprint arXiv:1903.05214 (2019)
Sadraddini, S., Tedrake, R.: Sampling-based polytopic trees for approximate optimal control of piecewise affine systems. In: International Conference on Robotics and Automation (ICRA) (2019)
Su, J., Vargas, D.V., Sakurai, K.: One pixel attack for fooling deep neural networks. IEEE Trans. Evol. Comput. (2019)
Sun, X., Khedr, H., Shoukry, Y.: Formal verification of neural network controlled autonomous systems. arXiv preprint arXiv:1810.13072 (2018)
Szegedy, C., et al.: Intriguing properties of neural networks. In: ICLR (2014)
Tedrake, R., Manchester, I.R., Tobenkin, M., Roberts, J.W.: LQR-trees: feedback motion planning via sums-of-squares verification. Int. J. Robot. Res. 29(8), 1038–1052 (2010)
Tjeng, V., Tedrake, R.: Verifying neural networks with mixed integer programming. arXiv preprint arXiv:1711.07356 (2017)
Tjeng, V., Xiao, K.Y., Tedrake, R.: Evaluating robustness of neural networks with mixed integer programming (2018)
Tøndel, P., Johansen, T.A., Bemporad, A.: Evaluation of piecewise affine control via binary search tree. Automatica 39(5), 945–950 (2003)
Wong, E., Kolter, Z.: Provable defenses against adversarial examples via the convex outer adversarial polytope. In: International Conference on Machine Learning, pp. 5283–5292 (2018)
Xiang, W., Lopez, D.M., Musau, P., Johnson, T.T.: Reachable set estimation and verification for neural network models of nonlinear dynamic systems. In: Yu, H., Li, X., Murray, R.M., Ramesh, S., Tomlin, C.J. (eds.) Safe, Autonomous and Intelligent Vehicles. UST, pp. 123–144. Springer, Cham (2019). https://doi.org/10.1007/978-3-319-97301-2_7
Xiang, W., Tran, H.D., Johnson, T.T.: Specification-guided safety verification for feedforward neural networks. arXiv preprint arXiv:1812.06161 (2018)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
A Appendix
A Appendix
1.1 A.1 Proof of Theorem 2
Proof
Let \(X^i_{r}=X^{\max }_{s} \setminus X^i_{s}\). We need to show that with probability one. The same argument holds for \(X^{\max }_{f} \setminus X^i_{f}\). We prove by contradiction. Let \(\lim _{i \rightarrow \infty } X_r^i\) be a measurable set. It follows from Assumption 1 that we obtain a sample from \(X_r^{\infty }\) with non-zero probability. Then we simulate the system forward from the sampled state \(x_0\). Once the trajectory is obtained, the polytopic trajectory is computed, and by Assumption 2, we obtain a measurable \(\mathbb {P}_0\), centered at that has non-empty intersection with \(X_r^{\infty }\). Therefore, by non-zero probability, the volume of \(X_r^{i}\) shrinks in one iteration by at least \(\lambda \). Thus we reach a contradiction.
Rights and permissions
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Sadraddini, S., Shen, S., Bastani, O. (2019). Polytopic Trees for Verification of Learning-Based Controllers. In: Zamani, M., Zufferey, D. (eds) Numerical Software Verification. NSV 2019. Lecture Notes in Computer Science(), vol 11652. Springer, Cham. https://doi.org/10.1007/978-3-030-28423-7_8
Download citation
DOI: https://doi.org/10.1007/978-3-030-28423-7_8
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-28422-0
Online ISBN: 978-3-030-28423-7
eBook Packages: Computer ScienceComputer Science (R0)