Skip to main content

Polytopic Trees for Verification of Learning-Based Controllers

  • Conference paper
  • First Online:
Numerical Software Verification (NSV 2019)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 11652))

Included in the following conference series:

Abstract

Reinforcement learning is increasingly used to synthesize controllers for a broad range of applications. However, formal guarantees on the behavior of learning-based controllers are elusive due to the blackbox nature of machine learning models such as neural networks. In this paper, we propose an algorithm for verifying learning-based controllers—in particular, deep neural networks with ReLU activations, and decision trees with linear decisions and leaf values—for deterministic, piecewise affine (PWA) dynamical systems. In this setting, our algorithm computes the safe (resp., unsafe) region of the state space—i.e., the region of the state space on which the learned controller is guaranteed to satisfy (resp., fail to satisfy) a given reach-avoid specification. Knowing the safe and unsafe regions is substantially more informative than the boolean characterization of safety (i.e., safe or unsafe) provided by standard verification algorithms—for example, this knowledge can be used to compose controllers that are safe on different portions of the state space. At a high level, our algorithm uses convex programming to iteratively compute new regions (in the form of polytopes) that are guaranteed to be entirely safe or entirely unsafe. Then, it connects these polytopic regions together in a tree-like fashion. We conclude with an illustrative example on controlling a hybrid model of a contact-based robotics problem.

This research was partially supported by ONR award N00014-17-1-2699.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. Abbeel, P., Coates, A., Quigley, M., Ng, A.Y.: An application of reinforcement learning to aerobatic helicopter flight. In: Advances in Neural Information Processing Systems, pp. 1–8 (2007)

    Google Scholar 

  2. Althoff, M., Stursberg, O., Buss, M.: Computing reachable sets of hybrid systems using a combination of zonotopes and polytopes. Nonlinear Anal.: Hybrid Syst. 4(2), 233–249 (2010)

    MathSciNet  MATH  Google Scholar 

  3. Alvarez-Melis, D., Jaakkola, T.S.: Towards robust interpretability with self-explaining neural networks. arXiv preprint arXiv:1806.07538 (2018)

  4. Andrychowicz, M., et al.: Learning dexterous in-hand manipulation. arXiv preprint arXiv:1808.00177 (2018)

  5. Bastani, O., Ioannou, Y., Lampropoulos, L., Vytiniotis, D., Nori, A., Criminisi, A.: Measuring neural net robustness with constraints. In: Advances in Neural Information Processing Systems, pp. 2613–2621 (2016)

    Google Scholar 

  6. Bastani, O., Pu, Y., Solar-Lezama, A.: Verifiable reinforcement learning via policy extraction. arXiv preprint arXiv:1805.08328 (2018)

  7. Breiman, L.: Classification and Regression Trees. Routledge, Abingdon (2017)

    Book  Google Scholar 

  8. Chen, X., Abraham, E., Sankaranarayanan, S.: Taylor model flowpipe construction for non-linear hybrid systems. In: 2012 IEEE 33rd Real-Time Systems Symposium (RTSS), pp. 183–192. IEEE (2012)

    Google Scholar 

  9. Collins, S., Ruina, A., Tedrake, R., Wisse, M.: Efficient bipedal robots based on passive-dynamic walkers. Science 307(5712), 1082–1085 (2005)

    Article  Google Scholar 

  10. Gao, S., Kong, S., Clarke, E.M.: dReal: an SMT solver for nonlinear theories over the reals. In: Bonacina, M.P. (ed.) CADE 2013. LNCS, vol. 7898, pp. 208–214. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-38574-2_14

    Chapter  Google Scholar 

  11. Gehr, T., Mirman, M., Drachsler-Cohen, D., Tsankov, P., Chaudhuri, S., Vechev, M.: AI2: safety and robustness certification of neural networks with abstract interpretation

    Google Scholar 

  12. Girard, A.: Reachability of uncertain linear systems using zonotopes. In: Morari, M., Thiele, L. (eds.) HSCC 2005. LNCS, vol. 3414, pp. 291–305. Springer, Heidelberg (2005). https://doi.org/10.1007/978-3-540-31954-2_19

    Chapter  MATH  Google Scholar 

  13. Gurobi Optimization, Inc.: Gurobi optimizer reference manual (2016). http://www.gurobi.com

  14. Huang, X., Kwiatkowska, M., Wang, S., Wu, M.: Safety verification of deep neural networks. In: Majumdar, R., Kunčak, V. (eds.) CAV 2017. LNCS, vol. 10426, pp. 3–29. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-63387-9_1

    Chapter  Google Scholar 

  15. Ivanov, R., Weimer, J., Alur, R., Pappas, G.J., Lee, I.: Verisig: verifying safety properties of hybrid systems with neural network controllers. arXiv preprint arXiv:1811.01828 (2018)

  16. Katz, G., Barrett, C., Dill, D.L., Julian, K., Kochenderfer, M.J.: Reluplex: an efficient SMT solver for verifying deep neural networks. In: Majumdar, R., Kunčak, V. (eds.) CAV 2017. LNCS, vol. 10426, pp. 97–117. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-63387-9_5

    Chapter  Google Scholar 

  17. Kong, S., Gao, S., Chen, W., Clarke, E.: dReach: \(\delta \)-reachability analysis for hybrid systems. In: Baier, C., Tinelli, C. (eds.) TACAS 2015. LNCS, vol. 9035, pp. 200–205. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-46681-0_15

    Chapter  Google Scholar 

  18. Kopetzki, A.K., Schürmann, B., Althoff, M.: Efficient methods for order reduction of zonotopes. In: Proceedings of the 56th IEEE Conference on Decision and Control (2017)

    Google Scholar 

  19. Lei, T., Barzilay, R., Jaakkola, T.: Rationalizing neural predictions. arXiv preprint arXiv:1606.04155 (2016)

  20. Levine, S., Finn, C., Darrell, T., Abbeel, P.: End-to-end training of deep visuomotor policies. J. Mach. Learn. Res. 17(1), 1334–1373 (2016)

    MathSciNet  MATH  Google Scholar 

  21. Levine, S., Koltun, V.: Guided policy search. In: International Conference on Machine Learning, pp. 1–9 (2013)

    Google Scholar 

  22. Li, Y., Liang, Y.: Learning overparameterized neural networks via stochastic gradient descent on structured data. In: Advances in Neural Information Processing Systems, pp. 8168–8177 (2018)

    Google Scholar 

  23. Marcucci, T., Deits, R., Gabiccini, M., Biechi, A., Tedrake, R.: Approximate hybrid model predictive control for multi-contact push recovery in complex environments. In: 2017 IEEE-RAS 17th International Conference on Humanoid Robotics (Humanoids), pp. 31–38. IEEE (2017)

    Google Scholar 

  24. Pan, Y., et al.: Learning deep neural network control policies for agile off-road autonomous driving. In: The NIPS Deep Reinforcement Learning Symposium (2017)

    Google Scholar 

  25. Raghunathan, A., Steinhardt, J., Liang, P.S.: Semidefinite relaxations for certifying robustness to adversarial examples. In: Advances in Neural Information Processing Systems, pp. 10900–10910 (2018)

    Google Scholar 

  26. Ross, S., Gordon, G., Bagnell, D.: A reduction of imitation learning and structured prediction to no-regret online learning. In: Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pp. 627–635 (2011)

    Google Scholar 

  27. Sadraddini, S., Tedrake, R.: Linear encodings for polytope containment problems. arXiv preprint arXiv:1903.05214 (2019)

  28. Sadraddini, S., Tedrake, R.: Sampling-based polytopic trees for approximate optimal control of piecewise affine systems. In: International Conference on Robotics and Automation (ICRA) (2019)

    Google Scholar 

  29. Su, J., Vargas, D.V., Sakurai, K.: One pixel attack for fooling deep neural networks. IEEE Trans. Evol. Comput. (2019)

    Google Scholar 

  30. Sun, X., Khedr, H., Shoukry, Y.: Formal verification of neural network controlled autonomous systems. arXiv preprint arXiv:1810.13072 (2018)

  31. Szegedy, C., et al.: Intriguing properties of neural networks. In: ICLR (2014)

    Google Scholar 

  32. Tedrake, R., Manchester, I.R., Tobenkin, M., Roberts, J.W.: LQR-trees: feedback motion planning via sums-of-squares verification. Int. J. Robot. Res. 29(8), 1038–1052 (2010)

    Article  Google Scholar 

  33. Tjeng, V., Tedrake, R.: Verifying neural networks with mixed integer programming. arXiv preprint arXiv:1711.07356 (2017)

  34. Tjeng, V., Xiao, K.Y., Tedrake, R.: Evaluating robustness of neural networks with mixed integer programming (2018)

    Google Scholar 

  35. Tøndel, P., Johansen, T.A., Bemporad, A.: Evaluation of piecewise affine control via binary search tree. Automatica 39(5), 945–950 (2003)

    Article  MathSciNet  Google Scholar 

  36. Wong, E., Kolter, Z.: Provable defenses against adversarial examples via the convex outer adversarial polytope. In: International Conference on Machine Learning, pp. 5283–5292 (2018)

    Google Scholar 

  37. Xiang, W., Lopez, D.M., Musau, P., Johnson, T.T.: Reachable set estimation and verification for neural network models of nonlinear dynamic systems. In: Yu, H., Li, X., Murray, R.M., Ramesh, S., Tomlin, C.J. (eds.) Safe, Autonomous and Intelligent Vehicles. UST, pp. 123–144. Springer, Cham (2019). https://doi.org/10.1007/978-3-319-97301-2_7

    Chapter  Google Scholar 

  38. Xiang, W., Tran, H.D., Johnson, T.T.: Specification-guided safety verification for feedforward neural networks. arXiv preprint arXiv:1812.06161 (2018)

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Sadra Sadraddini .

Editor information

Editors and Affiliations

A Appendix

A Appendix

1.1 A.1 Proof of Theorem 2

Proof

Let \(X^i_{r}=X^{\max }_{s} \setminus X^i_{s}\). We need to show that with probability one. The same argument holds for \(X^{\max }_{f} \setminus X^i_{f}\). We prove by contradiction. Let \(\lim _{i \rightarrow \infty } X_r^i\) be a measurable set. It follows from Assumption 1 that we obtain a sample from \(X_r^{\infty }\) with non-zero probability. Then we simulate the system forward from the sampled state \(x_0\). Once the trajectory is obtained, the polytopic trajectory is computed, and by Assumption 2, we obtain a measurable \(\mathbb {P}_0\), centered at that has non-empty intersection with \(X_r^{\infty }\). Therefore, by non-zero probability, the volume of \(X_r^{i}\) shrinks in one iteration by at least \(\lambda \). Thus we reach a contradiction.

Rights and permissions

Reprints and permissions

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Sadraddini, S., Shen, S., Bastani, O. (2019). Polytopic Trees for Verification of Learning-Based Controllers. In: Zamani, M., Zufferey, D. (eds) Numerical Software Verification. NSV 2019. Lecture Notes in Computer Science(), vol 11652. Springer, Cham. https://doi.org/10.1007/978-3-030-28423-7_8

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-28423-7_8

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-28422-0

  • Online ISBN: 978-3-030-28423-7

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics