DeepSafe: A Data-Driven Approach for Assessing Robustness of Neural Networks

  • Divya Gopinath
  • Guy Katz
  • Corina S. PăsăreanuEmail author
  • Clark Barrett
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11138)


Deep neural networks have achieved impressive results in many complex applications, including classification tasks for image and speech recognition, pattern analysis or perception in self-driving vehicles. However, it has been observed that even highly trained networks are very vulnerable to adversarial perturbations. Adding minimal changes to inputs that are correctly classified can lead to wrong predictions, raising serious security and safety concerns. Existing techniques for checking robustness against such perturbations only consider searching locally around a few individual inputs, providing limited guarantees. We propose DeepSafe, a novel approach for automatically assessing the overall robustness of a neural network. DeepSafe applies clustering over known labeled data and leverages off-the-shelf constraint solvers to automatically identify and check safe regions in which the network is robust, i.e. all the inputs in the region are guaranteed to be classified correctly. We also introduce the concept of targeted robustness, which ensures that the neural network is guaranteed not to misclassify inputs within a region to a specific target (adversarial) label. We evaluate DeepSafe on a neural network implementation of a controller for the next-generation Airborne Collision Avoidance System for unmanned aircraft (ACAS Xu) and for the well known MNIST network. For these networks, DeepSafe identified many regions which were safe, and also found adversarial perturbations of interest.



This work was partially supported by grants from NASA, NSF, FAA and Intel.


  1. 1.
    Aggarwal, C.C., Hinneburg, A., Keim, D.A.: On the surprising behavior of distance metrics in high dimensional space. In: Van den Bussche, J., Vianu, V. (eds.) ICDT 2001. LNCS, vol. 1973, pp. 420–434. Springer, Heidelberg (2001). Scholar
  2. 2.
    Carlini, N., Katz, G., Barrett, C., Dill, D.: Ground-truth adversarial examples. Technical Report (2017). arXiv:1709.10207
  3. 3.
    Carlini, N., Wagner, D.: Towards evaluating the robustness of neural networks. In: Proceedings of 38th IEEE Symposium on Security and Privacy (2017)Google Scholar
  4. 4.
    Ehlers, R.: Formal verification of piece-wise linear feed-forward neural networks. In: D’Souza, D., Narayan Kumar, K. (eds.) ATVA 2017. LNCS, vol. 10482, pp. 269–286. Springer, Cham (2017). Scholar
  5. 5.
    Feinman, R., Curtin, R.R., Shintre, S, Gardner, A.B.: Detecting adversarial samples from artifacts. Technical Report (2017). arXiv:1703.00410
  6. 6.
    Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. Technical Report (2014). arXiv:1412.6572
  7. 7.
    Huang, X., Kwiatkowska, M., Wang, S., Wu, M.: Safety verification of deep neural networks. In: Majumdar, R., Kunčak, V. (eds.) CAV 2017. LNCS, vol. 10426, pp. 3–29. Springer, Cham (2017). Scholar
  8. 8.
    Julian, K., Lopez, J., Brush, J., Owen, M., Kochenderfer, M.: Policy compression for aircraft collision avoidance systems. In: Proceedings of 35th Digital Avionics System Conference (DASC), pp. 1–10 (2016)Google Scholar
  9. 9.
    Kanungo, T., Mount, D.M., Netanyahu, N.S., Piatko, C.D., Silverman, R., Angela, Y.Wu.: An efficient k-means clustering algorithm: analysis and implementation. IEEE Trans. Pattern Anal. Mach. Intell. 24(7), 881–892 (2002)CrossRefGoogle Scholar
  10. 10.
    Katz, G., Barrett, C., Dill, D.L., Julian, K., Kochenderfer, M.J.: Reluplex: an efficient SMT solver for verifying deep neural networks. In: Majumdar, R., Kunčak, V. (eds.) CAV 2017. LNCS, vol. 10426, pp. 97–117. Springer, Cham (2017). Scholar
  11. 11.
    Katz, G., Barrett, C., Dill, D., Julian, K., Kochenderfer, M.: Towards proving the adversarial robustness of deep neural networks. In: Proceedings of 1st Workshop on Formal Verification of Autonomous Vehicles (FVAV), pp. 19–26 (2017)CrossRefGoogle Scholar
  12. 12.
    LeCun, Y., Cortes, C., Burges, C.J.C.: The MNIST database of handwritten digits.
  13. 13.
    Papernot, N., McDaniel, P.D.: On the effectiveness of defensive distillation. Technical Report (2016). arXiv:1607.05113
  14. 14.
    Papernot, N., McDaniel, P.D., Jha, S., Fredrikson, M., Celik, Z.B., Swami, A.: The limitations of deep learning in adversarial settings. In: Proceedings of 1st IEEE European Symposium on Security and Privacy (EuroS&P), pp. 372–387 (2016)Google Scholar
  15. 15.
    Pulina, L., Tacchella, A.: An abstraction-refinement approach to verification of artificial neural networks. In: Touili, T., Cook, B., Jackson, P. (eds.) CAV 2010. LNCS, vol. 6174, pp. 243–257. Springer, Heidelberg (2010). Scholar
  16. 16.
    Pulina, L., Tacchella, A.: Challenging SMT solvers to verify neural networks. AI Commun. 25(2), 117–135 (2012)MathSciNetzbMATHGoogle Scholar
  17. 17.
    Szegedy, C., et al.: Intriguing properties of neural networks. Technical Report (2013). arXiv:1312.6199

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Divya Gopinath
    • 1
  • Guy Katz
    • 3
  • Corina S. Păsăreanu
    • 1
    • 2
    Email author
  • Clark Barrett
    • 3
  1. 1.Carnegie Mellon UniversityMountain ViewUSA
  2. 2.NASA Ames Research CenterMountain ViewUSA
  3. 3.Stanford UniversityStanfordUSA

Personalised recommendations