Trust, Resilience and Interpretability of AI Models

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11652)


In this tutorial, we present our recent work on building trusted, resilient and interpretable AI models by combining symbolic methods developed for automated reasoning with connectionist learning methods that use deep neural networks. The increasing adoption of artificial intelligence and machine learning in systems, including safety-critical systems, has created a pressing need for developing scalable techniques that can be used to establish trust over their safe behavior, resilience to adversarial attacks, and interpretability to enable human audits. This tutorial is comprised of three components: review of techniques for verification of neural networks, methods for using geometric invariants to defend against adversarial attacks, and techniques for extracting logical symbolic rules by reverse engineering machine learning models. These techniques form the core of TRINITY: Trusted, Resilient and Interpretable AI framework being developed at SRI. In this tutorial, we identify the key challenges in building the TRINITY framework, and report recent results on each of these three fronts.



The author acknowledges support from the US ARL Cooperative Agreement W911NF-17-2-0196 on Internet of Battlefield Things (IoBT) and National Science Foundation (NSF) #1750009 and #1740079.


  1. 1.
    Abouzied, A., Angluin, D., Papadimitriou, C., Hellerstein, J.M., Silberschatz, A.: Learning and verifying quantified boolean queries by example. In: ACM Symposium on Principles of Database Systems, pp. 49–60. ACM (2013)Google Scholar
  2. 2.
    Angluin, D.: Computational learning theory: survey and selected bibliography. In: ACM Symposium on Theory of Computing, pp. 351–369. ACM (1992)Google Scholar
  3. 3.
    Angluin, D., Kharitonov, M.: When won’t membership queries help? In: ACM Symposium on Theory of Computing, pp. 444–454. ACM (1991)Google Scholar
  4. 4.
    Athalye, A., Carlini, N., Wagner, D.: Obfuscated gradients give a false sense of security: circumventing defenses to adversarial examples. arXiv preprint arXiv:1802.00420 (2018)
  5. 5.
    Bengio, Y., Mesnil, G., Dauphin, Y., Rifai, S.: Better mixing via deep representations. In: International Conference on Machine Learning, pp. 552–560 (2013)Google Scholar
  6. 6.
    Bittner, B., Bozzano, M., Cimatti, A., Gario, M., Griggio, A.: Towards pareto-optimal parameter synthesis for monotonie cost functions. In: FMCAD, pp. 23–30, October 2014Google Scholar
  7. 7.
    Boigelot, B., Godefroid, P.: Automatic synthesis of specifications from the dynamic observation of reactive programs. In: Brinksma, E. (ed.) TACAS 1997. LNCS, vol. 1217, pp. 321–333. Springer, Heidelberg (1997). Scholar
  8. 8.
    Botinčan, M., Babić, D.: Sigma*: symbolic learning of input-output specifications. In: POPL, pp. 443–456 (2013).
  9. 9.
    Carlini, N., Wagner, D.: Towards evaluating the robustness of neural networks. arXiv preprint arXiv:1608.04644 (2016)
  10. 10.
    Cook, B., Kroening, D., Rümmer, P., Wintersteiger, C.M.: Ranking function synthesis for bit-vector relations. FMSD 43(1), 93–120 (2013). Scholar
  11. 11.
    Dutta, S., Jha, S., Sankaranarayanan, S., Tiwari, A.: Learning and verification of feedback control systems using feedforward neural networks. IFAC-PapersOnLine 51(16), 151–156 (2018)CrossRefGoogle Scholar
  12. 12.
    Dutta, S., Jha, S., Sankaranarayanan, S., Tiwari, A.: Output range analysis for deep feedforward neural networks. In: Dutle, A., Muñoz, C., Narkawicz, A. (eds.) NFM 2018. LNCS, vol. 10811, pp. 121–138. Springer, Cham (2018). Scholar
  13. 13.
    Ehrenfeucht, A., Haussler, D., Kearns, M., Valiant, L.: A general lower bound on the number of examples needed for learning. Inf. Comput. 82(3), 247–261 (1989). Scholar
  14. 14.
    Elizalde, F., Sucar, E., Noguez, J., Reyes, A.: Generating explanations based on Markov decision processes. In: Aguirre, A.H., Borja, R.M., Garciá, C.A.R. (eds.) MICAI 2009. LNCS, vol. 5845, pp. 51–62. Springer, Heidelberg (2009). Scholar
  15. 15.
    Feng, C., Muggleton, S.: Towards inductive generalisation in higher order logic. In: 9th International Workshop on Machine Learning, pp. 154–162 (2014)CrossRefGoogle Scholar
  16. 16.
    Gardner, J.R., et al.: Deep manifold traversal: changing labels with convolutional features. arXiv preprint arXiv:1511.06421 (2015)
  17. 17.
    Godefroid, P., Taly, A.: Automated synthesis of symbolic instruction encodings from I/O samples. SIGPLAN Not. 47(6), 441–452 (2012). Scholar
  18. 18.
    Goldsmith, J., Sloan, R.H., Szörényi, B., Turán, G.: Theory revision with queries: horn, read-once, and parity formulas. Artif. Intell. 156(2), 139–176 (2004)MathSciNetCrossRefGoogle Scholar
  19. 19.
    Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572 (2014)
  20. 20.
    Grosse, K., Manoharan, P., Papernot, N., Backes, M., McDaniel, P.: On the (statistical) detection of adversarial examples. arXiv preprint arXiv:1702.06280 (2017)
  21. 21.
    Gurfinkel, A., Belov, A., Marques-Silva, J.: Synthesizing safe bit-precise invariants. In: Ábrahám, E., Havelund, K. (eds.) TACAS 2014. LNCS, vol. 8413, pp. 93–108. Springer, Heidelberg (2014). Scholar
  22. 22.
    Harbers, M., Meyer, J.J., van den Bosch, K.: Explaining simulations through self explaining agents. J. Artif. Soc. Soc. Simul. (2010).
  23. 23.
    Hellerstein, L., Servedio, R.A.: On PAC learning algorithms for rich boolean function classes. Theoret. Comput. Sci. 384(1), 66–76 (2007)MathSciNetCrossRefGoogle Scholar
  24. 24.
    Hinton, G., Vinyals, O., Dean, J.: Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015)
  25. 25.
    Jha, S., Gulwani, S., Seshia, S.A., Tiwari, A.: Oracle-guided component-based program synthesis. In: ICSE, pp. 215–224. IEEE (2010)Google Scholar
  26. 26.
    Jha, S., Jang, U., Jha, S., Jalaian, B.: Detecting adversarial examples using data manifolds. In: 2018 IEEE Military Communications Conference (MILCOM), MILCOM 2018, pp. 547–552. IEEE (2018)Google Scholar
  27. 27.
    Jha, S., Raman, V., Pinto, A., Sahai, T., Francis, M.: On learning sparse boolean formulae for explaining AI decisions. In: Barrett, C., Davies, M., Kahsai, T. (eds.) NFM 2017. LNCS, vol. 10227, pp. 99–114. Springer, Cham (2017). Scholar
  28. 28.
    Jha, S., Sahai, T., Raman, V., Pinto, A., Francis, M.: Explaining AI decisions using efficient methods for learning sparse boolean formulae. J. Autom. Reason. 1–21 (2018)Google Scholar
  29. 29.
    Jha, S., Seshia, S.A.: A theory of formal synthesis via inductive learning. Acta Informatica 54, 693–726 (2016). Special Issue on SynthesisMathSciNetCrossRefGoogle Scholar
  30. 30.
    Jones, M.C., Marron, J.S., Sheather, S.J.: A brief survey of bandwidth selection for density estimation. J. Am. Stat. Assoc. 91(433), 401–407 (1996)MathSciNetCrossRefGoogle Scholar
  31. 31.
    Kannan, H., Kurakin, A., Goodfellow, I.: Adversarial logit pairing. arXiv preprint arXiv:1803.06373 (2018)
  32. 32.
    Katz, G., Barrett, C., Dill, D.L., Julian, K., Kochenderfer, M.J.: Reluplex: an efficient SMT solver for verifying deep neural networks. In: Majumdar, R., Kunčak, V. (eds.) CAV 2017. LNCS, vol. 10426, pp. 97–117. Springer, Cham (2017). Scholar
  33. 33.
    Kearns, M., Li, M., Valiant, L.: Learning boolean formulas. J. ACM 41(6), 1298–1328 (1994)MathSciNetCrossRefGoogle Scholar
  34. 34.
    Kearns, M., Valiant, L.: Cryptographic limitations on learning boolean formulae and finite automata. J. ACM (JACM) 41(1), 67–95 (1994)MathSciNetCrossRefGoogle Scholar
  35. 35.
    Kos, J., Fischer, I., Song, D.: Adversarial examples for generative models. arXiv preprint arXiv:1702.06832 (2017)
  36. 36.
    Krizhevsky, A., Nair, V., Hinton, G.: The CIFAR-10 dataset (2014).
  37. 37.
    Kurakin, A., Goodfellow, I., Bengio, S.: Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533 (2016)
  38. 38.
    LaValle, S.M.: Planning Algorithms. Cambridge University Press, Cambridge (2006)CrossRefGoogle Scholar
  39. 39.
    LeCun, Y.: The MNIST database of handwritten digits (1998).
  40. 40.
    Lee, J., Moray, N.: Trust, control strategies and allocation of function in human-machine systems. Ergonomics 35(10), 1243–1270 (1992)CrossRefGoogle Scholar
  41. 41.
    van der Maaten, L., Hinton, G.: Visualizing data using t-SNE. J. Mach. Learn. Res. 9(Nov), 2579–2605 (2008)zbMATHGoogle Scholar
  42. 42.
    Madry, A., Makelov, A., Schmidt, L., Tsipras, D., Vladu, A.: Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083 (2017)
  43. 43.
    Mansour, Y.: Learning boolean functions via the Fourier transform. In: Roychowdhury, V., Siu, K.Y., Orlitsky, A. (eds.) Theoretical Advances in Neural Computation and Learning, pp. 391–424. Springer, Boston (1994). Scholar
  44. 44.
    Metzen, J.H., Genewein, T., Fischer, V., Bischoff, B.: On detecting adversarial perturbations. arXiv preprint arXiv:1702.04267 (2017)
  45. 45.
    Moosavi-Dezfooli, S.M., Fawzi, A., Frossard, P.: DeepFool: a simple and accurate method to fool deep neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2574–2582 (2016)Google Scholar
  46. 46.
    Papernot, N., et al.: CleverHans v2.0.0: an adversarial machine learning library. arXiv preprint arXiv:1610.00768 (2016)
  47. 47.
    Papernot, N., McDaniel, P., Goodfellow, I., Jha, S., Celik, Z.B., Swami, A.: Practical black-box attacks against machine learning. In: Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, pp. 506–519. ACM (2017)Google Scholar
  48. 48.
    Papernot, N., McDaniel, P., Jha, S., Fredrikson, M., Celik, Z.B., Swami, A.: The limitations of deep learning in adversarial settings. In: 2016 IEEE European Symposium on Security and Privacy (EuroS&P), pp. 372–387. IEEE (2016)Google Scholar
  49. 49.
    Papernot, N., McDaniel, P., Swami, A., Harang, R.: Crafting adversarial input sequences for recurrent neural networks. In: 2016 IEEE Military Communications Conference, MILCOM 2016, pp. 49–54. IEEE (2016)Google Scholar
  50. 50.
    Pitt, L., Valiant, L.G.: Computational limitations on learning from examples. J. ACM (JACM) 35(4), 965–984 (1988)MathSciNetCrossRefGoogle Scholar
  51. 51.
    Raman, V., Lignos, C., Finucane, C., Lee, K.C.T., Marcus, M.P., Kress-Gazit, H.: Sorry Dave, I’m afraid I can’t do that: explaining unachievable robot tasks using natural language. In: Robotics: Science and Systems (2013)Google Scholar
  52. 52.
    Reynolds, A., Deters, M., Kuncak, V., Tinelli, C., Barrett, C.: Counterexample-guided quantifier instantiation for synthesis in SMT. In: Kroening, D., Păsăreanu, C.S. (eds.) CAV 2015. LNCS, vol. 9207, pp. 198–216. Springer, Cham (2015). Scholar
  53. 53.
    Ribeiro, M.T., Singh, S., Guestrin, C.: “Why Should I Trust You?”: explaining the predictions of any classifier. In: KDD, pp. 1135–1144 (2016).
  54. 54.
    Roweis, S.T., Saul, L.K.: Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500), 2323–2326 (2000)CrossRefGoogle Scholar
  55. 55.
    Sankaranarayanan, S., Miller, C., Raghunathan, R., Ravanbakhsh, H., Fainekos, G.: A model-based approach to synthesizing insulin infusion pump usage parameters for diabetic patients. In: Annual Allerton Conference on Communication, Control, and Computing, pp. 1610–1617, October 2012.
  56. 56.
    Sankaranarayanan, S.: Automatic invariant generation for hybrid systems using ideal fixed points. In: HSCC, pp. 221–230 (2010).
  57. 57.
    Sankaranarayanan, S., Sipma, H.B., Manna, Z.: Constructing invariants for hybrid systems. FMSD 32(1), 25–55 (2008). Scholar
  58. 58.
    Saul, L.K., Roweis, S.T.: Think globally, fit locally: unsupervised learning of low dimensional manifolds. J. Mach. Learn. Res. 4(Jun), 119–155 (2003)MathSciNetzbMATHGoogle Scholar
  59. 59.
    Shaham, U., Yamada, Y., Negahban, S.: Understanding adversarial training: increasing local stability of neural nets through robust optimization. arXiv preprint arXiv:1511.05432 (2015)
  60. 60.
    Štrumbelj, E., Kononenko, I.: Explaining prediction models and individual predictions with feature contributions. KIS 41(3), 647–665 (2014). Scholar
  61. 61.
    Szegedy, C., et al.: Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199 (2013)
  62. 62.
    Tenenbaum, J.B., De Silva, V., Langford, J.C.: A global geometric framework for nonlinear dimensionality reduction. Science 290(5500), 2319–2323 (2000)CrossRefGoogle Scholar
  63. 63.
    Urban, C., Gurfinkel, A., Kahsai, T.: Synthesizing ranking functions from bits and pieces. In: Chechik, M., Raskin, J.-F. (eds.) TACAS 2016. LNCS, vol. 9636, pp. 54–70. Springer, Heidelberg (2016). Scholar
  64. 64.
    Yuan, C., Lim, H., Lu, T.C.: Most relevant explanation in Bayesian networks. J. Artif. Intell. Res. (JAIR) 42, 309–352 (2011)MathSciNetzbMATHGoogle Scholar

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. 1.Computer Science LaboratorySRI InternationalMenlo ParkUSA

Personalised recommendations