Abstract
In this tutorial, we present our recent work on building trusted, resilient and interpretable AI models by combining symbolic methods developed for automated reasoning with connectionist learning methods based on deep neural networks. The increasing adoption of artificial intelligence and machine learning in systems, including safety-critical systems, has created a pressing need for scalable techniques that establish trust in their safe behavior, resilience to adversarial attacks, and interpretability sufficient to enable human audits. The tutorial comprises three components: a review of techniques for verifying neural networks, methods that use geometric invariants to defend against adversarial attacks, and techniques for extracting symbolic logical rules by reverse-engineering machine learning models. These techniques form the core of TRINITY, the Trusted, Resilient and Interpretable AI framework being developed at SRI. We identify the key challenges in building the TRINITY framework and report recent results on each of these three fronts.
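To make the first component concrete, the sketch below shows interval bound propagation, one standard way of soundly over-approximating the output range of a feedforward ReLU network; it is a minimal NumPy illustration of the kind of property neural-network verifiers check, not the tutorial's exact procedure, and the toy network and weights are arbitrary assumptions.

```python
# Minimal sketch: interval bound propagation through a ReLU network.
# Propagates an input box layer by layer to a sound output range.
import numpy as np

def relu_interval(lo, hi):
    # ReLU is monotone, so it maps interval endpoints directly.
    return np.maximum(lo, 0.0), np.maximum(hi, 0.0)

def affine_interval(W, b, lo, hi):
    # Split W into positive and negative parts so the box stays sound.
    Wp, Wn = np.maximum(W, 0.0), np.minimum(W, 0.0)
    return Wp @ lo + Wn @ hi + b, Wp @ hi + Wn @ lo + b

def output_range(layers, lo, hi):
    # layers: list of (W, b) pairs; ReLU between layers, linear output.
    for i, (W, b) in enumerate(layers):
        lo, hi = affine_interval(W, b, lo, hi)
        if i < len(layers) - 1:
            lo, hi = relu_interval(lo, hi)
    return lo, hi

# Toy 2-2-1 network over the input box [-1, 1]^2 (illustrative weights).
rng = np.random.default_rng(0)
layers = [(rng.standard_normal((2, 2)), np.zeros(2)),
          (rng.standard_normal((1, 2)), np.zeros(1))]
lo, hi = output_range(layers, np.array([-1.0, -1.0]), np.array([1.0, 1.0]))
print(f"output range over-approximation: [{lo[0]:.3f}, {hi[0]:.3f}]")
```

The bounds are conservative: any input in the box provably produces an output inside the printed interval, which is what makes such abstractions usable as verification certificates.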
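For the second component, the following hedged sketch conveys the geometric intuition behind manifold-based defenses: inputs that lie far from the training-data manifold, here approximated by k-nearest-neighbor distance, are flagged as suspicious. The choice of feature space, k, and the quantile threshold are illustrative assumptions rather than the tutorial's exact detector.

```python
# Sketch of manifold-distance detection of adversarial inputs.
import numpy as np

def knn_distance(x, data, k=5):
    # Mean Euclidean distance from x to its k nearest training points.
    d = np.linalg.norm(data - x, axis=1)
    return np.sort(d)[:k].mean()

def fit_threshold(data, k=5, quantile=0.99):
    # Calibrate on clean data with leave-one-out k-NN distances.
    scores = [knn_distance(data[i], np.delete(data, i, axis=0), k)
              for i in range(len(data))]
    return np.quantile(scores, quantile)

def is_adversarial(x, data, threshold, k=5):
    return knn_distance(x, data, k) > threshold

rng = np.random.default_rng(1)
clean = rng.normal(0.0, 1.0, size=(200, 10))       # stand-in for clean features
tau = fit_threshold(clean)
off_manifold = clean[0] + 6.0                      # point pushed off the manifold
print(is_adversarial(off_manifold, clean, tau))    # True: far from the data
print(is_adversarial(clean[0] * 0.9, clean, tau))  # False: near the data
```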
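For the third component, the toy sketch below treats a model as a membership oracle and brute-forces a small conjunction over input bits that reproduces its decisions. Practical sparse-Boolean-formula learning uses far more efficient query strategies; this enumeration is only meant to convey the problem setup, and the black-box function is a made-up stand-in.

```python
# Toy illustration: recover a sparse Boolean rule from oracle queries.
import itertools

def black_box(x):
    # Stand-in for a learned model: secretly x[0] AND NOT x[2].
    return x[0] and not x[2]

def learn_conjunction(oracle, n_bits, max_size=2):
    # Literals are (index, polarity); try all conjunctions up to max_size.
    lits = [(i, p) for i in range(n_bits) for p in (True, False)]
    inputs = list(itertools.product([False, True], repeat=n_bits))
    for size in range(1, max_size + 1):
        for clause in itertools.combinations(lits, size):
            if all(all(x[i] == p for i, p in clause) == oracle(x)
                   for x in inputs):
                return clause
    return None

clause = learn_conjunction(black_box, n_bits=3)
print(clause)  # ((0, True), (2, False)), i.e. x0 AND NOT x2
```

The recovered clause is a human-auditable surrogate for the black box, which is the spirit of the rule-extraction component.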
Acknowledgement
The author acknowledges support from the US Army Research Laboratory (ARL) Cooperative Agreement W911NF-17-2-0196 on the Internet of Battlefield Things (IoBT) and National Science Foundation (NSF) grants #1750009 and #1740079.