Abstract
Deep convolutional neural networks have achieved great success over recent years, particularly in the domain of computer vision. They are fast, convenient, and – thanks to mature frameworks – relatively easy to implement and deploy. However, their reasoning is hidden inside a black box, in spite of a number of proposed approaches that try to provide human-understandable explanations for the predictions of neural networks. It is still a matter of debate which of these explainers are best suited for which situations, and how to quantitatively evaluate and compare them [1]. In this contribution, we focus on the capabilities of explainers for deep convolutional neural networks in an extreme situation: a setting in which humans and networks fundamentally disagree. Deep neural networks are susceptible to adversarial attacks that deliberately modify input samples to mislead a neural network's classification without affecting how a human observer interprets the input. Our goal is to evaluate explainers by investigating whether they can identify adversarially attacked regions of an image. In particular, we quantitatively and qualitatively investigate how well three popular explainers – classic saliency, guided backpropagation, and LIME – identify the attacked regions as the explanatory regions for the (incorrect) prediction in representative examples from image classification. We find that LIME outperforms the other explainers.
This work was supported by Honda Research Institute Europe GmbH, Offenbach am Main, Germany.
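For concreteness, the following is a minimal sketch of the kind of quantitative comparison the abstract describes: compute a gradient-based ("classic") saliency map for a model's prediction and measure how much of its most salient mass falls inside a known attacked region. This is not the authors' implementation; the model (an untrained torchvision ResNet-18), the toy input, the square attack mask, and the helper names vanilla_saliency and attack_overlap are illustrative assumptions, and the paper itself uses its own networks, localized attacks, and evaluation protocol.

```python
import torch
import torchvision.models as models

# Stand-in classifier: any differentiable image classifier works; an untrained
# ResNet-18 is used here only so the sketch runs without downloading weights.
model = models.resnet18(num_classes=1000).eval()


def vanilla_saliency(model, x):
    """Classic gradient saliency: |d(predicted-class score)/d(input)|, max over channels."""
    x = x.clone().requires_grad_(True)
    logits = model(x)
    score = logits[0, logits[0].argmax()]  # score of the predicted class
    score.backward()
    return x.grad.abs().max(dim=1).values[0]  # per-pixel saliency, shape (H, W)


def attack_overlap(saliency_map, attack_mask, top_fraction=0.1):
    """Fraction of the top-k most salient pixels that fall inside the attacked region."""
    k = max(1, int(top_fraction * saliency_map.numel()))
    top_idx = saliency_map.flatten().topk(k).indices
    return attack_mask.flatten()[top_idx].float().mean().item()


# Toy usage: a random "image" and a hypothetical square attacked region.
x = torch.rand(1, 3, 224, 224)
mask = torch.zeros(224, 224, dtype=torch.bool)
mask[:64, :64] = True

sal = vanilla_saliency(model, x)
print(f"overlap of top-10% salient pixels with attacked region: {attack_overlap(sal, mask):.3f}")
```

The same overlap measure could, under these assumptions, be applied to guided backpropagation maps or to the superpixels LIME marks as most important, which is one way to compare explainers on the task of localizing an attack.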
References
Mohseni, S., Zarei, N., Ragan, E.D.: A survey of evaluation methods and measures for interpretable machine learning (2018). arXiv:1811.11839
Schmidhuber, J.: Deep learning in neural networks: an overview. Neural Netw. 61, 85–117 (2015)
Fischer, L., Hammer, B., Wersing, H.: Optimal local rejection for classifiers. Neurocomputing 214, 445–457 (2016)
Ribeiro, M.T., Singh, S., Guestrin, C.: “Why should I trust you?”: explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2016). arXiv:1602.04938
Samek, W., Wiegand, T., Müller, K.-R.: Explainable artificial intelligence: understanding, visualizing and interpreting deep learning models (2017). arXiv:1708.08296
Schulz, A., Gisbrecht, A., Hammer, B.: Using discriminative dimensionality reduction to visualize classifiers. Neural Process. Lett. 42, 27–54 (2014)
Göpfert, J.P., Wersing, H., Hammer, B.: Adversarial attacks hidden in plain sight (2019). arXiv:1902.09286
Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D.: Grad-CAM: visual explanations from deep networks via gradient-based localization. In: IEEE International Conference on Computer Vision (ICCV), pp. 618–626 (2017)
Springenberg, J.T., Dosovitskiy, A., Brox, T., Riedmiller, M.A.: Striving for simplicity: the all convolutional net (2014). arXiv:1412.6806
Rauber, J., Brendel, W., Bethge, M.: Foolbox: a python toolbox to benchmark the robustness of machine learning models (2017). arXiv:1707.04131
Papernot, N., et al.: Technical report on the CleverHans v2.1.0 Adversarial Examples Library (2016). arXiv:1610.00768
Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples (2014). arXiv:1412.6572
Kurakin, A., Goodfellow, I.J., Bengio, S.: Adversarial machine learning at scale (2016). arXiv:1611.01236
Bengio, Y., Courville, A.C., Vincent, P.: Representation learning: a review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 35, 1798–1828 (2013)
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2818–2826 (2016)
Felzenszwalb, P.F., Huttenlocher, D.P.: Efficient graph-based image segmentation. Int. J. Comput. Vis. 59, 167–181 (2004)