
Multimedia Tools and Applications, Volume 78, Issue 14, pp 20409–20429

Integration of statistical detector and Gaussian noise injection detector for adversarial example detection in deep neural networks

  • Weiqi Fan
  • Guangling Sun (corresponding author)
  • Yuying Su
  • Zhi Liu
  • Xiaofeng Lu
Article

Abstract

Deep Neural Networks (DNNs) have achieved great success in many tasks in recent years. However, researchers have found that DNNs are vulnerable to adversarial examples, i.e., maliciously perturbed inputs. Elaborately designed adversarial perturbations can easily confuse a model while having no impact on human perception. To counter adversarial examples, we propose an integrated detection framework that combines a statistical detector and a Gaussian noise injection detector. The statistical detector extracts the Subtractive Pixel Adjacency Matrix (SPAM) and models it with a second-order Markov transition probability matrix so as to highlight the statistical anomaly hidden in an adversarial input; an ensemble classifier trained on the SPAM-based feature then detects adversarial inputs containing large perturbations. The Gaussian noise injection detector first injects additive Gaussian noise into the input and then feeds both the original input and its noise-injected counterpart into the targeted network. By comparing the difference between the two outputs, it detects adversarial inputs containing small perturbations: if the difference exceeds a threshold, the input is judged adversarial; otherwise it is judged legitimate. Because the two detectors are adapted to different characteristics of adversarial perturbations, the proposed framework is capable of detecting multiple types of adversarial examples. In our work, we test six categories of adversarial examples produced by the Fast Gradient Sign Method (FGSM, untargeted), the Randomized Fast Gradient Sign Method (R-FGSM, untargeted), the Basic Iterative Method (BIM, untargeted), DeepFool (untargeted), the Carlini & Wagner method (CW_UT, untargeted), and CW_T (targeted). Comprehensive empirical results show that the proposed detection framework achieves promising performance on the ImageNet database.
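
The following Python/NumPy sketch is a minimal illustration of the two branches described above: a simplified, single-direction SPAM feature modelled by second-order Markov transition probabilities, a Gaussian noise injection score measured here as the L1 distance between the network's softmax outputs for an input and its noise-injected copy, and the integrated decision rule. The particular distance measure, the noise level sigma, the threshold tau, the predict callable (assumed to map an image to the targeted network's softmax output), and the scikit-learn-style spam_classifier placeholder are illustrative assumptions rather than the exact settings used in the paper's experiments.

    import numpy as np

    def spam_features(gray, T=3):
        # Simplified, single-direction SPAM feature: second-order Markov transition
        # probabilities of horizontal pixel differences truncated to [-T, T]
        # (the full feature in the paper aggregates several scan directions).
        d = np.clip(np.diff(gray.astype(np.int64), axis=1), -T, T) + T
        counts = np.zeros((2 * T + 1,) * 3)
        # Count every triple of consecutive differences (d1, d2, d3).
        np.add.at(counts, (d[:, :-2].ravel(), d[:, 1:-1].ravel(), d[:, 2:].ravel()), 1)
        pair_totals = np.maximum(counts.sum(axis=2, keepdims=True), 1)
        return (counts / pair_totals).ravel()        # (2T+1)^3 = 343-dimensional vector

    def noise_injection_score(predict, x, sigma=0.05):
        # L1 distance between the softmax outputs for x and for a Gaussian-noise-
        # injected copy of x; pixel values are assumed to lie in [0, 1].
        noisy = np.clip(x + np.random.normal(0.0, sigma, size=x.shape), 0.0, 1.0)
        return float(np.abs(predict(x) - predict(noisy)).sum())

    def is_adversarial(x, predict, spam_classifier, tau=0.1, sigma=0.05):
        # Integrated decision: flag x if either branch fires -- the SPAM-based
        # ensemble classifier (large perturbations) or the noise injection score
        # exceeding the threshold tau (small perturbations).
        gray = 255.0 * (x.mean(axis=-1) if x.ndim == 3 else x)
        if spam_classifier.predict([spam_features(gray)])[0] == 1:   # 1 = adversarial
            return True
        return noise_injection_score(predict, x, sigma) > tau

In this sketch spam_classifier stands in for the ensemble classifier trained on SPAM-based features, and the threshold tau would be calibrated on legitimate inputs so that a chosen false positive rate is not exceeded.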

Keywords

Deep neural networks · Adversarial examples · Integrated detection · Statistical detector · Gaussian noise injection detector

Notes

Acknowledgements

This work was supported by Shanghai Municipal Natural Science Foundation under Grant No. 16ZR1411100 and the National Natural Science Foundation of China under Grant No. 61771301.

References

  1. Carlini N, Wagner D (2017) Towards evaluating the robustness of neural networks. In IEEE Symposium on Security and Privacy (SP), pp. 39–57
  2. Das N, Shanbhogue M, Chen ST, Hohman F, Chen L, Kounavis ME, Chau DH (2017) Keeping the bad guys out: protecting and vaccinating deep learning with JPEG compression. arXiv preprint arXiv:1705.02900
  3. Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L (2009) ImageNet: a large-scale hierarchical image database. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 248–255
  4. Dziugaite GK, Ghahramani Z, Roy DM (2016) A study of the effect of JPG compression on adversarial images. arXiv preprint arXiv:1608.00853
  5. Eykholt K, Evtimov I, Fernandes E, Li B, Rahmati A, Xiao C, Song D (2018) Robust physical-world attacks on deep learning models. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  6. Goodfellow IJ, Shlens J, Szegedy C (2015) Explaining and harnessing adversarial examples. In International Conference on Learning Representations (ICLR)
  7. Grosse K, Manoharan P, Papernot N, Backes M, McDaniel P (2017) On the (statistical) detection of adversarial examples. arXiv preprint arXiv:1702.06280
  8. Gryllias KC, Antoniadis IA (2012) A support vector machine approach based on physical model training for rolling element bearing fault detection in industrial environments. Eng Appl Artif Intell 25(2):326–344
  9. Hinton G, Vinyals O, Dean J (2015) Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531
  10. Kodovsky J, Fridrich J, Holub V (2012) Ensemble classifiers for steganalysis of digital media. IEEE Trans Inform Forens Sec 7(2):432–444
  11. Kurakin A, Goodfellow I, Bengio S (2017) Adversarial examples in the physical world. In International Conference on Learning Representations (ICLR) Workshop Track
  12. Li X, Li F (2017) Adversarial examples detection in deep networks with convolutional filter statistics. In International Conference on Computer Vision (ICCV), pp. 5775–5783
  13. Liu Y, Zheng Y, Liang Y, Liu S, Rosenblum DS (2016) Urban water quality prediction based on multi-task multi-view learning. In International Joint Conference on Artificial Intelligence (IJCAI), pp. 2576–2582
  14. Liu Y, Zhang L, Nie L, Yan Y, Rosenblum DS (2016) Fortune teller: predicting your career path. In AAAI Conference on Artificial Intelligence (AAAI), pp. 201–207
  15. Liu Y, Nie L, Liu L, Rosenblum DS (2016) From action to activity: sensor-based activity recognition. Neurocomputing 181:108–115
  16. Lu J, Issaranon T, Forsyth D (2017) SafetyNet: detecting and rejecting adversarial examples robustly. In International Conference on Computer Vision (ICCV), pp. 446–454
  17. Madry A, Makelov A, Schmidt L, Tsipras D, Vladu A (2017) Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083
  18. Meng D, Chen H (2017) MagNet: a two-pronged defense against adversarial examples. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pp. 135–147
  19. Metzen JH, Genewein T, Fischer V, Bischoff B (2017) On detecting adversarial perturbations. In International Conference on Learning Representations (ICLR)
  20. Miyato T, Dai AM, Goodfellow I (2016) Adversarial training methods for semi-supervised text classification. arXiv preprint arXiv:1605.07725
  21. Moosavi-Dezfooli SM, Fawzi A, Frossard P (2016) DeepFool: a simple and accurate method to fool deep neural networks. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2574–2582
  22. Papernot N, McDaniel P, Sinha A, Wellman M (2016) Towards the science of security and privacy in machine learning. arXiv preprint arXiv:1611.03814
  23. Papernot N, McDaniel P, Wu X, Jha S, Swami A (2016) Distillation as a defense to adversarial perturbations against deep neural networks. In IEEE Symposium on Security and Privacy (SP), pp. 582–597
  24. Papernot N, Goodfellow I, Sheatsley R, Feinman R, McDaniel P (2016) cleverhans v2.0.0: an adversarial machine learning library. arXiv preprint arXiv:1610.00768
  25. Pevny T, Bas P, Fridrich J (2010) Steganalysis by subtractive pixel adjacency matrix. IEEE Trans Inform Forens Sec 5(2):215–224
  26. Santhanam GK, Grnarova P (2018) Defending against adversarial attacks by leveraging an entire GAN. arXiv preprint arXiv:1805.10652
  27. Schmidhuber J (2015) Deep learning in neural networks: an overview. Neural Netw 61:85–117
  28. Sharif M, Bhagavatula S, Bauer L, Reiter MK (2016) Accessorize to a crime: real and stealthy attacks on state-of-the-art face recognition. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp. 1528–1540
  29. Shen S, Jin G, Gao K, Zhang Y (2017) APE-GAN: adversarial perturbation elimination with GAN. arXiv preprint arXiv:1707.05474
  30. Szegedy C, Zaremba W, Sutskever I, Bruna J, Erhan D, Goodfellow I, Fergus R (2014) Intriguing properties of neural networks. In International Conference on Learning Representations (ICLR) Workshop Track
  31. Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2016) Rethinking the inception architecture for computer vision. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2818–2826
  32. Tramèr F, Kurakin A, Papernot N, Goodfellow I, Boneh D, McDaniel P (2018) Ensemble adversarial training: attacks and defenses. In International Conference on Learning Representations (ICLR)
  33. Xie C, Wang J, Zhang Z, Ren Z, Yuille A (2018) Mitigating adversarial effects through randomization. In International Conference on Learning Representations (ICLR)
  34. Xu W, Evans D, Qi Y (2017) Feature squeezing: detecting adversarial examples in deep neural networks. In Network and Distributed System Security Symposium (NDSS)

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Communication and Information Engineering School, Shanghai University, Shanghai, China
