One-Pixel Adversarial Example that Is Safe for Friendly Deep Neural Networks
Deep neural networks (DNNs) offer superior performance in machine learning tasks such as image recognition, speech recognition, pattern analysis, and intrusion detection. In this paper, we propose a one-pixel adversarial example that is safe for friendly deep neural networks. By modifying only one pixel, our proposed method generates a one-pixel-safe adversarial example that can be misclassified by an enemy classifier and correctly classified by a friendly classifier. To verify the performance of the proposed method, we used the CIFAR-10 dataset, ResNet model classifiers, and the Tensorflow library in our experiments. Results show that the proposed method modified only one pixel to achieve success rates of 13.5% and 26.0% in targeted and untargeted attacks, respectively. The success rate is slightly lower than that of the conventional one-pixel method, which has success rates of 15% and 33.5% in targeted and untargeted attacks, respectively; however, this method protects 100% of the friendly classifiers. In addition, if the proposed method modifies five pixels, this method can achieve success rates of 20.5% and 52.0% in targeted and untargeted attacks, respectively.
KeywordsDeep neural network (DNN) Adversarial example One-pixel attack Differential evolution (DE)
This work was supported by National Research Foundation (NRF) of Korea grants funded by the Korean government (MSIT) (2016R1A4A1011761 and 2017R1A2B4006026) and an Institute for Information & Communications Technology Promotion (IITP) grant funded by the Korean government (MSIT) (No. 2016-0-00173).
- 1.Abadi, M., et al.: TensorFlow: a system for large-scale machine learning. OSDI. 16, 265–283 (2016)Google Scholar
- 2.Carlini, N., Wagner, D.: Towards evaluating the robustness of neural networks. In: 2017 IEEE Symposium on Security and Privacy (SP), pp. 39–57. IEEE (2017)Google Scholar
- 4.Goodfellow, I., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: International Conference on Learning Representations (2015)Google Scholar
- 5.He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)Google Scholar
- 7.Krizhevsky, A., Nair, V., Hinton, G.: The CIFAR-10 dataset (2014). http://www.cs.toronto.edu/kriz/cifar.html
- 8.Kurakin, A., Goodfellow, I., Bengio, S.: Adversarial examples in the physical world. In: ICLR Workshop (2017)Google Scholar
- 10.Moosavi-Dezfooli, S.M., Fawzi, A., Frossard, P.: DeepFool: a simple and accurate method to fool deep neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2574–2582 (2016)Google Scholar
- 11.Papernot, N., McDaniel, P., Wu, X., Jha, S., Swami, A.: Distillation as a defense to adversarial perturbations against deep neural networks. In: 2016 IEEE Symposium on Security and Privacy (SP), pp. 582–597. IEEE (2016)Google Scholar
- 14.Su, J., Vargas, D.V., Kouichi, S.: One pixel attack for fooling deep neural networks. arXiv preprint arXiv:1710.08864 (2017)
- 15.Szegedy, C., et al.: Intriguing properties of neural networks. In: International Conference on Learning Representations (2014). http://arxiv.org/abs/1312.6199