
1 Introduction

Deep learning algorithms have been revolutionary in improving the performance of a wide range of applications, including computer vision, speech processing, and natural language processing. In particular, Convolutional Neural Networks (CNNs) have been extremely successful in detecting and recognizing the content of images [8, 20, 22]. Due to the success of deep learning, many companies, including Amazon [1] and Naver [2], have unveiled image recognition and analysis APIs for use in various applications. However, Szegedy et al. [23] and Goodfellow et al. [7] showed that an imperceptibly small perturbation to an input image can arbitrarily change the prediction of a deep learning-based classifier. Such inputs are referred to as adversarial examples, which optimize perturbations to maximize prediction errors. Moreover, Goodfellow et al. [7] showed that these adversarial examples are not difficult to generate, and are robust and generalizable. Therefore, the robustness and stability of DNNs when facing adversarial examples have recently drawn the attention of many researchers [6, 7, 23, 25]. In particular, adversarial examples can be a serious threat to image processing applications such as airport security systems, self-driving cars, and user identification for financial transaction systems.

In this work, unlike other DNN-based attack methods [6], we propose an alternative approach that generates adversarial images using an evolutionary genetic algorithm (GA) to deceive state-of-the-art DNN-based image recognition APIs. We perform GenAttack, a simple yet powerful practical black-box attack using our GA against commercial APIs, and show that these APIs are easily fooled with a high probability of success. Our contributions are summarized below:

  1.

    We propose GenAttack, an attack algorithm that uses a GA to generate adversarial images. We test GenAttack against larger and more complex realistic images ranging from 200 \(\times \) 300 to 2,100 \(\times \) 2,800 pixels, unlike other research that uses small image sizes. GenAttack adopts a heuristic optimization method so that it can easily deal with a large number of pixels in parallel.

  2.

    We evaluate our attacks on state-of-the-art commercial celebrity recognition APIs from Amazon [1] and Naver [2] as representative test cases. Our approach effectively creates adversarial images and deceives the Amazon and Naver APIs with 86.6% and 100% success rates, respectively.

  3.

    We also show transfer learning of adversarial images. We demonstrate that an adversarial example that successfully fools one classifier (e.g., Naver) can be used to fool another classifier (e.g., Amazon) that could not be deceived originally. Therefore, transfer learning can be maliciously used to fool a classifier more effectively.

This paper is organized as follows. We discuss related work on adversarial examples in Sect. 2. We explain our GA and GenAttack in Sect. 3, and describe our experiment in Sect. 4. Section 5 presents the results of our evaluation of GenAttack. In Sect. 6, an additional experiment on transfer learning is presented. We discuss possible defense mechanisms, open issues, and limitations in Sect. 7. Finally, Sect. 8 offers conclusions.

2 Related Work

Adversarial examples [23] are inputs that machine learning models misclassify, even though they differ only slightly from correctly classified examples. Applying an imperceptible perturbation to a test image can produce an adversarial example. Adversarial examples were first discussed and used against conventional machine learning algorithms by Barreno et al. [3] to evade handcrafted features. In particular, Biggio et al. [4] created adversarial examples for a linear classifier, SVM, and a neural network using a gradient-based method. Szegedy et al. [23] first introduced adversarial examples for deep neural networks by adding small perturbations to input images. They used the white-box L-BFGS method to generate adversarial examples on MNIST, ImageNet, AlexNet, and QuocNet with high probability. Since L-BFGS uses an expensive linear search, the Fast Gradient Sign Method (FGSM), which can be computed using back-propagation, was proposed by Goodfellow et al. [7]. RAND-FGSM [24] adds randomness during the gradient update. Papernot et al. [18] presented an efficient saliency adversarial map attack (JSMA); their approach finds the input features that cause the most significant change to the output, so that perturbing a small portion of features can fool DNNs. DeepFool was proposed by Moosavi-Dezfooli et al. [15]; it determines the closest distance from the original input to the decision boundary and performs an iterative attack to approximate the perturbation. Moreover, Carlini and Wagner [6] showed that back-propagation with DNNs can generate adversarial examples more effectively, and demonstrated that existing defense methods are not effective.

Papernot et al. [17] introduced a practical black-box attack approach. Their approach trains a local model to substitute for the target DNN, using inputs synthetically generated by an adversary and labeled by the target DNN. However, their evaluation is based only on the trivial MNIST dataset. Nguyen et al. [16] used an evolutionary algorithm to generate images that humans cannot recognize but DNNs can. In addition, Vidnerová and Neruda [25] showed that an evolutionary method can generate adversarial examples from random noise, but they only tested classifiers detecting the trivial 0–9 digit images. Hosseini et al. [10] showed that Google's Cloud Vision API can be deceived by images with added random noise. Our approach is more sophisticated than that of Hosseini et al. [10], which simply adds uniform noise: our GA locally optimizes the noise level at each iteration, so we generate adversarial images more effectively. We provide a comparison between our noise distribution and random noise in Sect. 5.

Network distillation was proposed by Papernot et al. [19] to reduce the size of DNNs by extracting knowledge from them, improving robustness by 0.5% and 5% on the MNIST and CIFAR10 datasets, respectively. Goodfellow et al. [7] and Huang et al. [11] introduced adversarial training, an approach that includes adversarial examples in the training stage; they incorporated adversarial examples into training sets and showed that this improves robustness. Tramèr et al. [24] proposed the Ensemble Adversarial Training method to train a defense model with adversarial examples generated from multiple sources. However, they found that it is difficult to anticipate specific adversarial examples and include them during the training stage. Madry et al. [14] proved that adversarial training with large network capacity can defend against a first-order adversary. Also, adversarial detection [5, 13, 21] has been proposed by others to detect adversarial examples during testing.

3 Design of Our Approach

First, we define the adversarial example problem and the objective of our approach. Next, we present the details of the GA to generate adversarial examples, and GenAttack to deceive commercial APIs.

3.1 Adversarial Examples for Image Classification

Szegedy et al. [23] show the existence of targeted adversarial examples as follows: given a valid input image \(\mathbb {I}\) and a target \(t \ne C^*(\mathbb {I})\), it is possible to find a similar input \(\mathbb {I'}\) such that \(C^*(\mathbb {I'})=t\), yet \(\mathbb {I}\) and \(\mathbb {I'}\) are close according to some distance metric. For untargeted adversarial examples, the attack only searches for an input \(\mathbb {I'}\) such that \(C(\mathbb {I}) \ne C^*(\mathbb {I'})\) and \(\mathbb {I}\) and \(\mathbb {I'}\) are close. Finding adversarial examples can then be formulated as follows, similar to [23, 26]:

$$\begin{aligned}&\min _{\mathbb {I'}} ||\mathbb {I'}-\mathbb {I}|| \nonumber \\&s.t.\quad C(\mathbb {I}) \ne C^*(\mathbb {I'}), \end{aligned}$$
(1)

where \(||\cdot ||\) is the distance between two samples, and C is a trained deep learning image classifier. The goal is to find an input \(\mathbb {I'}\) that minimizes the distance to \(\mathbb {I}\) with small perturbations. We aim to find adversarial examples for the untargeted case, where we find an image \(\mathbb {I'}\) close to \(\mathbb {I}\) that C misclassifies.

Fig. 1. Amazon API misclassifies the noise-added Audrey Hepburn image \({\mathbb {I}}'\) as Jack Kamen, while it correctly classifies the original image \({\mathbb {I}}\) as Audrey Hepburn

3.2 Creating Adversarial Examples Using Genetic Algorithm (GA)

In order to perform a black-box attack, we develop a GA to effectively generate adversarial images against commercial APIs without access to any of their DNN model parameters and without requiring any GPU resources. The goal of our GA is to inject a small amount of optimal noise into an original image so that commercial APIs misclassify it, while humans can still easily recognize the original celebrity, as shown in Fig. 1. We formulate our GA as follows:

Population and Individuals: A population is a set of individuals, which we define as uniform noise matrices of the same size as the original input celebrity image. To produce noise-added adversarial images from the noise matrices, we use a modified method based on Carlini and Wagner [6], as follows:

$$\begin{aligned} \mathbb {X} = \tanh ( \tanh ^{-1} ( \frac{\mathbb {I}}{\mathbb {I}_{max}} - 0.5 ) + \alpha \times \mathbb {N} ) \end{aligned}$$
(2)
$$\begin{aligned} {\mathbb {I}}' = \frac{(\mathbb {X}-\min (\mathbb {X}))}{(\max (\mathbb {X})-\min (\mathbb {X}))} \times \mathbb {I}_{max} \end{aligned}$$
(3)

In Eq. 2, we transform an original (target) image \(\mathbb {I}\) into \(\tanh ^{-1}\) space, mapping it to the \(-0.5\) to 0.5 range by dividing by \(\mathbb {I}_{max}\) and subtracting 0.5, where \(\mathbb {I}_{max}\) is the maximum RGB pixel value. Next, we add a noise matrix \(\mathbb {N}\) multiplied by the coefficient \(\alpha \). Then, we re-transform the noise-added image back to the original space to obtain the adversarial example \({\mathbb {I}}'\) in Eq. 3. As shown in Fig. 1, \(\alpha \) adjusts the noise level in generating an adversarial image; \(\alpha \) is searched within the interval [0.0, 0.9] by either multiplying it by 2 or subtracting 0.05. Generally, a higher \(\alpha \) increases the success rate of our attack, but it produces a very noisy image. Hence, we minimize the noise amount, \(\alpha \), using the following fitness function.
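
To make the transformation concrete, the following is a minimal NumPy sketch of Eqs. 2 and 3; the function and variable names (add_noise_tanh, image, noise, alpha, i_max) are ours for illustration and are not part of the original implementation.

    import numpy as np

    def add_noise_tanh(image, noise, alpha, i_max=255.0):
        # Eq. 2: map the image into tanh^-1 space, add the scaled noise matrix N,
        # and map the sum back with tanh. image/i_max - 0.5 lies in [-0.5, 0.5],
        # so arctanh is well defined.
        x = np.arctanh(image / i_max - 0.5)
        x = np.tanh(x + alpha * noise)
        # Eq. 3: rescale the result back to the original [0, i_max] pixel range.
        return (x - x.min()) / (x.max() - x.min()) * i_max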

Fitness Function: We use the following \(L_1\) loss as a distance measure between the original image \({\mathbb {I}}\), and the adversarial image \({\mathbb {I}}'\):

$$\begin{aligned} L_1\ = \frac{1}{n} \sum | ( \mathbb {I} - { \mathbb {I} }' ) |, \end{aligned}$$
(4)

where n is the number of pixels in the image \(\mathbb {I}\). Then, we define the fitness function f as follows in Eq. 5:

$$\begin{aligned} f=P_{o} - P_{d} + \gamma \times L_{1}, \end{aligned}$$
(5)

where \(P_{o}\) is the predicted probability for the original label and \(P_{d}\) is the predicted probability for any other (wrong) label. We obtain either \(P_{o}\) or \(P_{d}\) and set the other to zero, because the commercial APIs only return the highest probability for a single label. Next, we formulate our GA as a minimization problem over the fitness value in Eq. 5, to produce the best individual, which has a high \(P_{d}\) and low \(P_{o}\) and \(L_{1}\) values. In Eq. 5, \(\gamma \) is another coefficient that balances the noise amount against deceiving the APIs, guiding the GA to find adversarial images with the least amount of noise; in this work \(\gamma \) is chosen from 0.01 to 0.1. We choose \(\gamma \) automatically, inversely proportional to \(\alpha \), because \(P_{o}\) and \(P_{d}\) always take values between 0 and 1. In the default setting, we run 5 epochs to generate an adversarial example for a target image after fixing \(\alpha \), where finding \(\alpha \) requires from several tens to three hundred steps. The number of steps in one epoch – children generated by crossover and then accepted to inherit to the next generation – is the same as the population size. The number of API calls per step depends on how often mutation is invoked.
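
As a sketch, the fitness in Eq. 5 can be computed from a single API response as follows; the argument names and the convention that the API returns one (label, probability) pair are our assumptions.

    def fitness(api_label, api_prob, original_label, noise_l1, gamma):
        # Only one of P_o or P_d is available from the API; the other is set to zero.
        if api_label == original_label:
            p_o, p_d = api_prob, 0.0
        else:
            p_o, p_d = 0.0, api_prob
        # Eq. 5: lower fitness is better (low P_o, high P_d, small L1 noise).
        return p_o - p_d + gamma * noise_l1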

Selection: We implement tournament selection with a tournament size of four, so that two of the four individuals in one tournament are selected. In our design, the fitter individual has an 80% chance to win and the less fit one a 20% chance, which maintains a good variety in the population and explores wider search areas to find a global optimum. After selection, the two chosen parents move to the crossover stage.
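
The sketch below shows one possible reading of this selection scheme (lower fitness is better, per the minimization in Eq. 5); how the four tournament members are paired is our assumption.

    import random

    def tournament_select(population, fitnesses, p_win=0.8):
        # Draw four distinct individuals and pair them into two head-to-head matches;
        # each match yields one parent, giving the two parents for crossover.
        idx = random.sample(range(len(population)), 4)
        parents = []
        for a, b in (idx[:2], idx[2:]):
            better, worse = (a, b) if fitnesses[a] < fitnesses[b] else (b, a)
            # The fitter individual wins with 80% probability, the less fit with 20%.
            winner = better if random.random() < p_win else worse
            parents.append(population[winner])
        return parents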

Crossover and Inheritance: Crossover recombines two selected parents. We design a simple crossover for 2D matrices, as shown in Fig. 2. First, we obtain a random point (x, y) in the noise matrices of the two selected individuals and use it as an origin. Next, we throw a tetrahedral die: if we get N, between 1 and 4, quadrant N of the noise matrices is exchanged between the two individuals.
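
A sketch of this quadrant exchange on 2-D noise matrices follows; the mapping of die outcomes to particular quadrants is our assumption.

    import random

    def quadrant_crossover(parent_a, parent_b):
        # parent_a, parent_b: 2-D numpy noise matrices of equal shape.
        # Pick a random origin (x, y) and a die outcome N in {1, 2, 3, 4}.
        h, w = parent_a.shape[:2]
        y, x = random.randrange(h), random.randrange(w)
        n = random.randint(1, 4)
        # Quadrants relative to the origin: 1 upper-right, 2 upper-left,
        # 3 lower-left, 4 lower-right (row slice, column slice).
        quads = {1: (slice(0, y), slice(x, w)), 2: (slice(0, y), slice(0, x)),
                 3: (slice(y, h), slice(0, x)), 4: (slice(y, h), slice(x, w))}
        rs, cs = quads[n]
        child_a, child_b = parent_a.copy(), parent_b.copy()
        # Exchange quadrant N between the two children.
        child_a[rs, cs], child_b[rs, cs] = parent_b[rs, cs], parent_a[rs, cs]
        return child_a, child_b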

The newly generated children are then chosen to inherit to the next generation if they have a better fitness than their parents. To conserve the best-fit individuals and not lose them, we also add the following inheritance heuristic: if the best individual in the current generation is better than any individual in the next generation, we copy the best individual of the current generation into the next generation.

Mutation: Mutation aims to reduce the noise level of adversarial images. We design two mutation methods based on the class labels of the newly produced individuals. The first mutation method is used when the noise-added image is still classified into the original class; in this case we add a small amount of random noise to individuals to produce more variation. We use the second mutation method when a noise-injected image is classified into another class. In this case, we try to reduce the noise slightly using the following local optimization technique: we randomly choose 2% of the pixels in the noise matrix and reduce their magnitude by 30%. If successful, we repeat the same process up to five more times. In this local optimization step, we only accept mutated individuals with improved fitness values.
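
A sketch of the two mutation cases is given below; the magnitude of the extra random noise in the first case and the acceptance loop (run by the caller, up to five repetitions, keeping only fitness-improving mutants) are our assumptions.

    import numpy as np

    def mutate(noise, misclassified, frac=0.02, shrink=0.3, scale=0.01):
        mutated = noise.copy()
        if not misclassified:
            # Case 1: the image is still classified as the original celebrity,
            # so add a small amount of random noise for more variation.
            mutated += np.random.uniform(-scale, scale, size=noise.shape)
        else:
            # Case 2: the image is already misclassified, so reduce the magnitude
            # of a random 2% of the noise pixels by 30% (local optimization).
            mask = np.random.rand(*noise.shape) < frac
            mutated[mask] *= (1.0 - shrink)
        return mutated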

Fig. 2. Description of the crossover process

3.3 Genetic Algorithm-Based Attack (GenAttack)

We propose the GA-based attack, GenAttack, against commercial APIs, and present the details of our attack procedure. First, we test the commercial API with an original image, and check whether the API correctly recognizes the celebrity from the returned initial output label, I.Label, and its confidence value. If it is correctly recognized, the image becomes our target for creating an adversarial example; if not, we discard it, since the API is wrong in the first place. Next, we initiate GenAttack and start querying the commercial API with the noise-added image. If the returned result is an incorrect output label (i.e., some other celebrity), our attack is successful and we have created an adversarial image; we label this output class as the adversarial label, A.Label. If the API consistently returns the correct I.Label, we slightly increase and adjust \(\alpha \) and compute the fitness function, searching for the optimal noise combination according to our GA. We iteratively repeat this process for several epochs until we successfully force the API to produce an incorrect output (A.Label). Finally, if we deceive the API so that it returns a name different from the I.Label, we declare the attack successful. If we cannot deceive the API, or it returns 'Unrecognized' (UNKR), the attack is unsuccessful and we fail to create an adversarial image. Our attack criterion is much stronger than in prior research [9, 10], which counts UNKR as a success.
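
This stricter success criterion can be summarized by a small check such as the following sketch; the string conventions are hypothetical.

    def attack_outcome(i_label, response_label):
        # UNKR: the API no longer recognizes anyone; unlike [9, 10], we do not
        # count this as a successful attack.
        if response_label is None or response_label == "UNKR":
            return "failure"
        # A different celebrity name (A.Label) means the attack succeeded.
        return "success" if response_label != i_label else "keep searching"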

4 Experiment

The goal of our experiment is to evaluate the generated adversarial examples and to further test the robustness of commercial APIs using GenAttack. We used the Amazon Rekognition API [1] and the Naver Clova service [2] to provide a side-by-side comparison of attack success and robustness between these providers. In particular, we used the celebrity recognition APIs, which both providers offer, because celebrity images are relatively easy to find while still being complex and realistic. Although the Cloud Vision API by Google also provides face image analysis and returns the top 20 relevant entities from the Internet, its returned labels are based on web search results, not the images themselves. Hence, a side-by-side comparison with Amazon and Naver is difficult, and we therefore do not evaluate Google's API in this paper.

Fig. 3. Original vs. generated adversarial images for 65 celebrities

4.1 Dataset

We chose 119 famous celebrities (72 men and 47 women) as our dataset. Although we tried to select celebrities who are popular in both America and Asia, we hypothesize that the Naver API, based in Asia, would be more optimized for Asian celebrities than for American and European celebrities. Hence, we include several Asian celebrities, even though they may not be as well known in America or Europe. We use practical image sizes ranging from 200 \(\times \) 300 to 2,100 \(\times \) 2,800 pixels. These are much larger than the small benchmark datasets such as MNIST (28 \(\times \) 28), CIFAR-10 (32 \(\times \) 32), and ImageNet (227 \(\times \) 227) that have been used in prior research [6, 16]. Sample celebrity images and names are shown in Fig. 3 and Table 1.

4.2 Experimental Setup

We run 5 epochs to obtain an adversarial example for each of the 119 target images, starting with \(\alpha = 0.1\). We then automatically adjust \(\alpha \) between 0.05 and 0.9 based on the attack success and the confidence value returned from the API. We run 5 more epochs to generate an adversarial example for a target image after obtaining \(\alpha \) from the GA. If we consecutively fail to produce an adversarial image in the next 10 steps, we increase \(\alpha \) and repeat the process; if we find an adversarial image in 10 consecutive steps, we decrease \(\alpha \) to reduce the noise.
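
Combining this schedule with the multiply-by-2 / subtract-0.05 search from Sect. 3.2, the adjustment can be sketched as follows; the exact bookkeeping of consecutive successes and failures is our reading of the setup.

    def adjust_alpha(alpha, consecutive_failures, consecutive_successes,
                     alpha_min=0.05, alpha_max=0.9):
        # More noise after 10 consecutive failed steps, less noise after
        # 10 consecutive successful steps, keeping alpha within [0.05, 0.9].
        if consecutive_failures >= 10:
            return min(alpha * 2, alpha_max)
        if consecutive_successes >= 10:
            return max(alpha - 0.05, alpha_min)
        return alpha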

5 Results

In this section, we report the attack success rate and analyze generated noise in adversarial examples from GenAttack.

5.1 Attack Success Rate

Table 1 summarizes our attack results for several celebrity images. Due to space limitations, we only present celebrities whose original image was correctly recognized by both APIs. In Table 1, the first column is the correct celebrity name for each image, followed by I.Label and I.Pr., the original input label and its confidence probability returned by each API. A.Label and A.Pr. are the output label and confidence probability for the adversarial images we generate with GenAttack. 'UNKR' means that the original image is successfully recognized, while the noise-added image is unrecognized.

Table 1. Examples of attack result against Amazon and Naver APIs with each celebrity

The Amazon API correctly recognizes 112 of the 119 input images. Our algorithm attacked those 112 images and achieved an overall 86.61% success rate, successfully creating 97 adversarial examples. We find that GenAttack effectively adds and refines noise starting from a predicted label with a low initial confidence value returned on its first adversarial example generation attempt. From Table 1, we can observe that GenAttack guides the noise to find a path from one output celebrity class to another with fairly high confidence values (A.Pr.) in many cases, as shown in the 5th column of Table 1.

On the other hand, the Naver API correctly recognizes only 45 out of the 119 original images, misclassifying many of the original celebrity images from America and Europe. This confirms that Naver is more localized toward Asian faces. Among the 49 correctly recognized images, GenAttack successfully creates adversarial images for all 49, yielding a 100.00% success rate. Naver seems to produce different output labels for many Asian celebrities even with a small amount of added perturbation, and is much easier to fool. However, its A.Pr. values are generally lower than Amazon's, meaning that Naver outputs the new label with smaller confidence. With the Naver API, we observed that Tom Cruise was the most difficult image for which to find an adversarial example. We hypothesize that Naver might not have many faces similar to Tom Cruise's, or has faces that are clearly distinctive; Naver therefore locks onto the features of Tom Cruise, and we think GenAttack could not easily find other similar classes. In Fig. 3, we present 65 original celebrity images (left) and the adversarial images (right) generated by GenAttack side by side for comparison. As can be seen, the generated adversarial images are very close to the originals, and humans can trivially recognize the celebrities in the generated adversarial examples.

5.2 Noise and Image Analysis

We carefully analyze the noise patterns of the adversarial images, where the noise is added in \(\tanh \) space. We also compare our noise with uniform noise in \(\tanh \) space to characterize the differences.

Fig. 4. Comparison of noise distribution (GenAttack vs. uniform noise) (Color figure online)

Figures 4(a) and (f) are the adversarial examples we produced for Jack Ma and Jennifer Lawrence. In Fig. 4, brighter yellow represents a higher pixel value, and dark blue indicates a lower pixel value. Figures 4(b) and (g) show only the generated \(L_2\) noise, and Figs. 4(c) and (h) show uniformly generated noise for the same images. Comparing these two sets of images, we can clearly observe that our GA tends to capture the facial features of the input and inject noise there, while random noise spreads over all pixels. To analyze the differences more clearly, we zoom in on the face areas. As we can observe from Figs. 4(d), (e), and (i), (j), the noise generated by the GA more closely follows the facial features, so that a CNN-based classifier can more easily make a mistake and steer toward another celebrity, whereas random noise is distributed uniformly over all pixels. Hence, our generated noise appears to better learn the facial features, with the GA optimizing the noise to increase the classification error.

Noise Filtering Defense and Generated Image Sizes: Generally, pre-filtering can be an effective defense mechanism, as shown in other research [9]. However, it is not effective in our case. We applied both Gaussian (linear) and median (non-linear) filters to remove the noise added by our GA, where these filters have been a successful defense in other research [9]. In our case, noise filtering cannot prevent GenAttack from generating adversarial examples for either Amazon or Naver, although the generated adversarial images need slightly more noise than in the non-filtered case. Also, we find that our approach effectively generates adversarial examples for every size of input celebrity image in our dataset, ranging from 200 \(\times \) 300 to 2,100 \(\times \) 2,800 pixels. Hence, we demonstrate that our GA can generate almost size-invariant adversarial images without any loss of performance.
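
For reference, this kind of pre-filtering defense can be reproduced with standard filters, as in the SciPy sketch below; the kernel parameters (sigma, size) are our assumptions, not the settings used in [9].

    import numpy as np
    from scipy.ndimage import gaussian_filter, median_filter

    def prefilter(adv_image, mode="gaussian"):
        # Filter each colour channel before the image is sent to the API.
        channels = []
        for c in range(adv_image.shape[-1]):
            if mode == "gaussian":
                channels.append(gaussian_filter(adv_image[..., c], sigma=1))  # linear
            else:
                channels.append(median_filter(adv_image[..., c], size=3))     # non-linear
        return np.stack(channels, axis=-1)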

6 Transfer Learning for Attacks

We evaluated the transfer learning capability of our proposed method. If our algorithm can deceive one classifier, we hypothesize that we can deceive another API. Hence, attackers can use this transfer learning for an attack, where adversarial features (noise matrices) learned from one DNN (e.g., Naver) can be used to create an adversarial image for another classifier (e.g., Amazon), and vice versa. Among all 119 celebrity images, we obtained ten adversarial samples that successfully fool only one of the APIs, as shown in Table 2. In this attack, we query both the Naver and Amazon APIs simultaneously, and calculate the fitness as follows by extending the single-API fitness function in Eq. 5:

$$\begin{aligned} f=P_{o}^{Amazon} - P_{d}^{Amazon} + P_{o}^{Naver} - P_{d}^{Naver} + \gamma \times L_{1}, \end{aligned}$$
(6)

where \(P_{o}^{Amazon}\) and \(P_{o}^{Naver}\) are the predicted probabilities for the original label from Amazon and Naver, and \(P_{d}^{Amazon}\) and \(P_{d}^{Naver}\) are the predicted probabilities for another label produced by Amazon and Naver, similar to Eq. 5. When optimizing \(\alpha \) in Eq. 2, we only consider the adversarial image generation success rate of the target API, i.e., the one that was originally unsuccessful. For example, if we want to find adversarial examples for the Amazon API with the help of the Naver API, we optimize \(\alpha \) based on the success rate of the Amazon API.
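
A sketch of Eq. 6, reusing the single-API convention that only one of \(P_{o}\) / \(P_{d}\) is returned per query, is given below; the helper name and argument layout are ours.

    def joint_fitness(amazon, naver, original_label, noise_l1, gamma):
        # amazon and naver are (label, probability) pairs returned by each API.
        def split(label, prob):
            return (prob, 0.0) if label == original_label else (0.0, prob)
        p_o_a, p_d_a = split(*amazon)
        p_o_n, p_d_n = split(*naver)
        # Eq. 6: sum the per-API terms and keep the shared L1 noise penalty.
        return (p_o_a - p_d_a) + (p_o_n - p_d_n) + gamma * noise_l1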

We performed the transfer learning attack experiment for all ten available test cases. Overall, 7 out of 10 transfer learning attacks were successful, converting most of the UNKR results (originally failed attacks against a single API) into other celebrity labels. Among these cases, the Amazon API initially failed to recognize 8 celebrities, and our algorithm successfully fooled the Amazon API for most of them with help from the Naver API.

As shown in Table 2, six of the (Before) 'UNKR' results were successfully changed to (After) other celebrities. However, creating adversarial images for Kate Maria and Kim Yuna was unsuccessful, even with the help of the Naver API. For the other two cases, where our algorithm successfully attacked the Amazon API but not the Naver API initially, we performed the transfer learning attack on Naver with help from Amazon. Naver was originally correct for "Sohee", but Amazon led Naver to misclassify the correct label "Sohee" as "Park Soo-jin" (the same adversarial label as in Amazon, shown in blue). This shows that a targeted attack is possible via transfer learning, steering one celebrity toward a specific victim label (e.g., Park Soo-jin). In the same way, we also successfully launched a targeted attack on Naver turning "Sooyoung" into "Solji" (shown in blue). This demonstrates that the same fake label can be transferred exactly from one classifier to another, and that the noise generated by our algorithm is transferable between classifiers for generating adversarial examples. Hence, attackers can practically leverage transfer learning to improve their attacks against DNNs.

Table 2. Transfer learning attack, where one API assists in deceiving another API, which was originally unsuccessful

7 Discussions and Limitations

Robust DNNs and Conservative Reporting: One possible defense approach is to make DNNs more robust against noise via adversarial training with GA-generated examples [7]. It is also better to be more conservative in reporting an output label when the confidence value is low. For example, if the confidence value is below 70%, the API can return 'UNKR'. In this way, the API does not provide any feedback to attackers, and adversarial example generation cannot proceed. Instead of always attempting to make the best guess, it is important to know "when the API does not know." From the defense perspective, it is better to be conservative and even not report any result when confused. However, clear trade-offs among customers' service needs, performance, and security requirements have to be considered to better design the overall defense mechanism.
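
As an illustration, the conservative reporting rule with a 70% cut-off could look like the following sketch; the threshold is the example value mentioned above, not a setting used by either API.

    def conservative_report(label, confidence, threshold=0.70):
        # Withhold the prediction when the classifier is not confident enough,
        # so the attacker receives no usable feedback signal.
        return label if confidence >= threshold else "UNKR"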

Network-Level Rate Limiting and Noise Filtering: In order to create adversarial examples, several queries need to be made to obtain the returned output labels and confidence values. A large number of API queries per second for the same or similar images can signal suspicious adversarial activity. Hence, various rate-limiting techniques, such as CAPTCHAs, and network defense mechanisms can be employed. However, these are not effective against a distributed GenAttack querying over multiple IPs or at slower rates. We also need a more sophisticated pre-filtering strategy that learns the noise patterns generated by our GA and removes them more effectively. We are currently investigating improved noise filtering techniques.

Limitations and Future Work: Even though the GA searches for an optimal noise value, it is not guaranteed to find the globally optimal noise; the GA can settle on a local optimum due to the nature of evolutionary algorithms. Also, finding optimal noise without access to DNN parameters is a challenging task. Further empirical experiments and theoretical analysis are needed to control the different GA parameters and fine-tune the noise. For future work, we plan to compare GenAttack with other attack and defense mechanisms [7, 12, 14, 24].

8 Conclusion

We introduce a simple yet powerful method, GenAttack, to generate adversarial images, which requires neither knowledge of the underlying DNNs nor GPU resources. GenAttack optimizes noise using an iterative approach and can provide significant benefits over more complex gradient-estimation-based attacks. Further, we show that GenAttack is highly practical and transferable to attack other classifiers.