1 Introduction

Due to their surprisingly good power to represent complex distributions, NN models are widely used in many applications, including natural language processing, computer vision, and cybersecurity. For example, in cybersecurity, NN based classifiers are used for spam filtering, phishing detection, and face recognition [1, 18]. However, the training and usage of NN classifiers rest on an underlying assumption that the environment is attack free. Such classifiers therefore fail when adversarial examples are presented to them.

Adversarial examples were first introduced in [21] in the context of image classification. That work showed that visually insignificant, specially designed perturbations can drastically change the prediction results with a nearly \(100\%\) success rate. More generally, adversarial examples can be used to mislead NN models into outputting any targeted prediction. They can be extremely harmful for many applications that rely on NNs, such as automatic cheque withdrawal in banks, traffic speed detection, and medical diagnosis in hospitals. This serious threat has inspired a new line of research that explores the vulnerability of NN classifiers and develops appropriate defensive methods.

Recently, a plethora of methods to counter adversarial examples has been introduced and evaluated. Among these methods, adversarial training defenses play an important role since they (1) effectively enhance robustness, and (2) do not limit the adversary’s knowledge. However, most of them offer no control over the trade-off between classifying original and adversarial examples. For applications that are sensitive to misbehavior or operate in risky environments, it is worthwhile to strengthen the defense against adversarial examples at the cost of some performance on original examples. The ability to control this trade-off dynamically makes a defense even more valuable.

In this paper, we propose a GAN based defense against adversarial examples, dubbed GanDef. GanDef is designed based on adversarial training combined with feature learning [10, 12, 24]. As a GAN model, GanDef contains a classifier and a discriminator which form a minimax game. To achieve a dynamic trade-off between classifying original and adversarial examples, we also propose a variant of GanDef, GanDef-Comb, that utilizes both the classifier and the discriminator. During evaluation, we select several state-of-the-art adversarial training defenses as references, including Pure PGD training (Pure PGD) [13], Mix PGD training (Mix PGD) [7], and Logit Pairing [7]. The comparison results show that GanDef performs at least as well as state-of-the-art adversarial training defenses in terms of test accuracy. Our contributions can be summarized as follows:

  • We propose a defensive method, GanDef, based on the idea of using a discriminator to regularize the classifier’s feature selection.

  • We mathematically prove that the solution of the proposed minimax game in GanDef contains an optimal classifier that can make correct predictions on adversarial examples by using perturbation-invariant features.

  • We empirically show that the classifier trained with GanDef achieves the same level of test accuracy as state-of-the-art approaches. With the discriminator added, GanDef-Comb can dynamically control the trade-off between classifying original and adversarial examples and achieves the highest overall test accuracy when the ratio of adversarial examples exceeds 41.7%.

The rest of the paper is organized as follows: Sect. 2 presents background material, Sect. 3 details the design and mathematical proof of GanDef, Sect. 4 shows evaluation results, and Sect. 5 concludes the paper.

2 Background and Related Work

In this section, we introduce high-level background material about the threat model, adversarial example generators, and defensive mechanisms, for a better understanding of the concepts presented in this work. We also provide relevant references for further information on each topic.

2.1 Threat Model

The adversary aims at misleading the NN model utilized by the application in order to achieve a malicious goal. For example, the adversary adds an adversarial perturbation to the image of a cheque. This image may then mislead the NN model utilized by the ATM to cash out a huge amount of money. During the preparation of adversarial examples, we assume that the adversary has full knowledge of the targeted NN model, which is the white-box scenario. We also assume that the adversary has limited computational power: the adversary can generate iterative adversarial examples but cannot exhaustively search all possible input perturbations.

2.2 Generating Adversarial Examples

Adversarial examples can be classified into white-box and black-box attacks based on the adversary’s knowledge of the target NN classifier. Based on the generation process, they can also be classified as single-step or iterative adversarial examples.

Fast Gradient Sign Method (FGSM) was introduced by Goodfellow et al. in [6] as a single-step white-box adversarial example generator against NN image classifiers. This method maximizes the value of the loss function, \(\mathcal {L}\), of the NN classifier, \(\mathcal {C}\), to find adversarial examples. The function \(\mathcal {F}\) ensures that the generated adversarial example is still a valid image.

$$\begin{aligned} \begin{aligned}&\underset{\delta }{\text {maximize}}&\mathcal {L} (\hat{z}=\mathcal {C}(\hat{x}), t)&\text { subject to}&\hat{x} = \mathcal {F} (\bar{x}, \delta ) \in \mathbb {R}_{[0,1]}^{m} \end{aligned} \end{aligned}$$

To keep visual similarity and speed up generation, this maximization problem is solved by running gradient ascent for a single iteration. FGSM simply generates an adversarial example, \(\hat{x}\), from an original image, \(\bar{x}\), by adding a small perturbation, \(\delta \), which changes each pixel value along the gradient direction of the loss function. As a single-step adversarial example generator, FGSM generates adversarial examples efficiently. However, the quality of the generated adversarial examples is relatively low due to the linear approximation of the loss function landscape.
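For illustration, the following is a minimal FGSM sketch in PyTorch (our own sketch, not the paper’s implementation; the experiments in Sect. 4 use CleverHans [15]). The model, tensor names, and the default budget are placeholders taken from the MNIST setting described later.

```python
import torch.nn.functional as F

def fgsm(model, x, t, eps=0.3):
    """Single-step FGSM: move each pixel along the sign of the loss gradient.

    model returns pre-softmax logits; x is a batch of images in [0, 1];
    eps is the maximum perturbation (0.3 is the MNIST budget in Sect. 4.1).
    """
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), t)   # L(C(x_hat), t)
    loss.backward()
    x_adv = x_adv + eps * x_adv.grad.sign()   # one gradient-ascent step
    return x_adv.clamp(0.0, 1.0).detach()     # F(.): keep a valid image
```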

Basic Iterative Method (BIM) was introduced by Kurakin et al. in [8] as an iterative white-box adversarial example generator against NN image classifiers. BIM utilizes the same mathematical model as FGSM but, unlike FGSM, it is an iterative attack. Instead of applying the adversarial perturbation in one step, BIM runs gradient ascent for multiple iterations to maximize the loss function. In each iteration, BIM applies a smaller perturbation and maps the perturbed image through the function \(\mathcal {F}\). As a result, BIM approximates the loss function landscape by linear spline interpolation and generates stronger adversarial examples than FGSM within the same neighborhood.
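A minimal BIM sketch, again ours and hedged: the per-step budget alpha follows the MNIST setting in Sect. 4.1, while the iteration count is not stated in the paper, so the default below is only a placeholder.

```python
import torch
import torch.nn.functional as F

def bim(model, x, t, eps=0.3, alpha=0.05, iters=10):
    """BIM: repeat small FGSM-style steps, re-projecting after each one.

    alpha=0.05 is the MNIST per-step budget in Sect. 4.1; iters=10 is a
    placeholder since the BIM iteration count is not given in the paper.
    """
    x_adv = x.clone().detach()
    for _ in range(iters):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), t)
        loss.backward()
        with torch.no_grad():
            x_adv = x_adv + alpha * x_adv.grad.sign()
            # keep the result inside the eps-ball of x and inside [0, 1]
            x_adv = torch.max(torch.min(x_adv, x + eps), x - eps).clamp(0.0, 1.0)
    return x_adv.detach()
```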

Projected Gradient Descent (PGD) is another iterative white-box adversarial example generator, recently introduced by Madry et al. in [13]. Similar to BIM, PGD solves the same optimization problem iteratively with the projected gradient descent algorithm. However, PGD randomly selects an initial point within a limited neighborhood of the original image and repeats this several times to generate an adversarial example. With this repeated random initialization, PGD is shown experimentally to solve the optimization problem efficiently and to generate stronger adversarial examples, since the loss landscape has a surprisingly tractable structure [13].
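The difference from BIM is essentially the random start and the projection back onto the \(\epsilon \)-ball of the original image. A hedged sketch, using the PGD-1 MNIST parameters from Sect. 4.1 as defaults and omitting the multiple random restarts:

```python
import torch
import torch.nn.functional as F

def pgd(model, x, t, eps=0.3, alpha=0.01, iters=40):
    """PGD: iterative attack started from a random point in the eps-ball of x.

    iters=40 / alpha=0.01 match the PGD-1 MNIST setting in Sect. 4.1;
    repeating from several random starts and keeping the strongest
    example is omitted here for brevity.
    """
    # random initialization inside the allowed perturbation budget
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0.0, 1.0).detach()
    for _ in range(iters):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), t)
        loss.backward()
        with torch.no_grad():
            x_adv = x_adv + alpha * x_adv.grad.sign()
            # project back into the eps-ball of the *original* image x
            x_adv = torch.max(torch.min(x_adv, x + eps), x - eps).clamp(0.0, 1.0)
    return x_adv.detach()
```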

2.3 Adversarial Example Defensive Methods

Many defense methods have been proposed recently. In the following, we summarize representative examples from three major classes of defenses.

Augmentation and Regularization methods aim at penalizing overconfident predictions or utilizing synthetic data during training. One of the early ideas is defensive distillation, which uses the prediction scores of the original NN (the teacher) as ground truth to train another, smaller NN (the student) [16, 17]. It has been shown that the gradients computed from the student model become very small, or even zero, and hence useless to the adversarial example generator [16]. More recent works in this class include the Fortified Network [9] and Manifold Mixup [23]. The Fortified Network utilizes a denoising autoencoder to regularize the hidden states. Manifold Mixup also focuses on the hidden states but in a different way: during training, it uses interpolations of hidden states and logits to enhance the diversity of the training data. Compared with adversarial training defenses, this class has significant limitations. For example, defensive distillation is vulnerable to the Carlini attack [4], and Manifold Mixup can only defend against single-step attacks.

Protective Shell methods aim at using a shell to reject or reform adversarial examples. An example is MagNet, introduced by Meng et al. in [14]. In that work, the authors design two types of functional components: a detector and a reformer. Adversarial examples are either rejected by the detector or reformed to eliminate the perturbations. Other recent works, such as [11] and [19], build the shell in different ways. In [11], the authors inject adaptive noise into input images, which breaks the adversarial perturbations without a significant decrease in classification accuracy. In [19], a generator is utilized to produce images that are similar to the inputs; by replacing the inputs with the generated images, it achieves resistance to adversarial examples. However, these methods usually assume the shell itself is a black box to the adversary, and the work in [2] has already found ways to break this assumption.

Adversarial Training is based on the straightforward idea of treating adversarial examples as blind spots of the original training data [25]. By retraining with adversarial examples, the classifier learns the perturbation pattern and generalizes its predictions to account for such perturbations. In [6], adversarial examples generated by FGSM are used for adversarial training, and the trained NN classifier can defend against single-step adversarial examples. Later works in [13] and [22] enhance adversarial training to defend against iterative examples such as BIM and PGD. A more recent work in [7] requires the pre-softmax logits from original and adversarial examples to be similar; the authors believe this utilizes more information during adversarial training. A common problem in existing adversarial training defenses is that the trained classifier has no control over the trade-off between correctly classifying original and adversarial examples. Our work achieves this flexibility and shows its benefit.
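As a rough sketch of how these adversarial training losses relate (our illustration, not the exact formulation of [7]): the two cross-entropy terms correspond to training on a mix of original and adversarial examples, and the squared-error term on the pre-softmax logits plays the role of the pairing penalty in Logit Pairing. The weight lam is illustrative, not the value used in [7].

```python
import torch.nn.functional as F

def mixed_adv_training_loss(model, x_orig, x_adv, t, lam=0.5):
    """Sketch of a mixed adversarial training loss with a logit pairing term."""
    z_orig = model(x_orig)                 # pre-softmax logits, original examples
    z_adv = model(x_adv)                   # pre-softmax logits, adversarial examples
    ce = F.cross_entropy(z_orig, t) + F.cross_entropy(z_adv, t)
    pairing = F.mse_loss(z_adv, z_orig)    # push z_hat towards z_bar
    return ce + lam * pairing
```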

3 GanDef: GAN Based Adversarial Training

In this section, we present the design of our defensive method, GanDef. First, we introduce GanDef as a minimax game between a classifier and a discriminator. Then, we conduct a theoretical analysis of the proposed minimax game. Finally, we experimentally evaluate the convergence of GanDef.

3.1 Design

Given the training data pair \(\langle x, t \rangle \), where \(x \in \bar{X} \cup \hat{X}\), we try to find a classification function \(\mathcal {C}\) that uses x to produce pre-softmax logits z such that:

$$\begin{aligned} \begin{aligned}&t_{i} = f(z_{i}) = \frac{e^{z_{i}}}{\sum _{j} e^{z_{j}}}&\text {The mapping between } z \text { and } t \text { is the softmax function.} \end{aligned} \end{aligned}$$

Since x can be either an original example \(\bar{x}\) or an adversarial example \(\hat{x}\), we want the classifier to model the conditional probability \(q_{C}(z|x)\) with only non-adversarial features. To achieve this, we employ another NN, called the discriminator \(\mathcal {D}\). \(\mathcal {D}\) takes the pre-softmax logits z from \(\mathcal {C}\) as input and predicts whether the input to the classifier was \(\bar{x}\) or \(\hat{x}\). This is done by maximizing the conditional probability \(q_{D}(s|z)\), where s is a Boolean variable indicating whether the source of x is original or adversarial. Finally, by combining the classifier and the discriminator, we formulate the following minimax game:

$$\begin{aligned} \begin{array}{c} \underset{\mathcal {C}}{\text {min}} ~ \underset{\mathcal {D}}{\text {max}} ~ J(\mathcal {C}, \mathcal {D})\\ \\ \text {where} ~~ J(\mathcal {C}, \mathcal {D}) = \underset{x \sim X, t \sim T}{\mathbb {E}} \{- log [q_{C}(z|x)]\} - \underset{z \sim Z, s \sim S}{\mathbb {E}} \{- log [q_{D}(s|z=\mathcal {C}(x))]\} \end{array} \end{aligned}$$

In this work, we envision the classifier as a generator that produces pre-softmax logits based on features selected from the input images. The classifier and the discriminator then engage in a minimax game, a structure also known as a Generative Adversarial Net (GAN) [5]. We therefore name our proposed defense “GAN based Adversarial Training” (GanDef). While other defenses ignore or only compare \(\bar{z}\) and \(\hat{z}\), utilizing a discriminator on z adds a second line of defense when the classifier is defeated by adversarial examples.

The pseudocode of GanDef training is summarized in Algorithm 1 and is visualized in Fig. 1. A summary of the notations used throughout this work is available in Table 1.

Table 1. Summary of notations

3.2 Theoretical Analysis

With the formal definition of GanDef in place, we now perform a theoretical analysis. We show that, under the current definition where J is a combination of the log likelihoods of Z|X and S|Z, the solution of the minimax game contains an optimal classifier which correctly classifies adversarial examples. It is worth noting that our analysis is conducted in a non-parametric setting, i.e., the classifier and the discriminator are assumed to have enough capacity to model any distribution.

Proposition 1

If there exists a solution \((\mathcal {C}^{*}, \mathcal {D}^{*})\) of the aforementioned minimax game J such that \(J(\mathcal {C}^{*}, \mathcal {D}^{*}) = H(Z|X) - H(S)\), then \(\mathcal {C}^{*}\) is a classifier that can defend against adversarial examples.

Proof

For any fixed classification model \(\mathcal {C}\), the optimal discriminator can be formulated as

$$\begin{aligned} \begin{aligned} \mathcal {D}^{*}&= \text {arg}~ \underset{\mathcal {D}}{\text {max}}~ J(\mathcal {C}, \mathcal {D})&= \text {arg}~ \underset{\mathcal {D}}{\text {min}}~ \underset{z \sim Z, s \sim S}{\mathbb {E}} \{- log [q_{D}(s|z=\mathcal {C}(x))]\} \end{aligned} \end{aligned}$$

In this case, the discriminator perfectly models the conditional distribution, and we have \(q_{D}(s|z=\mathcal {C}(x)) = p(s|z=\mathcal {C}(x))\) for all z and all s. Therefore, we can rewrite J with the optimal discriminator as \(J'\) and denote the second half of J as the conditional entropy H(S|Z):

$$\begin{aligned} \begin{aligned}&J'(\mathcal {C}) = \underset{x \sim X, t \sim T}{\mathbb {E}} \{- log [q_{C}(z|x)]\} - H(S|Z) \end{aligned} \end{aligned}$$

For the optimal classification model, the goal is to achieve the conditional probability \(q_{C}(z|x) = p(z|x)\), since z determines t through the softmax transformation. Therefore, the first part of \(J'(\mathcal {C})\) (the expectation) is larger than or equal to H(Z|X). Combined with the basic property of conditional entropy that \(H(S|Z) \le H(S)\), we obtain the following lower bound on J with the optimal classifier and discriminator:

$$\begin{aligned} \begin{aligned}&J(\mathcal {C}^{*}, \mathcal {D}^{*}) \ge H(Z|X) - H(S|Z) \ge H(Z|X) - H(S) \end{aligned} \end{aligned}$$

The lower bound is attained, i.e., \(J(\mathcal {C}^{*}, \mathcal {D}^{*}) = H(Z|X) - H(S)\), when the following two conditions are satisfied:

  • The classifier perfectly models the conditional distribution of z given x, \(q_{C}(z|x) = p(z|x)\), which means that \(\mathcal {C}^{*}\) is an optimal classifier.

  • S and Z are independent, \(H(S|Z) = H(S)\), which means that adversarial perturbations do not affect pre-softmax logits.

In practice, the assumption of unlimited capacity of the classifier and the discriminator may not hold, and it may be hard or even impossible to build an optimal classifier whose pre-softmax logits are independent of the adversarial perturbation. Therefore, we introduce a trade-off hyper-parameter \(\gamma \) into the minimax function as follows:

$$\begin{aligned} \begin{aligned}&\underset{x \sim X, t \sim T}{\mathbb {E}} \{- log [q_{C}(z|x)]\} - \gamma \underset{z \sim Z, s \sim S}{\mathbb {E}} \{- log [q_{D}(s|z=\mathcal {C}(x))]\} \end{aligned} \end{aligned}$$

When \(\gamma = 0\), GanDef reduces to traditional adversarial training. As \(\gamma \) increases, more weight is placed on the discriminator term, so the training becomes increasingly sensitive to information about s contained in the pre-softmax logits, z.
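Algorithm 1 specifies the exact training loop; the following is only our minimal sketch of one alternating update consistent with the \(\gamma \)-weighted objective above. The optimizers, batch layout, and discriminator architecture are placeholders.

```python
import torch
import torch.nn.functional as F

def gandef_step(classifier, discriminator, opt_C, opt_D, x_orig, x_adv, t, gamma=1.0):
    """One alternating update of the gamma-weighted minimax objective.

    The discriminator sees pre-softmax logits z and predicts the source
    indicator s (0 = original, 1 = adversarial); the classifier then tries
    to classify correctly while making that prediction as hard as possible.
    """
    x = torch.cat([x_orig, x_adv], dim=0)
    t_all = torch.cat([t, t], dim=0)
    s = torch.cat([torch.zeros(len(x_orig), dtype=torch.long),
                   torch.ones(len(x_adv), dtype=torch.long)]).to(x.device)

    # Discriminator update: maximize J, i.e. minimize -log q_D(s | z = C(x)).
    z = classifier(x).detach()                       # block gradients into C
    d_loss = F.cross_entropy(discriminator(z), s)
    opt_D.zero_grad()
    d_loss.backward()
    opt_D.step()

    # Classifier update: minimize -log q_C(z|x) - gamma * (-log q_D(s|z)).
    # (Discriminator gradients accumulated here are cleared by opt_D.zero_grad()
    # at the start of the next call; opt_C only updates classifier parameters.)
    z = classifier(x)
    c_loss = F.cross_entropy(z, t_all) - gamma * F.cross_entropy(discriminator(z), s)
    opt_C.zero_grad()
    c_loss.backward()
    opt_C.step()
    return float(c_loss), float(d_loss)
```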

Fig. 1. Training with GanDef

Fig. 2. Convergence experiments

3.3 Convergence Analysis

Beyond the theoretical analysis, we also experimentally analyze the convergence of GanDef. Based on the pseudocode in Algorithm 1, we train a classifier on the MNIST dataset. In order to compare convergence, we also implement Pure PGD, Mix PGD and Logit Pairing and record their test accuracies on original test images across training epochs.

As we can see from Fig. 2, the convergence of GanDef is not as good as that of the other state-of-the-art adversarial training defenses. Although all of these methods converge to over 95% test accuracy, GanDef shows significant fluctuation during the training process.

In order to improve the convergence of GanDef, we carefully trace back through the design process and identify the root cause of the fluctuations. During the training of the classifier, we subtract the penalty term \(\underset{z \sim Z, s \sim S}{\mathbb {E}} \{- log [q_{D}(s|z=\mathcal {C}(x))]\}\), which encourages the classifier to hide the information of s in every z. Compared with Logit Pairing, which only requires similar z from original and adversarial examples, this penalty term is too strong. Therefore, we modify the training loss of the classifier to:

$$\begin{aligned} \begin{aligned}&\underset{x \sim X, t \sim T}{\mathbb {E}} \{- log [q_{C}(z|x)]\} - \gamma \underset{\hat{z} \sim \hat{Z}, \hat{s} \sim \hat{S}}{\mathbb {E}} \{- log [q_{D}(\hat{s}|\hat{z}=\mathcal {C}(\hat{x}))]\} \end{aligned} \end{aligned}$$

Recall that \(\hat{x}\), \(\hat{z}\) and \(\hat{s}\) represent the adversarial example, its pre-softmax logits, and its source indicator, respectively. It is also worth mentioning that this modification is only applied to the classifier; therefore, it does not affect the validity of the previous proof. In the convergence analysis, we denote the modified version of our defensive method as GanDef V2; its convergence results are also shown in Fig. 2. GanDef V2 significantly improves convergence and stability during training. Moreover, its test accuracy on original as well as several different white-box adversarial examples is also higher than that of the initial design. Due to these improvements, we use GanDef V2 as the standard implementation of GanDef in the rest of this work.
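In code, the V2 modification changes only the classifier’s loss: the discriminator penalty is evaluated on the adversarial half of the batch. A hedged sketch, reusing the placeholder names from the training-step sketch in Sect. 3.2 (the discriminator update itself is unchanged):

```python
import torch
import torch.nn.functional as F

def gandef_v2_classifier_loss(classifier, discriminator, x_orig, x_adv, t, gamma=1.0):
    """Classifier loss of GanDef V2: the discriminator penalty is computed
    only on the adversarial logits z_hat = C(x_hat), not on every z."""
    z_orig = classifier(x_orig)
    z_adv = classifier(x_adv)
    ce = F.cross_entropy(z_orig, t) + F.cross_entropy(z_adv, t)
    s_adv = torch.ones(len(x_adv), dtype=torch.long, device=x_adv.device)
    # subtracting this term rewards C for making D misjudge the source of z_hat
    return ce - gamma * F.cross_entropy(discriminator(z_adv), s_adv)
```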

4 Experiments and Results

In this section, we present comparative evaluation results of the adversarial training defenses introduced previously.

4.1 Datasets, NN Structures and Hyper-parameter

During evaluation, we conduct experiments on classifying original and adversarial examples on both the MNIST and CIFAR10 datasets. To ensure the quality of the evaluation, we utilize the standard Python library CleverHans [15] and run all experiments on a Linux workstation with an NVIDIA GTX-1080 GPU. We use the adversarial examples introduced in Sect. 2 and denote them as FGSM, BIM, PGD-1 and PGD-2 examples. For the MNIST dataset, PGD-1 denotes a 40-iteration PGD attack and PGD-2 an 80-iteration PGD attack; the maximum perturbation is 0.3, and the per-step perturbations for BIM and PGD are 0.05 and 0.01, respectively. For the CIFAR10 dataset, PGD-1 and PGD-2 denote 7-iteration and 20-iteration PGD attacks, respectively; the maximum perturbation is \(\frac{8}{255}\), and the per-step perturbation for both BIM and PGD is \(\frac{2}{255}\).
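For reference, the attack budgets described above can be collected as follows. This is only a restatement of the text, not the exact CleverHans argument names, and the BIM iteration counts are not stated in the paper.

```python
# eps: maximum perturbation, alpha: per-step perturbation, iters: attack iterations.
ATTACK_CONFIG = {
    "MNIST": {
        "FGSM":  {"eps": 0.3},
        "BIM":   {"eps": 0.3, "alpha": 0.05, "iters": None},   # iteration count not stated
        "PGD-1": {"eps": 0.3, "alpha": 0.01, "iters": 40},
        "PGD-2": {"eps": 0.3, "alpha": 0.01, "iters": 80},
    },
    "CIFAR10": {
        "FGSM":  {"eps": 8 / 255},
        "BIM":   {"eps": 8 / 255, "alpha": 2 / 255, "iters": None},
        "PGD-1": {"eps": 8 / 255, "alpha": 2 / 255, "iters": 7},
        "PGD-2": {"eps": 8 / 255, "alpha": 2 / 255, "iters": 20},
    },
}
```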

During training, the vanilla classifier uses only the original training data, while the defensive methods utilize both original and PGD-1 examples, except for Pure PGD, which uses only PGD-1 examples. For testing, we generate adversarial examples based on test data that was not used in training. These adversarial examples, together with the original test data, form the complete test dataset for the evaluation stage. To make a fair comparison, the defensive methods and the vanilla classifier share the same NN structures: (1) LeNet [13] for MNIST, and (2) allCNN [20] for CIFAR10. Due to the page limit, the detailed structures are shown in the Appendix. The hyper-parameters of existing defensive methods are the same as in the original papers [7, 13]. During the training of Logit Pairing on CIFAR10, we found that using the same trade-off parameter as on MNIST leads to divergence. To resolve the issue, we tried changing the optimizer, learning rate, initialization and weight decay; none of these worked until the weight of the logit comparison loss was decreased to 0.01.

To validate the NN structures as well as the adversarial examples, we use the vanilla classifier to classify original and adversarial examples. Based on the results in Table 2, the test accuracy of the vanilla classifier on original examples matches the benchmark records in [3]. Moreover, its test accuracy on every kind of adversarial example degrades significantly, which shows that the adversarial example generators are working properly.

4.2 Comparative Evaluation of Defensive Approaches

As a first step, we compare GanDef with state-of-the-art adversarial training defenses in terms of test accuracy on original and white-box adversarial examples. The results are presented in Fig. 3 and summarized in Table 2.

Fig. 3. Visualization of test accuracy on original and adversarial examples

On MNIST, all defensive methods achieve around 99% test accuracy on original examples, with Pure PGD slightly better than the others. In general, the test accuracies of the defensive methods are almost the same and not lower than that of the vanilla model. On CIFAR10, the test accuracy of the defensive methods on original data is around 83%, with Logit Pairing and GanDef slightly higher than the others. Compared with the vanilla classifier, there is a decrease of about 5% in test accuracy. Similar degradation has also been reported in previous works for Pure PGD, Mix PGD and Logit Pairing [7, 13].

In the evaluation on MNIST, there are no significant differences among the defensive methods, each achieving around 95% test accuracy. Pure PGD is the best on FGSM and BIM examples, while Logit Pairing is the best on PGD-1 and PGD-2 examples. On CIFAR10, the differences between defensive methods are slightly larger. On all four kinds of white-box adversarial examples, Pure PGD is the best method, with test accuracy ranging from 48.33% (PGD-1) to 56.18% (FGSM). Among the remaining defensive methods, GanDef is the best choice, with test accuracy ranging from 45.62% (PGD-1) to 54.14% (FGSM).

Based on this comparison, as well as the visualization in Fig. 3, it is clear that the proposed GanDef achieves the same level of performance as state-of-the-art adversarial training defenses in terms of the trained classifier’s test accuracy on original and different adversarial examples.

4.3 Evaluation of GanDef-Comb

In the second phase of the evaluation, we consider GanDef-Comb, a variant of GanDef that utilizes both the classifier and the discriminator trained by GanDef. As shown in Sect. 3, the discriminator can serve as a second line of defense when the trained classifier fails to make correct predictions on adversarial examples. By setting different threshold values for the discriminator, GanDef-Comb can dynamically control the trade-off between classifying original and adversarial examples. In the current evaluation, the threshold is set to 0.5.
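A hedged sketch of the combined decision rule with a detection threshold of 0.5: the discriminator score on the pre-softmax logits flags suspected adversarial inputs, and the classifier’s prediction is used otherwise. How flagged inputs are handled (rejection below) is an assumption on our part rather than a detail specified by the evaluation protocol.

```python
import torch
import torch.nn.functional as F

def gandef_comb_predict(classifier, discriminator, x, threshold=0.5):
    """GanDef-Comb inference sketch: use D as a second line of defense.

    Returns predicted labels, with -1 marking inputs flagged as adversarial.
    The 0.5 threshold follows Sect. 4.3; the handling of flagged inputs is
    application-specific and only illustrated here as rejection.
    """
    with torch.no_grad():
        z = classifier(x)                                   # pre-softmax logits
        p_adv = F.softmax(discriminator(z), dim=1)[:, 1]    # P(source = adversarial | z)
        pred = z.argmax(dim=1)
        pred[p_adv > threshold] = -1                        # flag suspected adversarial inputs
    return pred
```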

Table 2. Summary of test accuracy on original and adversarial examples

Fig. 4. Visualization of test accuracy under different ratios of adversarial examples

On MNIST, the test accuracy of GanDef-Comb on original, FGSM and BIM examples is the same as that of GanDef. On PGD-1 and PGD-2 examples, the test accuracy of GanDef-Comb degrades slightly (by less than 0.3%). This is because the MNIST dataset is simple enough that the classifier alone provides a near-optimal defense, and the remaining misclassified corner cases are hard to patch with the discriminator. In more typical cases, the classifier suffers a much larger degradation when classifying adversarial examples. On CIFAR10, for instance, the benefit of utilizing the discriminator is obvious: GanDef-Comb is significantly better than state-of-the-art adversarial training defenses at mitigating FGSM, BIM, PGD-1 and PGD-2 examples, enhancing test accuracy by at least 31.43% on FGSM, 26.81% on BIM, 28.88% on PGD-1 and 25.23% on PGD-2. Although the test accuracy of GanDef-Comb on original examples degrades by about 20%, the enhanced defense against adversarial examples improves the overall test accuracy once the ratio of adversarial examples exceeds a certain threshold.

To show the benefit of being able to control the trade-off, we design two experiments on the CIFAR10 dataset. We form the test dataset from original and adversarial examples (FGSM examples in the first experiment and PGD-2 examples in the second), where the ratio of adversarial examples, \(\rho \), varies from 0 to 1. Given similar weights on the losses of classifying original and adversarial examples, \(\rho \) represents the probability of receiving adversarial examples. Alternatively, given similar probabilities of receiving original and adversarial examples, \(\rho \) represents the weight of correctly classifying adversarial examples (with weight \(1-\rho \) on original examples). These two interpretations correspond to risky and misbehavior-sensitive running environments, respectively.
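Under either interpretation, the overall test accuracy reported in Fig. 4 is the \(\rho \)-weighted combination of the per-set accuracies (our formulation, assuming every example carries equal weight):

$$\begin{aligned} \text {Acc}_{\text {overall}}(\rho ) = (1-\rho ) \cdot \text {Acc}_{\text {original}} + \rho \cdot \text {Acc}_{\text {adversarial}} \end{aligned}$$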

The overall test accuracy under the different experiments is shown in Fig. 4. GanDef-Comb is better than state-of-the-art defenses in terms of overall test accuracy when \(\rho \) exceeds 41.7%. In real applications, the overall test accuracy can be further enhanced by changing the discriminator’s threshold value. When \(\rho \) is low, GanDef-Comb pays less attention to the discriminator (high threshold value) and achieves performance similar to that of the state-of-the-art defenses. When \(\rho \) is high, GanDef-Comb relies on the discriminator (low threshold value) to detect more adversarial examples.

5 Conclusion

In this paper, we introduce a new defensive method against adversarial examples, GanDef, which formulates training as a minimax game between a classifier and a discriminator. Through the evaluation, we show that (1) the classifier achieves the same level of defense as classifiers trained by state-of-the-art defenses, and (2) using both the classifier and the discriminator (GanDef-Comb) can dynamically control the trade-off in classification and achieves higher overall test accuracy in risky or misbehavior-sensitive running environments. For future work, we plan to explore more sophisticated GAN models that can mitigate the degradation observed when the classifier and the discriminator are combined.