
1 Introduction

Deep Neural Networks (DNNs) have been shown to be highly expressive and have achieved state-of-the-art performance on a wide range of tasks, such as speech recognition [20], image classification [24], natural language understanding [54], and robotics [32]. However, recent studies have found that DNNs are vulnerable to adversarial examples [7,8,9, 17, 31, 38, 40, 45, 47]. Such examples are intentionally perturbed inputs with small-magnitude adversarial perturbations added, which can induce the network to make arbitrary incorrect predictions at test time, even when the examples are generated against different models [5, 27, 33, 46]. The fact that the adversarial perturbation required to fool a model is often small and (in the case of images) imperceptible to human observers makes detecting such examples very challenging. This undesirable property of deep networks has become a major security concern in real-world applications of DNNs, such as self-driving cars and identity recognition systems [16, 37]. Furthermore, both white-box and black-box attacks, in which the attacker has full or zero knowledge of the target system respectively, have been performed successfully against DNNs [2, 17, 45]. Among black-box attacks, transferability is widely used for generating attacks against real-world systems that do not allow white-box access. Transferability refers to the property of adversarial examples in classification tasks whereby an adversarial example generated against a local model can mislead another unseen model without any modification [33].

Given these intriguing properties of adversarial examples, various analyses for understanding adversarial examples have been proposed [29, 30, 42, 43], and potential defense/detection techniques have also been discussed, mainly for the image classification problem [13, 21, 30]. For instance, image pre-processing [14], adding another type of random noise to the inputs [48], and adversarial retraining [17] have been proposed for defending against or detecting adversarial examples when classifying images. However, researchers [4, 19] have shown that these defense or detection methods can be easily circumvented by attackers with, or even without, knowledge of the defender's strategy. Such observations raise concerns about the safety of diverse machine learning based systems.

In order to better understand adversarial examples against different tasks, in this paper we aim to analyze adversarial examples in the semantic segmentation task instead of classification. We hypothesize that adversarial examples in different tasks may contain unique properties that provide in-depth understanding of such examples and encourage potential defensive mechanisms. Different from image classification, in semantic segmentation each pixel is assigned a prediction label based on its surrounding information [12]. Such spatial context information plays a more important role in segmentation algorithms, such as [23, 26, 50, 55]. Whether adversarial perturbation breaks such spatial context is unknown to the community. In this paper we propose and conduct an image spatial consistency analysis, which randomly selects overlapping patches from a given image and checks how consistent the segmentation results are for the overlapping regions. Our pipeline of spatial consistency analysis for adversarial/benign instances is shown in Fig. 1. We find that in the segmentation task, adversarial perturbation is weakened for separately selected patches, and therefore adversarial and benign images show very different behaviors in terms of spatial consistency information. Moreover, since such spatial consistency is highly random, it is hard for adversaries to take such constraints into account when performing adaptive attacks. This renders the system less brittle even when facing sophisticated adversaries who have full knowledge of the model as well as the detection/defense method applied.

We use image scale transformation to perform detection of adversarial examples as a baseline, which has been used for detection in classification tasks [39]. We show that by randomly scaling the images, adversarial perturbation can be destroyed and therefore adversarial examples can be detected. However, when the attacker knows the detection strategy (an adaptive attacker), even without exact knowledge of the scaling rate, the attacker can still perform adaptive attacks against the detection mechanism, similar to the findings in classification tasks [4]. On the other hand, we show that by incorporating the spatial consistency check, existing semantic segmentation networks can detect adversarial examples (average AUC nearly 100%) generated by the state-of-the-art attacks considered in this paper, regardless of whether the adversary knows the detection method. Here, we allow the adversaries to have full access to the model and any detection method applied, in order to analyze the robustness of the model against adaptive attacks. We additionally analyze the defense in a black-box setting, which is more practical in real-world systems.

In this paper, our goal is to further understand adversarial attacks by conducting spatial consistency analysis in the semantic segmentation task, and we make the following contributions:

  1. We propose the spatial consistency analysis for benign/adversarial images and conduct large-scale experiments on two state-of-the-art attack strategies against both DRN and DLA segmentation models, with diverse adversarial targets on different datasets, including Cityscapes and a real-world autonomous driving video dataset.

  2. We are the first to analyze spatial information for adversarial examples in segmentation models. We show that spatial consistency information can be potentially leveraged to distinguish adversarial examples. We also show that the spatial consistency check mechanism induces a high degree of randomness and is therefore robust against adaptive adversaries. We evaluate image scaling and spatial consistency, and show that spatial consistency outperforms the standard scaling based method.

  3. In addition, we empirically show that adversarial examples generated by the attack methods considered in our studies barely transfer among models, even when these models share the same architecture with different initializations, in contrast to the transferability phenomenon in classification tasks.

Fig. 1. Spatial consistency analysis for adversarial and benign instances in semantic segmentation.

2 Related Work

Semantic Segmentation has received long-lasting attention in the computer vision community [25]. Recent advances in deep learning [24] show that deep convolutional networks can achieve much better results than traditional methods [28]. Yu et al. [50] proposed using dilated convolutions to build high-resolution feature maps for semantic segmentation, which improves performance significantly compared to upsampling approaches [1, 28, 34]. Most recent state-of-the-art approaches are based on dilated convolutions [44, 51, 55] and residual networks [18]. Therefore, in this work, we choose dilated residual networks (DRN) [51] and deep layer aggregation (DLA) [52] as our target models for attack and defense.

Adversarial Examples for Semantic Segmentation have been studied recently in addition to adversarial examples in image classification. Xie et al. proposed a gradient based algorithm that attacks pixels within the whole image iteratively until most of the pixels have been misclassified into the target class [49], called dense adversary generation (DAG). Later, an optimization based attack algorithm was proposed by introducing a surrogate loss function called Houdini into the objective function [10]. The Houdini loss function consists of two parts. The first part represents the stochastic margin between the scores of the actual and predicted targets, which reflects the confidence of the model prediction. The second part is the task loss, which is independent of the model and corresponds to the actual task. The task loss enables the Houdini algorithm to generate adversarial examples for different tasks, including image segmentation, human pose estimation, and speech recognition.
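The following is a rough sketch of our reading of this surrogate (not the exact formulation from [10]): the stochastic-margin term is the probability, under a standard normal sample, that the actual-target score falls below the predicted-target score, and it multiplies the task loss. The function name and tensor shapes are illustrative assumptions.

```python
import torch

def houdini_loss(score_actual, score_predicted, task_loss):
    """Sketch of a Houdini-style surrogate: a stochastic margin term
    (probability that the actual-target score drops below the predicted-target
    score by more than a standard normal sample) times the task loss."""
    normal = torch.distributions.Normal(0.0, 1.0)
    stochastic_margin = normal.cdf(score_predicted - score_actual)
    return (stochastic_margin * task_loss).mean()

# Toy example with per-pixel scores and a 0/1 per-pixel task loss.
s_actual, s_pred = torch.randn(4, 100), torch.randn(4, 100)
loss = houdini_loss(s_actual, s_pred, task_loss=(s_pred > s_actual).float())
```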

Various detection and defense methods have also been studied against adversarial examples in image classification. For instance, adversarial training [17] and its variations [30, 41] have been proposed and demonstrated to be effective for the classification task, but are hard to adapt to the segmentation task. So far, no defense or detection methods have been studied for image segmentation.

Fig. 2. Samples of benign and adversarial examples generated by Houdini on Cityscapes [11] (targeting Kitty/Pure) and BDD100K [53] (targeting Kitty/Scene). We select DRN as our target model here. Within each subfigure, the first column shows benign images and the corresponding segmentation results, and the second and third columns show adversarial examples with different adversarial targets.

3 Spatial Consistency Based Method

In this section, we explore the effects that spatial context information has on benign and adversarial examples in segmentation models. We conduct experiments on various models and datasets; due to space limitations, we use a small set of examples to demonstrate our findings and relegate other examples to the supplementary materials. Figure 2 shows benign and adversarial examples with diverse adversarial targets: “Hello Kitty” (Kitty) and random pure color (Pure) on Cityscapes; and “Hello Kitty” (Kitty) and a real scene without any cars (Scene) on the BDD video dataset, respectively. In the rest of the paper, we will use the format “attack method | target” to label each adversarial example. Here we consider both the DAG [49] and Houdini [10] attack methods.

Fig. 3. Heatmap of per-pixel self-entropy on the Cityscapes dataset against the DRN model. (a) and (b) show a benign image and its corresponding per-pixel self-entropy heatmap. (c)–(f) show the heatmaps of adversarial examples generated by DAG and Houdini attacks targeting “Hello Kitty” (Kitty) and random pure color (Pure).

Fig. 4. Examples of the spatial consistency based method on adversarial examples generated by DAG and Houdini attacks targeting Kitty and Pure. The first column shows the original image and corresponding segmentation results. Columns \(P_1\) and \(P_2\) show two randomly selected patches, while columns \(O_1\) and \(O_2\) show the segmentation results of the overlapping regions from these two patches, respectively. The mIOU between \(O_1\) and \(O_2\) is reported. It is clear that the segmentation results of the overlapping regions from two random patches are very different for adversarial images (low mIOU), but relatively consistent for benign instances (high mIOU).

3.1 Spatial Context Analysis

To quantitatively analyze the contribution of spatial context information to the segmentation task, we first evaluate the entropy of predictions based on different spatial contexts. For each pixel m within an image, we randomly select K patches \(\{P_1, P_2, ..., P_K\}\) which contain m. Within each patch \(P_i\), the pixel m is assigned a confidence vector based on the Softmax prediction, so pixel m corresponds to K vectors in total. We discretize each vector to a one-hot vector and sum up these K one-hot vectors to obtain the vector \(\mathcal {V}_{m}\). Each component \(\mathcal {V}_{m}[j]\) represents the number of times pixel m is predicted to be class j. We then normalize \(\mathcal {V}_{m}\) by dividing by K. Finally, for each pixel m, we calculate its self-entropy

$$\begin{aligned} \mathcal {H}(m) = -\sum _{j} \mathcal {V}_{m}[j] \log \mathcal {V}_{m}[j] \end{aligned}$$

We utilize this per-pixel entropy to convey the consistency of predictions across different surrounding patches and plot it in the heatmaps of Fig. 3. It is clear that for benign instances, the boundaries of the original objects have higher entropy, indicating that these regions are harder to predict and gain more information from considering different surrounding spatial contexts (Fig. 4).
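As an illustration, a minimal sketch of the per-pixel self-entropy computation above; here the K per-patch predictions for pixel m are passed in directly as class labels, which is an implementation assumption:

```python
import numpy as np

def per_pixel_self_entropy(labels_from_patches, num_classes):
    """Self-entropy H(m) of a pixel m given the class label it receives in
    each of the K random patches containing it."""
    counts = np.bincount(labels_from_patches, minlength=num_classes)
    v = counts / counts.sum()            # normalized vote vector V_m
    nonzero = v[v > 0]
    return float(-(nonzero * np.log(nonzero)).sum())

# Example: predicted as class 3 in 8 of 10 patches and class 5 in the other 2.
print(per_pixel_self_entropy(np.array([3] * 8 + [5] * 2), num_classes=19))
```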

3.2 Patch Based Spatial Consistency

The fact that surrounding spatial context information shows different consistency behaviors for benign and adversarial examples motivates us to perform a spatial consistency check, hoping to tell these two data distributions apart.

First, we introduce how to generate overlapping spatial contexts by selecting random patches and then validate the spatial consistency information. Let s be the patch size and w, h be the width and height of an image \(\mathbf {X}\). We define the first and second patch by the coordinates of their top-left and bottom-right vertices, \((u_1, u_2, u_3, u_4)\) and \((v_1, v_2, v_3, v_4)\), respectively. Let \((d_{u_1,v_1}, d_{u_2, v_2})\) be the displacement between the top-left coordinates of the first and second patch: \(d_{u_1, v_1}=v_1 - u_1\), \(d_{u_2, v_2} = v_2 - u_2\). To guarantee that there is enough overlap, we require \(d_{u_1, v_1}\) and \(d_{u_2, v_2}\) to be in the range \((b_{\mathbf {low}}, b_{\mathbf {upper}})\). We select the two patches randomly, aiming to capture diverse surrounding spatial context, including information both near and far from the target pixel. The patch selection algorithm (getOverlapPatches) is given in the supplementary materials; a sketch is shown below.
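A minimal sketch of this patch-pair selection under our reading of the constraints above (the paper's exact getOverlapPatches routine is in the supplementary materials); clamping the second patch to the image borders is our assumption:

```python
import random

def get_overlap_patches(w, h, s, b_low, b_upper):
    """Pick two random s x s patches whose top-left corners differ by a
    displacement in (b_low, b_upper) along each axis, so that they overlap."""
    u1, u2 = random.randint(0, w - s), random.randint(0, h - s)
    d1 = random.randint(b_low + 1, b_upper - 1) * random.choice([-1, 1])
    d2 = random.randint(b_low + 1, b_upper - 1) * random.choice([-1, 1])
    v1 = min(max(u1 + d1, 0), w - s)
    v2 = min(max(u2 + d2, 0), h - s)
    # Each patch: (left, top, right, bottom) in image coordinates.
    return (u1, u2, u1 + s, u2 + s), (v1, v2, v1 + s, v2 + s)
```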

Next we show how to apply the spatial consistency based method to a given input and thereby recognize adversarial examples. The detailed algorithm is shown in Algorithm 1. Here K denotes the number of overlapping regions for which we check the spatial consistency. We use the mean Intersection Over Union (mIOU) between the overlapping regions \(O_1\), \(O_2\) of two patches \(P_1\), \(P_2\) to measure their spatial consistency. The mIOU is defined as \(\frac{1}{n_\textit{cls}}\sum _{i} n_{ii} / (\sum _{j}n_{ij} + \sum _{j} n_{ji} - n_{ii})\), where \(n_{ij}\) denotes the number of pixels predicted to be class i in \(O_1\) and class j in \(O_2\), and \(n_\textit{cls}\) is the number of unique classes appearing in both \(O_1\) and \(O_2\). \(\mathbf {getmIOU}\) is a function that computes the mIOU given patches \(P_1\), \(P_2\) along with their overlapping regions \(O_1\) and \(O_2\); it is shown in the supplementary materials (an illustrative sketch follows).
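For concreteness, an illustrative re-implementation of this mIOU between two overlapping label maps (the paper's getmIOU helper itself is in the supplementary materials):

```python
import numpy as np

def miou(o1, o2):
    """mIOU between two label maps o1, o2 of the same shape, averaged over
    the classes that appear in both maps (classes appearing in only one map
    contribute zero to the per-class sum and are not counted in n_cls)."""
    classes_both = np.intersect1d(np.unique(o1), np.unique(o2))
    if classes_both.size == 0:
        return 0.0
    ious = []
    for c in classes_both:
        inter = np.logical_and(o1 == c, o2 == c).sum()
        union = np.logical_or(o1 == c, o2 == c).sum()
        ious.append(inter / union)
    return float(np.mean(ious))
```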

Algorithm 1 (spatial consistency based detection).
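Below is a minimal sketch of the check described by Algorithm 1, reusing the get_overlap_patches and miou sketches above; `segment` is a hypothetical wrapper that returns per-pixel labels for an image crop, and `threshold` is the benign-data threshold described in Section 5.2:

```python
import numpy as np

def spatial_consistency_check(image, segment, K, s, b_low, b_upper, threshold):
    """Return True if `image` is judged benign: the mean mIOU over the
    overlapping regions of K random patch pairs stays above `threshold`."""
    h, w = image.shape[:2]
    scores = []
    for _ in range(K):
        p1, p2 = get_overlap_patches(w, h, s, b_low, b_upper)
        pred1 = segment(image[p1[1]:p1[3], p1[0]:p1[2]])
        pred2 = segment(image[p2[1]:p2[3], p2[0]:p2[2]])
        # Overlapping region in image coordinates.
        l, t = max(p1[0], p2[0]), max(p1[1], p2[1])
        r, b = min(p1[2], p2[2]), min(p1[3], p2[3])
        o1 = pred1[t - p1[1]:b - p1[1], l - p1[0]:r - p1[0]]
        o2 = pred2[t - p2[1]:b - p2[1], l - p2[0]:r - p2[0]]
        scores.append(miou(o1, o2))
    return float(np.mean(scores)) >= threshold
```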

4 Scale Consistency Analysis

We have discussed how spatial consistency can be utilized to potentially characterize adversarial examples in the segmentation task. In this section, we discuss another baseline method: image scale transformation, which is another natural factor considered in semantic segmentation [22, 28]. Here we focus on the image blur operation, applying Gaussian blur to given images [6], which has been studied for detecting adversarial examples in image classification [39]. Similarly, we analyze the effects of image scaling on benign/adversarial samples. Since spatial context information is important for the segmentation task, scaling or performing segmentation on small patches may damage the global information and therefore affect the final prediction. Here we aim to provide quantitative results to understand and explore how image scale transformation affects adversarial perturbation.

4.1 Scale Consistency Property

Scale theory is commonly applied in the image segmentation task [35]; we therefore train scale-resilient models to obtain robust ones, against which we perform attacks. On these scale-resilient models, we first analyze how image scaling affects segmentation results for benign/adversarial samples. We apply the DAG [49] and Houdini [10] attacks against the DRN and DLA models with different adversarial targets. The images and corresponding segmentation results before and after scaling are shown in Fig. 5. We apply a Gaussian kernel with different standard deviations (std) to scale both benign and adversarial instances. It is clear that when we apply Gaussian blurring with higher std (3 and 5), the adversarial perturbation is weakened and the segmentation results no longer match the adversarial targets for the scale-transformed adversarial examples, as shown in Fig. 5(a)–(e).

Fig. 5. Examples of images and corresponding segmentation results before/after image scaling on Cityscapes against the DRN model. For each subfigure, the first column shows the benign/adversarial image, while the later columns show images after scaling with a Gaussian kernel with std of 0.5, 3, and 5, respectively. (a) shows benign images before/after image scaling and the corresponding segmentation results; (b)–(e) present similar results for adversarial images generated by DAG and Houdini attacks targeting Kitty and Pure.

5 Experimental Results

In this section, we conduct comprehensive large-scale experiments to evaluate the image spatial and scale consistency information for benign and adversarial examples generated by different attack methods. We also show that the spatial consistency based detection method is robust against sophisticated adversaries with knowledge of the defender, while the scale transformation based method is not.

5.1 Implementation Details

Datasets. We use both Cityscapes [11] and BDD100K [53] in our evaluation. We show results on the validation sets of both datasets, which contain 500 high-resolution images with a combined 19 categories of segmentation labels. These two datasets are both outdoor datasets containing instance-level annotations, which would raise real-world safety concerns if they were attacked. Compared with other datasets such as Pascal VOC [15] and CamVid [3], these two datasets are more challenging due to their relatively high resolution and the diverse scenes within each image.

Semantic Segmentation Models. We use Dilated Residual Networks (DRN) [51] and Deep Layer Aggregation (DLA) [52] as our target models; more specifically, we select DRN-D-22 and DLA-34. For both models, we use a crop size of 512 and a random scale of 2 during training to obtain scale-resilient models for both the BDD and Cityscapes datasets. The mIOU of these two models on pristine training data is shown in Table 1. More results on different models can be found in the supplementary materials.

Adversarial Examples. We generate adversarial examples based on two state-of-the-art attack methods: DAG [49] and Houdini [10], using our own implementation of the methods. We select a complex image, Hello Kitty (Kitty), with different background colors, and a random pure color (Pure) as our targets on the Cityscapes dataset. Furthermore, in order to increase the diversity, we also select a real-world driving scene (Scene) without any cars from the BDD training dataset as another malicious target on BDD. Such attacks potentially show that any image taken in the real world can be attacked toward the same scene with no cars on the road, which raises serious security concerns for future autonomous driving systems. We also add three additional adversarial targets, including “ECCV 2018”, “Remapping”, and “Color strip”, in the supplementary materials to increase the diversity of adversarial targets.

We generate 500 adversarial examples for the Cityscapes and BDD100K datasets against both the DRN and DLA segmentation models, targeting various malicious targets (more results can be found in the supplementary materials).

5.2 Spatial Consistency Analysis

To evaluate the spatial consistency analysis quantitatively for the segmentation task, we leverage it to build a simple detector to demonstrate its properties. Here we perform the patch based spatial consistency analysis, and we select the patch size and region bounds as \(s=512\), \(b_{low}=32, b_{upper} = 64\). We select the number of overlapping regions as \(K \in \{1, 5, 10, 50\}\). We first select some benign instances and calculate the normalized mIOU of overlapping regions from two random patches. We record the lower bound of these mIOU values as the threshold of the detection method. Note that when reporting detection rates in the rest of the paper, we use the threshold learned from a set of benign training data; we also report the Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) curve of a detection method to evaluate its overall performance. Given an image, for each overlapping region of two random patches, we calculate the normalized mIOU and compare it with the threshold calculated before. If it is larger, the image is recognized as benign, and as adversarial otherwise. This process is illustrated in Algorithm 1. We report the detection results in terms of AUC in Table 1 for adversarial examples generated in the various settings mentioned above. We observe that such a simple detection method based on spatial consistency information can achieve an AUC of nearly 100% for the adversarial examples studied here. In addition, we also select s as a random number between 384 and 512 (too small a patch size affects segmentation accuracy even on benign instances, so we avoid small patches for the purpose of controlling variables) and show the results in the supplementary materials. We observe that random patch sizes achieve similar detection results.
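A minimal sketch of the threshold calibration and AUC evaluation described above, assuming per-image consistency scores (mean mIOU over K random patch pairs) have already been collected; scikit-learn is used here purely for illustration:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def calibrate_threshold(benign_train_scores):
    # The detection threshold is the lower bound of the benign mIOU scores.
    return float(np.min(benign_train_scores))

def detection_auc(benign_scores, adversarial_scores):
    # Benign images are the positive class: higher mIOU means more consistent.
    y_true = np.concatenate([np.ones(len(benign_scores)),
                             np.zeros(len(adversarial_scores))])
    y_score = np.concatenate([benign_scores, adversarial_scores])
    return roc_auc_score(y_true, y_score)
```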

Table 1. Detection results (AUC) of image spatial (Spatial) and scale consistency (Scale) based methods on the Cityscapes dataset. The number in parentheses in the Model column shows the number of parameters of the target model, and mIOU shows the performance of the segmentation model on pristine data. All AUC values below 80% are colored red.

5.3 Image Scale Analysis

As a baseline, we also utilize image scale information as a simple detection method and compare it with the spatial consistency based method. We apply a Gaussian kernel to perform the image scaling based detection and select \(\mathbf {std}_{detect}\in \{0.5, 3,5\}\) as the standard deviation of the Gaussian kernel. We compute the normalized mIOU between the segmentation results of the original and scaled images. The detection results in terms of AUC are shown in Table 1. The detection method based on image scale information achieves similarly high AUC compared with the spatial consistency based method.
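A minimal sketch of this scale-based baseline, reusing the hypothetical `segment` and `miou` helpers from Section 3.2; the per-channel scipy blur is our simplification of the scaling operation:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def scale_consistency_score(image, segment, std):
    """mIOU between the segmentation of the original image and that of a
    Gaussian-blurred copy (blur applied per channel with the given std)."""
    blurred = np.stack([gaussian_filter(image[..., c], sigma=std)
                        for c in range(image.shape[-1])], axis=-1)
    return miou(segment(image), segment(blurred))
```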

5.4 Adaptive Attack Evaluation

Given the above detection analyses, it is important to evaluate adaptive attacks, in which adversaries have knowledge of the detection strategy.

Fig. 6. Performance of the adaptive attack. (a) shows adversarial images and corresponding segmentation results for the adaptive attack against image scaling. The first two rows show benign images and the corresponding segmentation results; the last two rows show the adaptive adversarial images and corresponding segmentation results under different std of the Gaussian kernel (0.5, 3, 5 for columns 2–4). (b) and (c) show the performance of the adaptive attack against the spatial consistency based method with different K. (b) presents the mIOU of overlapping regions for benign and adversarial images along different attack iterations. (c) shows the mIOU for overlapping regions of benign and adversarial instances at iteration 200.

As Carlini and Wagner suggest [4], we conduct attacks with full access to the detection model to evaluate the adaptive adversary, following Kerckhoffs's principle [36]. To perform the adaptive attack against the image scaling detection mechanism, instead of attacking the original model, we add another convolutional layer after the input layer of the target model, similarly to [4]; a sketch of this setup is shown below. We select std \(\in \{0.5, 3, 5\}\) for the adaptive attack, the same values used by the detection method. To guarantee that the attack methods converge, when performing the adaptive attacks we set the upper bound of the adversarial perturbation to 0.06 in terms of \(L_2\) distance (pixel values are in [0, 1]), since larger perturbations are already clearly visible. The detection results against such adaptive attacks on Cityscapes are shown in Table 1 (results on BDD are relegated to the supplementary materials). The results show that the image scale based detection method is easily attacked (the detection AUC drops dramatically), which matches the conclusions drawn in the classification task [4]. We show the qualitative results in Fig. 6(a): even under a Gaussian kernel with large std, the model can still be fooled into predicting the malicious target (Kitty).
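A minimal PyTorch sketch of this adaptive setup: a frozen depthwise convolution carrying a Gaussian kernel is prepended to the target network, so that the attack gradients already account for the blur. The kernel size and padding are illustrative assumptions, not the exact configuration used in our experiments.

```python
import torch
import torch.nn as nn

def gaussian_kernel2d(std, ksize=11):
    ax = torch.arange(ksize, dtype=torch.float32) - (ksize - 1) / 2
    g = torch.exp(-(ax ** 2) / (2 * std ** 2))
    k = torch.outer(g, g)
    return k / k.sum()

class BlurThenSegment(nn.Module):
    """Target segmentation model with a frozen Gaussian-blur layer in front."""
    def __init__(self, segmentation_model, std, channels=3, ksize=11):
        super().__init__()
        kernel = gaussian_kernel2d(std, ksize).expand(channels, 1, ksize, ksize)
        self.blur = nn.Conv2d(channels, channels, ksize, padding=ksize // 2,
                              groups=channels, bias=False)
        self.blur.weight.data.copy_(kernel)
        self.blur.weight.requires_grad_(False)
        self.model = segmentation_model

    def forward(self, x):
        # The usual attack (DAG/Houdini) is then run against this module.
        return self.model(self.blur(x))
```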

Fig. 7. Detection performance of the spatial consistency based method against adaptive attacks with different K on Cityscapes with the DRN model. The X-axis indicates the number of patches selected to perform the adaptive attack (0 means the regular attack). The Y-axis indicates the number of overlapping regions selected during detection.

Next, we apply the adaptive attack against the spatial consistency based method. Due to the randomness of the approach, we develop the strongest adaptive adversary we can think of: it randomly selects K patches (the same value of K used by the defender) and then tries to attack both the whole image and the selected K patches toward the corresponding parts of the malicious target. The detailed attack algorithm is shown in the supplementary materials; a sketch of the adaptive objective is given below. The corresponding detection results of the spatial consistency based method against such adaptive attacks on Cityscapes are shown in Table 1. It is interesting to see that even against such strong adaptive attacks, the spatial consistency based method can still achieve nearly 100% detection results. We hypothesize that this is because of the high-dimensional randomness induced by the spatial consistency based method, since the search space of patches and overlapping regions is very large. Figure 6(b) analyzes the convergence of such an adaptive attack against the spatial consistency based method. From Fig. 6(b) and (c), we can see that with different K, the selected overlapping regions still remain inconsistent with high probability.
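A minimal sketch of the adaptive objective described above: the adversary attacks the whole image and each of the K randomly chosen patches toward the matching crop of the malicious target. `seg_loss` stands in for whichever per-image attack loss (DAG or Houdini) is used, and the patch format follows the earlier sketches; both are assumptions for illustration.

```python
def adaptive_attack_loss(model, adv_image, target_map, patches, seg_loss):
    """Attack loss over the whole image plus K selected patches, each pushed
    toward the corresponding crop of the malicious target map."""
    total = seg_loss(model(adv_image), target_map)
    for (left, top, right, bottom) in patches:
        pred = model(adv_image[..., top:bottom, left:right])
        total = total + seg_loss(pred, target_map[..., top:bottom, left:right])
    return total
```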

Since the spatial consistency based method induces large randomness, we generate a confusion matrix of detection results for adversaries and the detection method choosing various values of K, as shown in Fig. 7. It is clear that for different malicious targets and attack methods, choosing \(K=50\) is already sufficient to detect sophisticated attacks. In addition, based on our empirical observations, attacking with higher K increases the adversary's computational cost dramatically.

Fig. 8. Transferability analysis: cell (i, j) shows the normalized mIoU value or pixel-wise attack success rate of adversarial examples generated against model j and evaluated on model i. Models A, B, C are DRN (DRN-D-22) with different initializations. We select “Hello Kitty” as the target.

5.5 Transferability Analysis

Given the common properties of adversarial examples in both classification and segmentation tasks, we next analyze whether the transferability of adversarial examples exists in segmentation models, considering that they are particularly sensitive to spatial and scale information. Transferability has been demonstrated to be one of the most interesting properties of adversarial examples in the classification task, where adversarial examples generated against one model are able to mislead another model, even if the two models have different architectures. Given this property, transferability has become the foundation of many black-box attacks in the classification task. Here we aim to analyze whether adversarial examples in the segmentation task still retain high transferability. First, we train three DRN models with the same architecture (DRN-D-22) but different initializations and generate adversarial images with the same target.

Each adversarial image has at least 96% pixel-wise attack success rate against the original model. We evaluate both the DAG and Houdini attacks and measure transferability using the normalized mIoU, excluding pixels that share the same label between the ground truth and the adversarial target. We show the transferability evaluation among different models in the confusion matrices in Fig. 8; a sketch of this cross-model evaluation is given below. We observe that transferability rarely appears in the segmentation task. More results on different network architectures and datasets are in the supplementary materials.
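A minimal sketch of the cross-model bookkeeping behind Fig. 8, using the pixel-wise attack success rate as the cell value; `segment(model, img)` is a hypothetical helper returning per-pixel labels, and the exclusion of pixels whose ground-truth label already matches the adversarial target is omitted here for brevity:

```python
import numpy as np

def transfer_matrix(models, adv_sets, target_map, segment):
    """Cell (i, j): pixel-wise success rate of adversarial images crafted
    against model j when evaluated on model i (higher = more transferable)."""
    n = len(models)
    mat = np.zeros((n, n))
    for j, adv_images in enumerate(adv_sets):      # crafted against model j
        for i, model in enumerate(models):         # evaluated on model i
            rates = [np.mean(segment(model, img) == target_map)
                     for img in adv_images]
            mat[i, j] = float(np.mean(rates))
    return mat
```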

As a comparison with the classification task, for each network architecture we train a classifier and evaluate its transferability, with results shown in the supplementary materials. As a control experiment, we observe that classifiers with the same architecture still exhibit high transferability, aligned with existing findings, which shows that the low transferability is indeed due to the nature of segmentation rather than particular network architectures.

This observation is quite interesting, as it indicates that black-box attacks against segmentation models may be more challenging. Furthermore, the low transferability in segmentation is possibly because the adversarial perturbation added to an image focuses on certain regions, while such spatial context information is captured differently by different models. We plan to analyze the actual reason for the low transferability in segmentation in future work.

6 Conclusions

Adversarial examples have been heavily studied recently, pointing out vulnerabilities of deep neural networks and raising a number of security concerns. However, most such studies focus on image classification problems; in this paper we explore the spatial context information used in the semantic segmentation task to better understand adversarial examples in segmentation scenarios. We propose to apply spatial consistency analysis to recognize adversarial examples in segmentation, which has not previously been considered as a potential detection mechanism in either image classification or segmentation. We show that such spatial consistency information differs between adversarial and benign instances and can be leveraged to detect adversarial examples even when facing strong adaptive attackers. These observations open a wide door for future research to explore diverse properties of adversarial examples under various scenarios and develop new attacks to understand the vulnerabilities of DNNs.