1 Introduction

Stroke is one of the leading causes of death in developed countries. The disease is caused either by the blockage (ischemic stroke) or by the rupture (hemorrhagic stroke) of a blood vessel. Of the two types, ischemic stroke accounts for roughly 80% of cases [1]. The prevailing imaging modalities for diagnosing stroke are magnetic resonance imaging (MRI) and computed tomography (CT). Different MRI sequences, such as T1-weighted, T2-weighted, diffusion-weighted imaging (DWI) and fluid-attenuated inversion recovery (FLAIR), are used for specialized applications. DWI is especially suitable for ischemic stroke because it is highly sensitive to lesion changes [8].

Segmentation of the brain areas affected by ischemic stroke lesions plays a crucial role in treatment assessment and prognosis. Producing accurate predictions is challenging because the targets vary widely in shape and size. Recent studies [2] have shown that perfusion CT offers potential improvements over MRI in speed, availability and lack of contraindications. A computer-aided diagnosis (CAD) system using perfusion CT may therefore help clinicians reach a faster and more accurate diagnosis. In previous works, models such as random forests, support vector machines and autoencoders [9] have been employed to segment ischemic stroke lesions, with successful results.

Computer vision tasks such as image recognition, detection and segmentation have advanced significantly in the past few years due to the rise of deep learning, specifically convolutional neural networks (CNNs). Medical applications of deep learning have also seen profound successes. As neural networks grow deeper with increasing computational power, the vanishing gradient problem arises. Vanishing gradients occur when, in networks trained with back-propagation, the gradients become too small to effectively change the weights of the neurons. Residual learning networks (ResNets) [4] address this problem by introducing identity mappings in the form of stacked residual blocks. These residual connections allow the network to collapse into a few layers at initialization and to gradually expand in the feature space as training takes place. Recently, generative adversarial networks (GANs) [3] have been used extensively in image generation tasks. Recent studies [6, 10] have shown that GANs can also be used in a critique framework for semantic segmentation. The benefits of such networks include capturing higher-level inconsistencies between ground truth and predictions and enforcing spatial continuity. In this framework, generating pixel-wise segmentation masks is modeled as a generative procedure, and the discriminator attempts to distinguish between real and fake segmentation masks.
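
The vanishing gradient argument can be made concrete with a scalar simplification. This is an illustration only and does not model a real CNN. During back-propagation, the chain rule multiplies the local derivatives of successive layers. A residual block y = x + F(x) has the local derivative 1 + F'(x), so its identity path keeps the product from shrinking towards zero. The helper name `chain_gradient` is hypothetical.

```python
def chain_gradient(local_grads, residual=False):
    """Gradient of a chain of scalar layers w.r.t. its input (chain rule).

    Plain layers contribute their local derivative d; residual layers
    y = x + F(x) contribute 1 + d, where d is the derivative of F.
    """
    g = 1.0
    for d in local_grads:
        g *= (1.0 + d) if residual else d
    return g


# Ten layers whose local derivative is 0.25 (the maximum slope of a sigmoid).
plain = chain_gradient([0.25] * 10)
res = chain_gradient([0.25] * 10, residual=True)
print(plain)  # 9.5367431640625e-07
```

In the plain chain, the gradient shrinks by a factor of 2^20 over ten layers. In the residual chain, the product cannot fall below 1 as long as every local derivative of F is non-negative.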

In this paper, we develop a neural network with adversarial training to segment brain areas irreversibly damaged by ischemic stroke. The proposed model is trained and validated on the Ischemic Stroke Lesion Segmentation (ISLES) challenge dataset [7]. The ISLES challenge aims to provide a unified platform and high-quality data for training and evaluating models for automatic stroke lesion segmentation. To model the variability in the true distribution and improve prediction accuracy, we employ adversarial training. For preprocessing, each modality is normalized and stacked as multi-channel input. The overall loss function consists of three terms: the negative Dice coefficient and the binary cross-entropy between the ground truth mask and the prediction, plus the discriminator loss between real and generated segmentation masks. Our method produced promising results, achieving an average Dice coefficient of 0.87 on the ISLES training dataset.

2 Method

The detailed model architecture and training procedure of the proposed methods are described in this section. First, we address the necessary steps for preprocessing the data. Then we introduce the architecture of the segmentation network. Finally, we illustrate two adversarial paradigms proposed for training the segmentation model.

2.1 Data

We performed training and validation on the 2018 ISLES challenge dataset. The training dataset contains a total of 63 patients, each with 5 different perfusion maps: cerebral blood flow (CBF), mean transit time (MTT), cerebral blood volume (CBV), time to peak of the residue function (TMAX) and computed tomography angiography (CTP). An example of the training data is shown in Fig. 1. The training data also include gold-standard DWI maps that are not available in the testing data. The ground truth segmentation masks were derived from the DWI. The data are provided in Neuroimaging Informatics Technology Initiative (NIfTI) format. We used the Insight Segmentation and Registration Toolkit (ITK) [12] for data inspection and visualization.

Fig. 1. Example of training data and corresponding 3-D annotation

2.2 Preprocessing

Preprocessing is necessary because of the significant cross-modality variance. There are also substantial deviations in spatial resolution, as the z-axis dimension ranges from 2 to 16 across subjects. First, we applied bicubic spline interpolation [5] to resize each volume to the same dimensions. During both training and testing, each modality is then normalized separately by subtracting its mean intensity and dividing by its standard deviation, as shown in the following equation:

$$\begin{aligned} x_m'(i,j,k) = \frac{x_m(i,j,k) - \mu _m}{\sigma _m} \end{aligned} \tag{1}$$

where m denotes the modality, \(\mu _m\) denotes the mean intensity, \(\sigma _m\) denotes the standard deviation, and i, j, k are the coordinates of the voxel being normalized. Finally, the normalized volumes are stacked as multi-channel input for the segmentation network.
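
The following is a minimal, dependency-free sketch of the preprocessing steps. It assumes that each modality is given as a nested list indexed [z][y][x]. It also assumes that the statistics in Eq. (1) are computed per volume, which the text does not specify. To stay short, it replaces the bicubic spline interpolation of [5] with linear interpolation along z; a library routine such as `scipy.ndimage.zoom` with `order=3` would perform spline interpolation instead. The function names are hypothetical.

```python
import statistics


def resample_z(volume, new_depth):
    """Resize a volume (list of 2-D slices) along z by linear interpolation.

    Simplification: the paper uses bicubic spline interpolation [5];
    linear interpolation keeps this sketch short and dependency-free.
    """
    old_depth = len(volume)
    if old_depth == 1:
        # A single slice can only be repeated.
        return [[row[:] for row in volume[0]] for _ in range(new_depth)]
    out = []
    for k in range(new_depth):
        # Map output slice k onto a fractional position in the input.
        z = k * (old_depth - 1) / (new_depth - 1) if new_depth > 1 else 0.0
        lo = int(z)
        hi = min(lo + 1, old_depth - 1)
        w = z - lo
        out.append([[(1 - w) * a + w * b for a, b in zip(row_lo, row_hi)]
                    for row_lo, row_hi in zip(volume[lo], volume[hi])])
    return out


def zscore(volume):
    """Eq. (1): subtract the mean intensity and divide by the standard deviation."""
    voxels = [v for sl in volume for row in sl for v in row]
    mu = statistics.fmean(voxels)
    sigma = statistics.pstdev(voxels) or 1.0  # guard against constant volumes
    return [[[(v - mu) / sigma for v in row] for row in sl] for sl in volume]


def preprocess(modalities, depth):
    """Resample and normalise each modality, then stack them as channels.

    Returns a nested list indexed [channel][z][y][x].
    """
    return [zscore(resample_z(vol, depth)) for vol in modalities]
```

For example, `preprocess([cbf, mtt, cbv, tmax, ctp], depth)` would yield a five-channel input, one channel per perfusion map. Here `depth` is whatever common slice count is chosen.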

2.3 3D Residual U-Net

The backbone of the segmentation network is a 3D U-net with residual connections [11]. The network structure and the details of the residual block are shown in Fig. 2. The U-net consists of a down-sampling pathway and an up-sampling pathway. The down-sampling pathway is made up of 4 residual blocks. Each residual block contains three \(3\times 3\times 3\) convolution layers, with batch normalization and a leaky rectified linear unit (Leaky ReLU) activation in between. The up-sampling pathway contains 4 transposed convolution blocks, each followed by concatenation with the corresponding feature maps from the down-sampling pathway.

Fig. 2. Top: architecture of the 3-D residual U-net. Bottom: a single residual block.
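
Skip connections only work if each up-sampled map has exactly the same spatial size as the encoder map it is concatenated with. The sketch below traces spatial sizes through four levels. It assumes that each level halves every dimension with floor division and that each transposed convolution doubles it. The text does not state the strides, and it does not say whether the z-axis is down-sampled at all. The function name is hypothetical.

```python
def unet_shape_trace(spatial, depth=4):
    """Trace spatial sizes through `depth` down-sampling levels and back up.

    Assumptions: each down-sampling step halves every dimension with floor
    division (as a stride-2 convolution or 2x2x2 max pooling would), and
    each transposed convolution exactly doubles it.
    Returns the encoder sizes, from full resolution to the bottleneck.
    """
    encoder = [tuple(spatial)]
    for _ in range(depth):
        encoder.append(tuple(s // 2 for s in encoder[-1]))
    # Decoder: upsample, then concatenate with the encoder map of the same
    # resolution; concatenation along channels needs identical sizes.
    current = encoder[-1]
    for skip in reversed(encoder[:-1]):
        current = tuple(2 * s for s in current)
        if current != skip:
            raise ValueError(f"skip connection mismatch: {current} vs {skip}")
    return encoder


print(unet_shape_trace((16, 128, 128)))
# [(16, 128, 128), (8, 64, 64), (4, 32, 32), (2, 16, 16), (1, 8, 8)]
```

Under these assumptions, a volume with 10 slices would raise an error, because 10 is not divisible by 2^4. This is one practical reason why resizing every volume to a common, suitably divisible depth is convenient.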

2.4 Adversarial Training

The adversarial pipeline is a two-player mini-max game between the segmentation network and the discriminator network. Figure 3 shows a high-level view of the training procedure. In each training iteration, the segmentation network generates a pixel-wise probability map, which is then fed to the discriminator network as input. The objective of the discriminator is to distinguish between ground truth segmentation masks and predicted masks. The discriminator is a 7-block network containing 3 residual blocks similar to those of the 3D U-net, with max pooling applied after every residual block. The discriminator network serves only an auxiliary purpose and is therefore removed during the testing phase. It enforces spatial continuity that is otherwise not obtainable using only a pixel-wise classification loss.

Denoting the ground truth mask as y, the image data as x, the U-net as U and the discriminator as D, the mini-max game can be written as:

$$\begin{aligned} \min _U \max _{D} \; \mathbb{E}_{y \sim p(y)}[\log D(y)] + \mathbb{E}_{x \sim p(x)}[\log (1-D(U(x)))] \end{aligned} \tag{2}$$

Fig. 3. Overview of the adversarial training paradigm
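
In practice, the objective in Eq. (2) is usually split into two cross-entropy losses, one minimised by each player. The sketch below assumes that the discriminator outputs one probability per mask. It shows both the literal minimax loss for U and the commonly used non-saturating variant. The text does not state which variant the authors used, and the function names are hypothetical.

```python
import math

EPS = 1e-7  # clamp to avoid log(0)


def _mean_log(values):
    return sum(math.log(max(v, EPS)) for v in values) / len(values)


def discriminator_loss(d_real, d_fake):
    """Negative of the inner max objective of Eq. (2), averaged over a batch.

    d_real: probabilities D(y) assigned to ground-truth masks.
    d_fake: probabilities D(U(x)) assigned to predicted masks.
    """
    return -(_mean_log(d_real) + _mean_log([1.0 - p for p in d_fake]))


def segmenter_adversarial_loss(d_fake, non_saturating=True):
    """Adversarial loss minimised by the segmentation network U."""
    if non_saturating:
        # Common heuristic: maximise log D(U(x)) instead of minimising
        # log(1 - D(U(x))), which gives stronger early gradients.
        return -_mean_log(d_fake)
    # Literal outer minimisation of Eq. (2): minimise log(1 - D(U(x))).
    return _mean_log([1.0 - p for p in d_fake])
```

A discriminator that cannot tell the two kinds of mask apart (D = 0.5 everywhere) has loss 2 ln 2. A confident, correct discriminator drives its loss towards 0.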

There are different ways to carry out adversarial training. We propose two training paradigms for the adversarial pipeline, namely:

  • Integrated loss

  • Second back propagation

Integrated Loss. The integrated loss paradigm adds the adversarial loss to the traditional segmentation losses to form an integrated loss term, and back-propagation is carried out using the gradients of this integrated loss. The discriminator network is back-propagated using the errors of failing to recognize true labels and of misclassifying synthetic labels as true. The integrated loss function for the segmentation network contains three terms, namely binary cross-entropy loss, negative Dice score and adversarial loss, as shown in the following equation:

$$\begin{aligned} \mathcal {L}_{total} = \alpha \mathcal {L}_{adver} + \beta \mathcal {L}_{BCE} + \gamma \mathcal {L}_{dice} \end{aligned} \tag{3}$$

where \(\alpha , \beta , \gamma \) are the coefficients of the respective loss terms. We initialized all three coefficients to 1. The coefficients are adjusted by a weight decay mechanism, which we describe in the implementation details section. The detailed procedure is given in Algorithm 1.
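
Eq. (3) can be evaluated on flattened voxel probabilities as sketched below. The text only says "negative dice coefficient". The soft Dice with a smoothing term used here is an assumed, commonly used formulation. The adversarial term is passed in as a number, for example the segmentation network's adversarial loss derived from Eq. (2). The coefficient decay is not modelled, and the function names are hypothetical.

```python
import math


def bce_loss(probs, target, eps=1e-7):
    """Mean binary cross-entropy over flattened voxel probabilities."""
    total = 0.0
    for p, t in zip(probs, target):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(probs)


def negative_soft_dice(probs, target, smooth=1.0):
    """Negative soft Dice coefficient; `smooth` avoids 0/0 on empty masks."""
    inter = sum(p * t for p, t in zip(probs, target))
    return -(2.0 * inter + smooth) / (sum(probs) + sum(target) + smooth)


def integrated_loss(probs, target, adv_loss, alpha=1.0, beta=1.0, gamma=1.0):
    """Eq. (3) with all coefficients initialised to 1, as in the text."""
    return (alpha * adv_loss
            + beta * bce_loss(probs, target)
            + gamma * negative_soft_dice(probs, target))
```

For a perfect prediction, the BCE term approaches 0 and the Dice term reaches -1. The total is therefore about -1 plus whatever the adversarial term contributes.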

Second Back-Propagation. In the second back-propagation paradigm, the segmentation network is back-propagated twice. First, its weights are adjusted according to the gradients of the traditional segmentation loss. Then, in the adversarial training phase of each iteration, the gradients from the adversarial loss are passed to the segmentation network for a second back-propagation. The discriminator network is back-propagated only once. The detailed training procedure is given in Algorithm 2.
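
The two paradigms differ mainly in how many times U is updated per iteration and on which loss. The framework-free sketch below records only that ordering. It assumes that D is updated after U and that the segmentation losses are unweighted in the second paradigm; the text fixes neither point. In PyTorch, each recorded entry would correspond to calling `zero_grad()` on the named network's optimizer, `backward()` on the named loss, and then `step()` on that optimizer. The function names are hypothetical.

```python
def iteration_updates(paradigm):
    """Ordered (network, loss) back-propagation steps in one iteration.

    Assumption: D is updated after U; the text does not fix this order.
    """
    if paradigm == "integrated":
        # One backward pass through U on the integrated loss of Eq. (3).
        updates = [("U", "alpha*L_adver + beta*L_BCE + gamma*L_dice")]
    elif paradigm == "second_backprop":
        # U is back-propagated twice: segmentation loss, then adversarial loss.
        updates = [("U", "L_BCE + L_dice"), ("U", "L_adver")]
    else:
        raise ValueError(f"unknown paradigm: {paradigm!r}")
    # In both paradigms the discriminator is back-propagated once.
    updates.append(("D", "L_D"))
    return updates


def count_updates(paradigm, iterations):
    """Number of optimizer steps each network receives over `iterations`."""
    counts = {"U": 0, "D": 0}
    for _ in range(iterations):
        for network, _loss in iteration_updates(paradigm):
            counts[network] += 1
    return counts


print(count_updates("second_backprop", 100))  # {'U': 200, 'D': 100}
```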

2.5 Implementation Details

The proposed model was implemented in Python using the PyTorch deep learning framework. The learning rates for the segmentation network and the critique network were set differently to avoid collapse in early epochs, a common phenomenon in GANs. Learning rates were initialized at 0.001 for the segmentation model and 0.0005 for the discriminator network. The learning rate is decayed if the loss function does not improve for 5 consecutive epochs, and each decay reduces the learning rate to 80% of its previous value. Training is terminated early if no improvement is seen for 20 consecutive epochs. The mini-batch size was set to 8. Training was conducted on 4 NVIDIA Tesla V100 GPUs. The total training time for the entire pipeline, including the segmentation and discriminator networks, was approximately 24 h.
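
The decay and early-stopping rules can be replayed on a recorded loss history, as sketched below for a single network. The defaults match the segmentation network; the discriminator would start at 0.0005. Treating any strictly lower loss as an "improvement" is an assumption. PyTorch's `torch.optim.lr_scheduler.ReduceLROnPlateau` with `mode="min"` and `factor=0.8` provides similar behaviour. However, its `patience` argument counts non-improving epochs slightly differently, so the exact mapping should be checked against its documentation. The function name is hypothetical.

```python
def replay_plateau_schedule(losses, lr0=0.001, factor=0.8,
                            decay_patience=5, stop_patience=20):
    """Replay per-epoch losses through the decay and early-stopping rules.

    Returns (learning rate used in each epoch, stopping epoch or None).
    """
    lr = lr0
    best = float("inf")
    since_best = 0   # epochs since the last improvement (early stopping)
    since_decay = 0  # non-improving epochs since the last improvement or decay
    lrs = []
    for epoch, loss in enumerate(losses):
        lrs.append(lr)
        if loss < best:
            best, since_best, since_decay = loss, 0, 0
        else:
            since_best += 1
            since_decay += 1
        if since_best >= stop_patience:
            return lrs, epoch
        if since_decay >= decay_patience:
            lr *= factor  # reduce to 80% of the previous value
            since_decay = 0
    return lrs, None


lrs, stop = replay_plateau_schedule([1.0] * 30)
print(stop)  # 20
```

With a completely flat loss, the learning rate is decayed after epochs 5, 10 and 15, and training stops at epoch 20.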

Fig. 4. Effects of adversarial training

3 Results

In this section, we present quantitative results of the proposed model and a qualitative comparison of the effects of adversarial training. Several metrics were used for model evaluation: the mean and standard deviation of the Dice score, and the mean and standard deviation of the Hausdorff distance. Figure 4 visualizes the effects of adversarial training. As shown in the figure, models trained adversarially are able to capture subtle differences between ground truth and predictions. Table 1 shows that incorporating adversarial training increases the Dice score and reduces the Hausdorff distance.
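
Both metrics are computed per subject, and then summarised by their mean and standard deviation across subjects. The sketch below represents each binary mask as a set of foreground voxel coordinates. It computes the Hausdorff distance by brute force over all foreground voxels. Implementations differ, for example by using surface voxels only or a percentile instead of the maximum, so this is an illustrative version rather than the challenge's evaluation code. The treatment of two empty masks and the function names are assumptions.

```python
import math


def dice(pred, target):
    """Binary Dice between two sets of foreground voxel coordinates."""
    if not pred and not target:
        return 1.0  # assumed convention for two empty masks
    return 2.0 * len(pred & target) / (len(pred) + len(target))


def hausdorff(a, b, spacing=(1.0, 1.0, 1.0)):
    """Symmetric Hausdorff distance between two non-empty voxel sets.

    Brute force, O(|a| * |b|); `spacing` scales voxel steps to physical units.
    """
    def dist(p, q):
        return math.sqrt(sum(((pi - qi) * s) ** 2
                             for pi, qi, s in zip(p, q, spacing)))

    def directed(src, dst):
        # Largest distance from a point in src to its nearest point in dst.
        return max(min(dist(p, q) for q in dst) for p in src)

    return max(directed(a, b), directed(b, a))


pred = {(0, 0, 0), (0, 0, 1)}
truth = {(0, 0, 0), (0, 0, 1), (0, 0, 2)}
print(dice(pred, truth), hausdorff(pred, truth))  # 0.8 1.0
```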

Table 1. Dice and Hausdorff distance comparison between three training paradigms

4 Discussion

In this paper, we have presented an automatic ischemic stroke lesion segmentation model that takes multiple CT perfusion maps with varying dimensions as input. We proposed two adversarial training paradigms, namely the integrated loss function and second back-propagation. We have demonstrated that, by incorporating a discriminator network in the training procedure, the segmentation model is able to capture subtle inconsistencies between ground truth and prediction that cannot be corrected using only pixel-wise loss functions such as binary cross-entropy and Dice score. Quantitatively, employing adversarial training increases the Dice score and reduces the Hausdorff distance.