1 Introduction

Psoriasis is a chronic, immune-mediated, relapsing, inflammatory skin disease with variable morphology, distribution, severity and course [2]. The prevalence of psoriasis varies from 1% to 12% among different populations worldwide. It is often difficult to differentiate psoriasis from other erythemato-squamous diseases such as seborrheic dermatitis, leprosy, lichen planus, tinea corporis, pityriasis and eczema [3, 4, 6]. Hence, histopathological examination is used for confirmation.

Psoriasis develops when the immune system mistakes normal skin cells for a pathogen and sends out faulty signals that cause the overproduction of new skin cells. As a result, in many cases, neutrophils and nucleated cells infiltrate the stratum corneum (SC). This infiltration is either confluent (throughout the SC layer) or focal (not confluent). The presence of nucleated cells in the SC is termed parakeratosis, and the accumulation of neutrophils in the SC along with parakeratosis is termed Munro’s Microabscess (MM). In clinical pathology, Munro’s Microabscess is considered the diagnostic hallmark of psoriasis [2].

The challenge in detecting MM is that, owing to staining variation, neutrophils in the stratum corneum are often misclassified as nucleated keratinocytes, since both appear darkly stained. Several imaging artefacts add further difficulty, so accurate diagnosis of a skin biopsy by visual inspection is hard even for highly experienced pathologists. Figure 1 illustrates the problem.

Fig. 1. Patches cropped from the stratum corneum layer of WSIs. Neutrophils are circular and dark blue; nuclei are oval and light blue. In (a) and (b) both neutrophils and nucleated cells are present, in (c) only nucleated cells are present, and in (d) neither nucleated cells nor neutrophils are present.

In the last two decades, several automatic systems have been designed and developed to reduce the workload of pathologists in the microscopic examination of clinical tissue. However, there is a dearth of work on the automatic identification of neutrophils in biopsy images. Some research initiatives have been reported for acute inflammation diagnosis [8, 9], where giga-pixel images are used.

The contributions of this paper are manifold. The pathological challenges of detecting MM in skin biopsies are addressed by developing an automated computational framework that incorporates recent advances in deep learning. The capsule network is designed so that the number of parameters is drastically reduced without sacrificing performance. Mega-pixel images are used instead of the usual giga-pixel ones to reduce the computational burden, thereby supporting a low-cost imaging system. The preparation of an annotated dataset of 273 whole slide skin biopsy images (WSIs) is another important outcome of this research. This dataset (Footnote 1) not only helps to demonstrate the efficiency of the present approach, but should also facilitate further research.

2 Proposed Methodology

The goal of this paper is to detect Munro’s Microabscess in whole slide skin biopsy images (WSIs). We break the task down into three parts: (i) segmentation of the SC layer, (ii) patch extraction from the SC layer and (iii) neutrophil detection in the extracted patches. Note that we are only interested in detecting neutrophils present in the SC layer, not in the entire image, and that the SC layer occupies only a small portion of the WSI (approx. 2–15%). Hence, segmentation of the SC layer followed by neutrophil detection is a logical approach. The proposed framework is depicted in Fig. 2 and briefly described in the following subsections.

Fig. 2. Proposed system architecture. Green arrows represent a convolution layer followed by ReLU activation, red arrows represent a max-pooling layer, pink arrows represent an up-sampling layer and the dark purple arrow represents a skip connection.

Stratum Corneum Segmentation. U-Net [5] is a state-of-the-art architecture for image segmentation. We trained the U-Net shown in Fig. 2 to segment the SC layer. Given the segmentation output \(S_{x,y}\) and the corresponding ground truth \(G_{x,y}\), we minimize the Dice loss function \(L_{seg}\) given by

$$\begin{aligned} L_{seg} = 1 - \frac{2\times \sum _{x,y} G_{x,y}S_{x,y}}{\sum _{x,y} G_{x,y} + \sum _{x,y} S_{x,y}} \end{aligned}$$
(1)

where \((x, y)\) denotes the spatial coordinates of a pixel in the WSI.
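The Dice term in Eq. (1) can be illustrated with a minimal NumPy sketch. It is not the training code used in this work: the function names `dice_coefficient` and `dice_loss` and the smoothing constant `eps` are assumptions introduced here, and the loss is taken as one minus the overlap ratio, which is the usual convention when the quantity is minimized.

```python
import numpy as np

def dice_coefficient(seg, gt, eps=1e-7):
    """Overlap ratio 2 * sum(G * S) / (sum(G) + sum(S)) over all pixels (x, y).

    `eps` is an assumed smoothing term that avoids division by zero
    when both masks are empty.
    """
    seg = np.asarray(seg, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    intersection = np.sum(gt * seg)
    denominator = np.sum(gt) + np.sum(seg)
    return (2.0 * intersection + eps) / (denominator + eps)

def dice_loss(seg, gt, eps=1e-7):
    """Quantity to minimize: one minus the Dice overlap (assumed convention)."""
    return 1.0 - dice_coefficient(seg, gt, eps)

# Toy 2 x 3 masks: the ground truth has three SC pixels.
gt = np.array([[0, 1, 1],
               [0, 1, 0]])
perfect = gt.copy()                 # identical prediction
partial = np.array([[0, 1, 0],
                    [0, 0, 0]])     # recovers one of the three pixels

loss_perfect = dice_loss(perfect, gt)
overlap_partial = dice_coefficient(partial, gt)
```

A perfect prediction gives a loss of zero, while the partial prediction overlaps on one pixel out of a combined four, giving an overlap ratio of about 0.5.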

Stratum Corneum Patch Selection. The stratum corneum patches should be selected such that all pixels of a patch belong to a perceptually similar region and the union of the patches covers the entire stratum corneum (SC). To divide the image into perceptual regions, the Simple Linear Iterative Clustering (SLIC) [1] super-pixel algorithm is applied. A square window is then selected around the centroid of each super-pixel that lies in the stratum corneum.
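The window selection step can be sketched as follows, assuming a super-pixel label map is already available (for instance from a SLIC implementation such as `skimage.segmentation.slic`). The function name `select_sc_patches`, the rule that a super-pixel counts as lying in the SC when its centroid falls inside the SC mask, and the clamping of windows at the image border are all assumptions made for illustration, not details stated in this paper.

```python
import numpy as np

def select_sc_patches(labels, sc_mask, patch_size=224):
    """Return top-left corners of square windows centred on super-pixel centroids.

    labels  : 2-D integer array, one label per super-pixel (e.g. SLIC output).
    sc_mask : 2-D boolean array, True where the pixel belongs to the SC.
    Assumes the image is at least `patch_size` pixels in each dimension.
    """
    height, width = labels.shape
    half = patch_size // 2
    corners = []
    for label in np.unique(labels):
        ys, xs = np.nonzero(labels == label)
        cy, cx = int(ys.mean()), int(xs.mean())   # centroid, truncated to a pixel
        if not sc_mask[cy, cx]:
            continue                              # centroid outside the SC: skip
        # Clamp the window so that it stays inside the image (assumed behaviour).
        top = min(max(cy - half, 0), height - patch_size)
        left = min(max(cx - half, 0), width - patch_size)
        corners.append((top, left))
    return corners

# Toy example: an 8 x 8 image split into four 4 x 4 "super-pixels",
# with the SC occupying the top half of the image.
rows = np.arange(8)[:, None] // 4
cols = np.arange(8)[None, :] // 4
toy_labels = rows * 2 + cols
toy_sc = np.zeros((8, 8), dtype=bool)
toy_sc[:4, :] = True

toy_corners = select_sc_patches(toy_labels, toy_sc, patch_size=4)
```

In the toy example only the two super-pixels in the top half have centroids inside the SC mask, so two windows are returned.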

CapsDeMM for Stratum Corneum Patch Classification. Convolutional neural networks have recently achieved state-of-the-art results on several image classification tasks. However, the max-pooling operation used in traditional convolutional neural network architectures may discard important spatial cues, which is undesirable. In this paper, therefore, the recently introduced capsule network [7] is adopted for SC patch classification. A capsule network uses a “routing by agreement” policy to ensure that significant spatial information in an image is not lost as we go from lower to higher layers.

The capsule network consists of two parts, namely the primary capsule and the secondary capsule. We designed the capsule network so that the receptive fields of the capsules in the secondary capsule avoid crowding. Crowding in capsule networks refers to the phenomenon where multiple instances of the same entity are present in the receptive field of a capsule. In such a case, the capsule is unable to encode the instantiation parameters of the entity concerned. The length of the output vector of a secondary capsule denotes the probability of a neutrophil being present in a particular portion of the image patch. Since there can be several neutrophils in an image patch, the average of the top K probabilities (top-K average pooling) is taken as the probability of a neutrophil being present in the given image patch. Given an image patch I, let \(p_{I}\) denote the probability of a neutrophil in the image patch. Then, during training, we minimize the binary cross entropy loss function \(L_{patch}\) given by

$$\begin{aligned} L_{patch} = - y_{I} \log (p_{I}) - (1 - y_{I}) \log (1-p_{I}) \end{aligned}$$
(2)

where \(y_{I}\) is 1 if the image patch contains a neutrophil and 0 otherwise. This network is hereafter referred to as CapsDeMM (Capsule network for Detection of Munro’s Microabscess).
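Top-K average pooling and the loss in Eq. (2) can be sketched in a few lines of NumPy. This is an illustration under stated assumptions rather than the CapsDeMM implementation: the capsule output lengths are replaced by a hand-written array of probabilities, and the names `top_k_average`, `binary_cross_entropy` and the clipping constant `eps` are introduced here.

```python
import numpy as np

def top_k_average(capsule_probs, k):
    """Average of the k largest per-location probabilities (top-K average pooling)."""
    ordered = np.sort(np.ravel(capsule_probs))[::-1]   # descending order
    return float(np.mean(ordered[:k]))

def binary_cross_entropy(p, y, eps=1e-7):
    """L_patch = -y * log(p) - (1 - y) * log(1 - p), with p clipped for stability."""
    p = min(max(p, eps), 1.0 - eps)
    return float(-(y * np.log(p) + (1 - y) * np.log(1.0 - p)))

# Hypothetical capsule output lengths on a 2 x 3 spatial grid.
probs = np.array([[0.90, 0.10, 0.20],
                  [0.80, 0.70, 0.05]])

p_patch = top_k_average(probs, k=3)               # mean of 0.9, 0.8 and 0.7
loss_positive = binary_cross_entropy(p_patch, y=1)
```

With K = 1 the pooled value is simply the largest response (0.9 here), so a single spurious high response decides the patch label; larger K dilutes such responses, which is the trade-off that motivates tuning K.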

3 Dataset

In this research, after clinical confirmation of psoriasis, affected tissues are collected in 10% formalin under the supervision of an expert dermatologist. Formalin-fixed tissues are dehydrated and embedded in paraffin blocks. Thin sections (5 \(\mu \)m) are used for slide preparation and then stained with hematoxylin and eosin to prepare the histopathological slides. These slides are placed under a microscope at 10X magnification and the images are acquired from the microscope using a digital camera. The images are captured at 10X magnification because this is the highest magnification at which the whole biopsy sample fits adequately in the field of view of the camera. The size of the captured images is \(1936 \times 2584\) pixels. Written informed consent was obtained from all patients before recruiting them for the study. This study was conducted after obtaining ethical approval from the ‘Review Committee for Protection of Research Risks to Humans’ of Indian Statistical Institute, Kolkata, West Bengal, India.

The skin biopsy samples are collected from 120 patients. Multiple serial sections of skin tissue present in the biopsy slides are imaged. Images in which the stratum corneum (SC) layer was lost during tissue processing are discarded. The remaining images are labelled by two experts, and only the images on which both experts agree are considered for this research. Our dataset contains 88 images with Munro’s Microabscess and 185 images without Munro’s Microabscess. The ground-truth annotation of the SC segmentation is done by an expert. To construct the SC patch classifier, multiple square patches (\(224\times 224\) pixels) are cropped from the SC layer of the biopsy images. The presence of neutrophils in these patches is labelled by two experts and the patches on which both experts agree are chosen. In total, there are 886 patches with neutrophils and 1700 patches without any neutrophil. In the rest of the paper, SC patches containing neutrophils are termed positive patches and patches without neutrophils are termed negative patches.

4 Experiments

4.1 Experimental Setting

The proposed system is tested with three-fold cross validation. Each fold contains a random selection of 91 images from our dataset. Two folds contain 29 images with Munro’s Microabscess and the third fold contains 30. The cropped SC patches are grouped fold-wise (862 patches/fold) to build the fold-wise SC patch classifiers. The validation data is formed by randomly selecting 10% of the training images (original WSIs) for U-Net and 20% of the training images (SC patches) for CapsDeMM. The validation data is used for tuning the number of filters, the filter size and other hyper-parameters. The architectures of the U-Net and the capsule network are shown in Fig. 2. In CapsDeMM, each primary capsule contains 8 convolutional units (\(5\times 5\) kernels with stride 2) of 16 channels, and the secondary capsule is a convolutional capsule that performs routing by agreement between the capsules in the same spatial region of the primary capsule. For U-Net, keeping the resource constraints in mind, the original images and the corresponding ground truths are down-sampled to \(960\times 1280\). The segmentation output is then up-scaled to the original dimensions, and \(224\times 224\) pixel SC patches are selected for classification. To obtain the best-performing CapsDeMM network, the value of K for the top-K average pooling layer is tuned (without any architectural change to the other layers) and the resulting network is named CapsDeMM-K.

4.2 Results and Discussion

Segmentation Performance: Accurate segmentation of the SC region improves the diagnostic performance of our system. The U-Net produces good segmentation (Dice coefficient of 0.8493 ± 0.025), but it generates several spurious holes and isolated segmented regions in the SC region, as shown in Fig. 3. To remove them, a ‘hole-filling’ algorithm is applied to smooth such spurious and isolated regions. Figure 3 illustrates that this smoothing technique is able to remove falsely detected non-SC regions. The final segmentation attains a Dice coefficient of 0.8614 ± 0.014.
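The exact post-processing routine is not specified here, so the following is only a sketch of one common form of hole filling: background pixels that cannot be reached from the image border through 4-connected background are treated as holes and set to foreground. The function name `fill_holes` is an assumption, and removal of isolated regions is not covered. In practice a library routine such as SciPy's `scipy.ndimage.binary_fill_holes` would serve the same purpose.

```python
from collections import deque

import numpy as np

def fill_holes(mask):
    """Fill background regions fully enclosed by foreground (4-connectivity)."""
    mask = np.asarray(mask, dtype=bool)
    height, width = mask.shape
    outside = np.zeros_like(mask)     # background connected to the image border
    queue = deque()

    # Seed the flood fill with every background pixel on the image border.
    for y in range(height):
        for x in range(width):
            on_border = y in (0, height - 1) or x in (0, width - 1)
            if on_border and not mask[y, x]:
                outside[y, x] = True
                queue.append((y, x))

    # Breadth-first flood fill through background pixels.
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            inside_image = 0 <= ny < height and 0 <= nx < width
            if inside_image and not mask[ny, nx] and not outside[ny, nx]:
                outside[ny, nx] = True
                queue.append((ny, nx))

    # Everything not reachable from the border is foreground or an enclosed hole.
    return ~outside

# Toy mask: a ring of SC pixels with a one-pixel hole in the middle.
ring = np.array([[0, 0, 0, 0, 0],
                 [0, 1, 1, 1, 0],
                 [0, 1, 0, 1, 0],
                 [0, 1, 1, 1, 0],
                 [0, 0, 0, 0, 0]], dtype=bool)
filled = fill_holes(ring)
```

In the toy mask the central pixel is enclosed by the ring and becomes foreground, while the border background is left untouched.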

Fig. 3. The yellow line denotes the ground-truth region boundary and the green line denotes the detected region boundary. (a) Segmentation outcome from the U-Net, (b) segmentation outcome after post-processing.

Fig. 4. Comparison of ROC curves.

Table 1. Performance comparison of patch classifiers. M = Million, s = second.

Stratum Corneum Patch Classification: Stratum corneum patch classification is an important component for the success of the proposed system. The capsule network shown in Fig. 2 is used for this purpose. In top-K average pooling, lower values of K might cause an image patch to be misclassified as positive because a few portions of the patch receive high probabilities, whereas higher values of K might overcompensate for this effect and cause positive samples to be classified as negative. To find the optimum value of K, we compare the ROC curves for different values of K. \(K=5\) is chosen for classification since it provides the best AUC score (averaged over the three folds). The presence or absence of neutrophils in a patch is decided by comparing the network output with the cut-off value obtained from the ROC analysis: if the network output for a patch is less than the cut-off value, the predicted output is negative; otherwise, it is positive. The ROC curves of the first fold for five different values of K (1, 3, 5, 7 and 9) are compared in Fig. 4.

The classification performance is evaluated with Recall, Precision, F1 Score and classification accuracy (ACC). The average values of all metrics over the three folds are listed in Table 1. Finally, the performance of the proposed capsule network is compared with a CNN, namely ResNet-50, trained on the same dataset. The experimental results show that the capsule network achieves accuracy comparable to ResNet-50 despite having orders of magnitude fewer parameters.

Whole Slide Image (WSI) Diagnosis: Ideally, detection of a single positive SC patch should indicate the presence of Munro’s Microabscess. However, SC patches are occasionally misclassified (see Table 1). To develop a robust WSI classification system, a threshold T is therefore determined from the training set, and only those WSIs that have more than T positive patches are diagnosed as having Munro’s Microabscess. Two strategies are analysed for selecting the value of T: (I) the system should produce the best WSI classification performance, so T is selected such that the best WSI classification accuracy is achieved; (II) the system should reduce the workload of pathologists by rejecting the negative cases (slides without Munro’s Microabscess), so T is selected such that the true negative rate is maximized. Strategy I is evaluated with classification accuracy (ACC) and Strategy II is evaluated with True Negative Rate (TNR) and Precision. The value of T varies across the folds and depends on the thresholding strategy used.
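The decision rule and Strategy I can be sketched in plain Python. The per-slide positive patch counts below are made-up toy values, not data from this study, and the function names and the tie-breaking rule (prefer the smallest T among equally accurate thresholds) are assumptions. Strategy II is not sketched because the text does not state how TNR is traded off against missed positive slides.

```python
def diagnose(num_positive_patches, threshold):
    """A WSI is called positive when it has more than `threshold` positive patches."""
    return num_positive_patches > threshold

def accuracy(counts, labels, threshold):
    """Fraction of WSIs whose diagnosis at `threshold` matches the label."""
    correct = sum(diagnose(c, threshold) == bool(y) for c, y in zip(counts, labels))
    return correct / len(labels)

def select_threshold(counts, labels):
    """Strategy I: choose T with the best training accuracy.

    Ties are broken in favour of the smallest T (an assumed convention).
    """
    candidates = range(0, max(counts) + 1)
    return max(candidates, key=lambda t: (accuracy(counts, labels, t), -t))

# Hypothetical training slides: predicted positive patches per WSI and true labels.
train_counts = [0, 1, 2, 5, 7, 0, 3]
train_labels = [0, 0, 0, 1, 1, 0, 1]

best_t = select_threshold(train_counts, train_labels)
best_acc = accuracy(train_counts, train_labels, best_t)
```

On these toy counts, T = 2 separates the two groups perfectly: the negative slides have at most two false positive patches, while every positive slide has at least three.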

To study how the performance of the proposed diagnostic system depends on the number of super-pixels, three different super-pixel counts are considered. The performance averaged over all the folds is listed in Table 2. According to Table 2, the capsule network and ResNet-50 produce comparable performance. Note that CapsDeMM outperforms ResNet-50 under Strategy I, whereas ResNet-50 outperforms CapsDeMM under Strategy II.

Table 2. Performance comparison of WSI classifiers

5 Conclusion and Future Work

This paper presents a first-of-its-kind system for the detection of Munro’s Microabscess in skin biopsy images. The drastic reduction in parameters without notable performance degradation justifies the applicability of CapsDeMM to the present problem. The promising performance of the two strategies for WSI classification presented in the paper shows their potential for substantially reducing the workload of pathologists. The use of mega-pixel images not only reduces the overall computational burden but also supports the use of a low-cost system consisting of a light microscope (without a digital scanner) and a digital camera. The outcome of the present research, along with the dataset of WSIs, will help in addressing several other important histopathological analyses of psoriasis, including the classification of parakeratosis (confluent/focal) and the detection of Kogoj Microabscesses. The efficiency of the present framework can be further validated by employing different architectures, and patch classification can also be attempted with patches of different shapes (e.g. rectangular).