1 Introduction

Malignant melanoma is one of the most common and dangerous skin cancers: in the USA alone, about 100,000 new cases are diagnosed and over 9,000 deaths occur every year [7, 8]. In this context, automated systems for fast and accurate melanoma detection are welcome, particularly those based on machine learning methods and Convolutional Neural Networks.

The premise for this work was the development of an annotation tool for epidermal images: it was the first step towards a heterogeneous integrated data system whose architecture is depicted in Fig. 1.

Fig. 1. The system architecture proposed for a medical digital library

The architecture centres on the Digital Library, with inner modules dedicated to image analysis and feature extraction and external tools that manage the Annotation, Information Visualization and Query/Search capabilities [3], for example by exploiting the DICOM (Digital Imaging and Communications in Medicine) metadata standard.

The dataset, coming from hospital equipment, currently consists of 436 dermoscopic skin images in JPEG format at \(4000 \times 2664\) or \(4000 \times 3000\) resolution. Within the integrated system architecture, at this stage we focus on exploiting reliable state-of-the-art methods for extracting the visual features we need, also introducing workarounds to improve results. The annotation software in Fig. 2 has been developed for domain experts such as dermatologists, who follow specific working protocols and, being usually overworked, have little time to train themselves on external tools.

As hardware platform we chose the Microsoft Surface Pro 3, a powerful non-invasive device that can be used on the move in medical environments; it ensures that the recognized strokes are only those coming from the dedicated Surface Pen, discarding unwanted strokes from touch gestures or accidental movements.

This paper is organized as follows: Sect. 2 reviews the literature on epidermal and melanoma image analysis, Sect. 3 explains the skin lesion detection pipelines, Sect. 4 reports the results of the experimental sessions and, finally, Sect. 5 illustrates conclusions and future work.

Fig. 2. Microsoft Surface is the hardware platform for the annotation tool (Color figure online)

2 Related Work

Nowadays standard video devices are commonly used in skin lesion inspection systems, particularly in the telemedicine field [20]; however, these solutions bring some issues, such as poor camera resolution (melanoma and other skin details can be very small), variable illumination conditions and distortions caused by the camera lenses [14]. A complete and rich survey of skin lesion characterization, covering artifact removal techniques, evaluation metrics, lesion detection and preprocessing methods, is given in [5]. Seidenari et al. [19] provide an overview of melanoma detection by image analysis; Wighton et al. [21] present a general model, based on supervised learning and MAP estimation, capable of performing many common tasks in automated skin lesion diagnosis; Emre Celebi et al. [9] address lesion border detection with thresholding methods, as do Fan et al. [10], who combine saliency with Otsu thresholding. Peruch et al. [17] face lesion segmentation through Mimicking Expert Dermatologists’ Segmentations (MEDS), while Liu et al. [13] propose an unsupervised sub-segmentation technique for skin lesions.

In Codella et al. [6], manually pre-segmented images, already cropped around the region of interest, have been used in conjunction with hand-coded and unsupervised features to achieve state-of-the-art results in the melanoma recognition task on a dataset of 2,000 images. Learning approaches are exploited in Schaefer et al. [18] and deep learning techniques in Abbes and Sellami [2]; further studies appear in [12, 16], while a combination of hand-coded features, sparse-coding methods, HOG and SVMs is used in Bakheet [4].

Finally, in 2016 a new challenge called Skin Lesion Analysis toward Melanoma Detection [11] was presented: its aim is to test and evaluate automated techniques for melanoma diagnosis on one of the most complete datasets of melanoma images, collected by the International Skin Imaging Collaboration (ISIC) by aggregating dermoscopic images from multiple institutions. The ISIC database has also been exploited in Yuan et al. [22] with a 19-layer deep convolutional neural network, while Majtner et al. [15] also achieve classification and segmentation using deep learning approaches.

3 Visual Features and Skin Image Processing

Figure 3 depicts a dermoscopic image in which a dermatologist annotated the principal external contour of a skin lesion (in red); the primitive metadata directly extracted by the annotation tool is shown in Fig. 4.

Fig. 3. A dermoscopic image annotated by a dermatologist (Color figure online)

Fig. 4. The extracted annotation (primitive feature) (Color figure online)

Fig. 5. External black frame pixel mask (first version)

One of the targets of this system is to create and manage the image dataset for automated analysis by combining low-level visual feature representations, image processing techniques and machine learning algorithms: we want to obtain pipelines that detect skin lesions and automatically draw one or more contours, mimicking and extending the dermatologist’s knowledge.

The drawn strokes provide primitive features consisting of binary masks, coordinate points, color codes and pen sizes; the image processing functions extract derived features such as contours, shapes, intersections, color features and numerical values.

Our method exploits the manual annotations alongside image processing algorithms to evaluate how many pixels of the skin lesion can be automatically detected by the system; a pre-processing phase is necessary to remove thick skin hairs, because these artifacts affect shape and contour extraction [23].

3.1 Hair Removal

To accomplish this task we took inspiration from the DullRazor [1] pipeline, which consists of:

  1. a Detection step that locates, in an RGB image, the slender and elongated structures that resemble hairs on the skin, producing a hair pixel mask;

  2. a Replacement step that replaces each detected hair pixel with the interpolation of two lateral pixels chosen from a line segment built along a straight direction.

The Detection phase exploits the generalized grayscale morphological closing operation \(G_c\): for each color channel c (Red, Green, Blue), the operator performs a set of morphological closings with different kernels in order to compare, for each pixel, which kernel best approximates a potential hair shape (by choosing the highest value \(c_p\)).

The value of \(G_c\) for each pixel p is calculated as:

$$\begin{aligned} \forall c\in \{r,g,b\},\ \forall p,\quad G_c(p)=\left| b_c(p)-\max (c_p)\right| \end{aligned}$$
(1)

where \(b_c(p)\) is the actual image pixel value and \(c_p\) measures how many pixels of the kernel are verified as hair structure for an image pixel p.

The kernel represents a sort of ‘skeleton’ that reconstructs, pixel by pixel, the hair shape (elongated, slight, weakly curvilinear) on the kernel-closed image, so the kernel structure is a very critical variable; we used four kernels, one for each of the directions (11 pixels horizontal at \(0^\circ /180^\circ \), 11 pixels vertical at \(90^\circ /270^\circ \), 9 pixels oblique at \(45^\circ /225^\circ \) and 9 pixels oblique at \(135^\circ /315^\circ \)) along which a potential hair could extend from a single pixel located at the center of the kernel.

The final hair mask \(M\) is the union of the resulting pixel masks \(M_r\), \(M_g\), \(M_b\), where each mask is obtained by thresholding the generalized closing value previously calculated for each pixel.
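As a concrete reference, the following minimal sketch (in Python with OpenCV and NumPy, which the pipeline in Sect. 3.2 already relies on) reproduces the Detection step; the threshold value and the exact kernel layouts are illustrative assumptions, not the tuned values used in our experiments.

    import cv2
    import numpy as np

    def directional_kernels():
        """Four line-shaped kernels for the hair directions (sizes per Sect. 3.1)."""
        h = np.ones((1, 11), np.uint8)               # horizontal, 0/180 degrees
        v = np.ones((11, 1), np.uint8)               # vertical, 90/270 degrees
        d1 = np.eye(9, dtype=np.uint8)               # one oblique diagonal
        d2 = np.fliplr(d1)                           # the other oblique diagonal
        return [h, v, d1, d2]

    def hair_mask(img_bgr, thresh=25):
        """Generalized grayscale closing per channel, then union of the masks."""
        masks = []
        for c in cv2.split(img_bgr):                 # b, g, r channels
            closed = [cv2.morphologyEx(c, cv2.MORPH_CLOSE, k)
                      for k in directional_kernels()]
            best = np.max(np.stack(closed), axis=0)  # kernel best fitting a hair
            g = cv2.absdiff(best, c)                 # G_c = |b_c(p) - max(c_p)|
            masks.append(g > thresh)                 # per-channel mask M_c
        return (masks[0] | masks[1] | masks[2]).astype(np.uint8)  # M = Mr|Mg|Mb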

Before the Replacement phase is performed, we must verify that each candidate hair pixel belongs to a valid thick and long structure: for each direction described above, a path centered on the pixel is built until non-hair regions are reached. The longest path is used to take the two interpolation pixels, selected on both sides, perpendicular to its direction, at a fixed distance from the hair structure borders.
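The lateral interpolation is direction-dependent and somewhat intricate; as a pragmatic stand-in (our simplification, not the DullRazor method itself), OpenCV’s inpainting can replace each validated hair pixel from its neighbourhood:

    import cv2

    def remove_hairs(img_bgr, mask):
        """Replace hair pixels via inpainting (a stand-in for the paper's
        two-pixel lateral interpolation); the 3x3 dilation is an assumption
        meant to cover thin hair borders completely."""
        kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
        grown = cv2.dilate(mask * 255, kernel)
        return cv2.inpaint(img_bgr, grown, 3, cv2.INPAINT_TELEA)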

This pipeline is greatly affected by the variables used at each step, for example the kernel shape, the \(G_c\) threshold for the hair masks and the distance at which the interpolation pixels are chosen.

3.2 Skin Lesion Detection

After the treatment that removes most of the hairs, it is possible to design the detection of the skin lesion area: we start from standard image processing techniques, embedded in fine-tuned pipelines that filter and refine the results:

  1. Otsu thresholding with pixel mask;

  2. color clustering with pixel mask and cluster tolerance.

Given the peculiar structure of a dermoscopic image, a binary mask M0 (Fig. 5) must first be constructed to approximate the large black frame that surrounds the bright circular skin area containing the lesion. We execute a morphological erosion on the original blurred image with a kernel of size \(201\times 201\); the Otsu threshold (Fig. 8) is then calculated considering only the pixels not in M0 and, for later tests and comparisons, the resulting pixels are shown in red (Fig. 6 shows the original image, Fig. 7 the one with hairs removed).
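Since OpenCV’s Otsu implementation offers no mask argument, a minimal sketch of the masked computation is reported below; the darkness cut-off used to seed M0 and the assumption that lesion pixels lie below the threshold are ours:

    import cv2
    import numpy as np

    def frame_mask(gray):
        """Approximate the black frame M0: blur, mark dark pixels, then
        erode with the large 201x201 kernel."""
        blurred = cv2.GaussianBlur(gray, (5, 5), 0)
        dark = (blurred < 40).astype(np.uint8)       # assumed darkness cut-off
        return cv2.erode(dark, np.ones((201, 201), np.uint8))

    def masked_otsu(gray, m0):
        """Otsu threshold computed only on the pixels not in M0."""
        vals = gray[m0 == 0].reshape(-1, 1)          # pixels outside the frame
        t, _ = cv2.threshold(vals, 0, 255,
                             cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        return ((gray < t) & (m0 == 0)).astype(np.uint8)  # candidate lesion mask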

Fig. 6. Otsu with pixel mask, no hairs removed (Color figure online)

Fig. 7. Otsu with pixel mask, hairs removed (Color figure online)

For the color clustering technique we must develop a heuristic that decides whether a cluster belongs to the bright skin or to the lesion, assigning the label ‘skin’ or ‘lesion’ to each cluster; moreover, it must be considered that a cluster can easily straddle the two area types (some pixels on the lesion, others on the skin), especially at the borders.

To deal with this ambiguity we build a pixel toleration area: we execute a further morphological closing and erosion on the Otsu image, using two different kernels, to obtain an enlarged mask (Fig. 9) that creates safety areas around the region borders (a side effect is that scattered hair pixels are removed, although groups of them may be enlarged).
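A minimal sketch of this construction follows; the two kernel sizes, and the reading of ‘enlarged’ as the effect of the closing applied before the erosion, are our assumptions rather than the tuned values:

    import cv2
    import numpy as np

    def tolerance_area(otsu_lesion):
        """Further closing and erosion of the Otsu mask with two different
        kernels; scattered hair pixels disappear, grouped ones may grow."""
        closed = cv2.morphologyEx(otsu_lesion, cv2.MORPH_CLOSE,
                                  np.ones((51, 51), np.uint8))   # assumed size
        return cv2.erode(closed, np.ones((25, 25), np.uint8))    # assumed size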

Fig. 8. Otsu from black binary mask, no hairs removed

Fig. 9. Closed and eroded Otsu for the color cluster toleration area

The color clusters are computed with OpenCV and, as in the Otsu pipeline, by considering only the pixels not in the M0 mask; the number of clusters (K = 10) was chosen empirically after a set of experimental sessions. Moreover, the enlarged pixel mask in Fig. 9 is divided into two sub-masks M1 and M2: M1 is the largest connected component, which from now on replaces M0, while M2 (the second largest connected component) is taken as the tolerance area that approximates the central skin lesion pattern and must contain the larger, uniform clusters.
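The following sketch reproduces the masked k-means call and the M1/M2 split via connected components; the termination criteria and variable names are assumptions:

    import cv2
    import numpy as np

    def color_clusters(img_bgr, m0, k=10):
        """k-means on the pixels outside M0 only; returns a label map
        with -1 on masked-out pixels."""
        idx = np.where(m0.ravel() == 0)[0]
        samples = img_bgr.reshape(-1, 3)[idx].astype(np.float32)
        criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
        _, labels, _ = cv2.kmeans(samples, k, None, criteria, 5,
                                  cv2.KMEANS_PP_CENTERS)
        label_map = np.full(m0.size, -1, np.int32)
        label_map[idx] = labels.ravel()
        return label_map.reshape(m0.shape)

    def two_largest_components(mask):
        """Split the enlarged mask into M1 (largest connected component)
        and M2 (second largest)."""
        _, lab, stats, _ = cv2.connectedComponentsWithStats(mask)
        order = np.argsort(stats[1:, cv2.CC_STAT_AREA])[::-1] + 1  # skip bg
        return ((lab == order[0]).astype(np.uint8),
                (lab == order[1]).astype(np.uint8))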

For each color cluster we count the number of pixels considered ‘skin’ or ‘lesion’ by using the M2 mask: if a cluster has more than 10% of its pixels outside this tolerance area, it is considered plain skin (all of its pixels labeled ‘skin’). Note that M1 is necessary to exclude the clusters that compose the concentric halo near the border between the bright skin and the black frame.
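A sketch of this labeling rule, assuming the label map and the M2 sub-mask from the previous sketch:

    import numpy as np

    def lesion_pixels(label_map, m2, k=10, out_tol=0.10):
        """Label each cluster: more than 10% of its pixels outside M2 means
        plain skin; the union of the 'lesion' clusters is returned."""
        lesion_ids = []
        for cid in range(k):
            members = (label_map == cid)
            total = members.sum()
            if total == 0:
                continue
            outside = (members & (m2 == 0)).sum()
            if outside / total <= out_tol:          # mostly inside M2
                lesion_ids.append(cid)
        return np.isin(label_map, lesion_ids)       # global 'lesion' mask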

Examples of the global color cluster consisting of all the pixels labeled ‘lesion’ are in Figs. 10 and 11, respectively for the original image and for the one with hairs removed (for the later comparisons these pixels are shown in red).

Fig. 10. Final global color cluster labeled ‘lesion’, no hairs removed (Color figure online)

Fig. 11. Final global color cluster labeled ‘lesion’, hairs removed (Color figure online)

4 Experimental Setup and Results

To test which of the proposed detection pipelines performs best, we take from the dataset a small group of hard images, whose skin lesions differ from each other in size, color, pattern and type (pigmented or not) and, moreover, exhibit thick hairs of various sizes (Fig. 12).

Fig. 12. Examples of hard dermoscopic images (Color figure online)

Each of the 17 chosen images comes in an ‘original’ version (Dataset 1) and a ‘hair removed’ version (Dataset 2), for a total of 34 images. The experiment has a within-subjects design with two treatments and the following structure:

  1. Dataset 1 - Otsu pipeline

  2. Dataset 2 - Otsu pipeline

  3. Dataset 1 - Color clustering pipeline

  4. Dataset 2 - Color clustering pipeline

From a manual annotation we extract the green labeled area, whose pixels are considered the lesion ground truth (Fig. 13). For each image in the datasets, the labeled pixels produced by the two detection pipelines (previously shown in red) can be intersected with the ground truth area to calculate measures such as Precision, Recall, F1-score and Accuracy. To compute the four measures we consider a sort of global goodness of a detection pipeline, obtained by summing the pixel counts of each classification over all images of the dataset.

An example of the comparison between Figs. 13 and 6 for the Otsu pipeline is depicted in Fig. 14 (a minimal sketch of the metric computation follows the list):

  • true positives (TP): the pixel intersection (blue pixels);

  • false positives (FP): red pixels that do not intersect the green ones;

  • false negatives (FN): green pixels that do not intersect the red ones;

  • true negatives (TN): pixels that are neither red nor green.
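The pixel-level bookkeeping can be sketched as follows, assuming boolean masks pred (red pixels) and truth (green pixels); in the global-goodness setting the four counts are summed over all images of a dataset before computing the measures:

    import numpy as np

    def pixel_metrics(pred, truth):
        """Precision, Recall, F1-score and Accuracy from pixel counts."""
        tp = np.sum(pred & truth)        # blue pixels: the intersection
        fp = np.sum(pred & ~truth)       # red pixels outside the ground truth
        fn = np.sum(~pred & truth)       # green pixels the pipeline missed
        tn = np.sum(~pred & ~truth)      # neither red nor green
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        accuracy = (tp + tn) / (tp + fp + fn + tn)
        return precision, recall, f1, accuracy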

Fig. 13. Area derived from the manual annotation: ground truth (Color figure online)

Fig. 14. Lesion detection comparison (blue pixels are the true positives) (Color figure online)

Experimental results are in Tables 1 and 2 for the Otsu pipeline, while Tables 3 and 4 show the results for the Color clustering pipeline; note that, due to the peculiar image template and its size (number of pixels), the Accuracy metric is not significant: the True Negatives (TN) mostly represent large areas (such as the black frame) that were counted in the pixel comparison yet naturally excluded by the detection pipelines.

Comparing the two pipelines on each dataset, we notice the detection improvement brought by Color clustering especially in terms of Precision, which increases from 67% to 94% on Dataset 1 and from 73% to 94% on Dataset 2. Comparing the two datasets for each pipeline, we notice that all detection metrics improve (only for the Color clustering pipeline does Precision remain unchanged on the second dataset, but Recall increases significantly): this demonstrates the need for the hair removal treatment before each detection pipeline.

The F1-score metric provides the best interpretation of the experimental results: it increases in the Otsu pipeline (from 75% on Dataset 1 to 79% on Dataset 2) and in the Color clustering pipeline (from 79% on Dataset 1 to 83% on Dataset 2) and, dataset being equal, it is always better for the second pipeline. Finally, to put these results in context, it must be remembered that, as previously explained, we considered images with specific characteristics which represent only a part of the entire dermoscopic dataset.

Table 1. Results of Otsu pipeline on the original dataset
Table 2. Results of Otsu pipeline on the hair removed dataset
Table 3. Results of Color clustering pipeline on the original dataset
Table 4. Results of Color clustering pipeline on the hair removed dataset

5 Conclusions and Future Work

This work represents the second step (after the development and testing of the annotation tool) towards a complex medical data management system acting as an agent that supports dermatologists in their decisions by exploiting all the architecture modules and both raw and structured data. Its capacities must range from image gathering and annotation to feature extraction and to the analysis and visualization of the resulting (meta)data, also exploiting Artificial Intelligence methods and CNNs for advanced predictive performance.

Further tests and evaluations are needed, also considering the variety of lesion patterns and their stratification, but the experiments presented show encouraging results even on complex dermoscopic images; moreover, the proposed evaluation metrics prove adequate to verify and measure the best results: the color clustering technique featuring the pixel mask, the hair removal and the toleration areas achieves very positive performance.