
1 Introduction

Leishmaniasis is a disease caused by protozoan parasites of the genus Leishmania, transmitted by the bite of an infected insect. There are two clinical presentations: Visceral Leishmaniasis (VL) and Cutaneous Leishmaniasis (CL). VL is the most serious and can be fatal. CL is not fatal, but it represents a large burden due to social stigma. CL is also associated with psychological effects and reduced patient productivity. Since the incidence of this disease is growing, it is necessary to develop new techniques for its diagnosis [1, 2, 10, 11].

Some studies propose the use of spectral data for the diagnosis of skin diseases. Spectral data refers to spectral signatures obtained with a spectrophotometer, as well as multispectral or hyperspectral imagery collected by cameras. A spectral system measures the energy reflected and emitted by a surface along the electromagnetic spectrum. Spectral data from skin can provide accurate information to develop non-invasive techniques for the diagnosis of skin diseases. For example, Vyas et al. [14] proposed a non-invasive estimation of skin thickness from hyperspectral imaging; Attia et al. [4] developed a non-invasive real-time characterization of non-melanoma skin cancer; and [6] reviewed several non-invasive techniques for the diagnosis of skin cancer, including some based on spectrophotometry data. Despite the advances in this field, more methodologies and techniques are necessary to characterize skin ulcers in their different phases of formation and during treatment follow-up.

This paper presents results from a project that seeks to develop a portable non-invasive system based on multispectral imaging for the diagnosis of skin ulcers caused by Leishmaniasis and the monitoring of their treatment. For the development of a new multispectral system, we need to understand the spectral signatures of both healthy skin and CL ulcers. An animal model for CL using golden hamsters was employed to build a spectral library. This library includes several spectra, each with nearly 2000 bands between 400 nm and 800 nm, from healthy skin and from ulcers in different phases. In this paper, we present the evaluation of two unsupervised band selection algorithms, the first based on similarity [7] and the second based on singular value decomposition (SVD) [3]. These algorithms select the most relevant bands for the discrimination of healthy skin and leishmanial ulcers. The two unsupervised band selection algorithms are compared using two classifiers: a neural network (NN) and a support vector machine (SVM).

2 Spectral Band Subset Selection Algorithms

Several algorithms for band subset selection (BSS) can be found in the literature. These methods are dimensionality reduction approaches that select a set of bands according to a separability criterion. The difference between BSS algorithms and other dimensionality reduction approaches, such as principal component analysis, is that BSS selects bands from the measured spectrum. This allows the characterization of the materials and opens the possibility of building a low-cost sensing system that uses only the selected bands. For this work, we select two unsupervised BSS algorithms with low computational complexity: similarity-based band selection [7] and SVD-based band subset selection [3].

2.1 Similarity-Based Band Selection

Du and Yang [7] proposed two unsupervised methods, Linear Prediction (LP) and Orthogonal Subspace Projection (OSP), whose basic idea is to look for the most distinctive bands while ensuring that the selected bands are also the most informative ones. For this paper, we use the LP algorithm: both algorithms offer the same results, but LP is computationally more efficient because it operates on relatively smaller matrices. For both LP and OSP, the hyperspectral data must be pre-processed to eliminate water absorption bands and bands with a low signal-to-noise ratio [7]. Once these bands are removed, noise whitening is applied. This whitening is easily achieved through the eigendecomposition of the covariance matrix, using the method presented in [12].
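The whitening step can be illustrated with a minimal NumPy sketch. It assumes that an estimate of the noise covariance matrix is already available (the estimation method of [12] is not reproduced here); the function name `whiten`, the guard value `eps` and the synthetic noise are illustrative assumptions, not part of the original work.

```python
import numpy as np

def whiten(data, noise_cov, eps=1e-12):
    """Whiten `data` (n_pixels x n_bands) given a noise covariance estimate.

    Uses the eigendecomposition noise_cov = E diag(w) E^T and applies the
    symmetric transform E diag(w^-1/2) E^T, so that the noise covariance
    of the output is (approximately) the identity matrix.
    """
    w, E = np.linalg.eigh(noise_cov)   # symmetric matrix -> real eigenpairs
    w = np.maximum(w, eps)             # guard against tiny or negative eigenvalues
    W = E @ np.diag(w ** -0.5) @ E.T   # whitening matrix
    return data @ W

# Synthetic correlated noise over 4 bands (illustration only).
rng = np.random.default_rng(0)
mixing = rng.normal(size=(4, 4))
noise = rng.normal(size=(5000, 4)) @ mixing
noise_cov = np.cov(noise, rowvar=False)
white_cov = np.cov(whiten(noise, noise_cov), rowvar=False)
```

After the transform, `white_cov` is numerically the identity, which is the property the band selection step relies on.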

The algorithm begins with the combination of the two best bands, and this combination grows one band at a time until the desired number of bands is selected. The authors suggest a random selection of the first band, followed by a projection of the remaining bands onto the orthogonal subspace of the first band in order to select the bands most dissimilar to each other. However, we chose a different selection method for the first band, seeking to improve the performance of this algorithm. Since the LP algorithm also seeks the most informative bands, we choose the band with the highest variance as the first one. Then, the second band is selected as the one most distant from the first, using the Euclidean distance [7].

The LP algorithm assumes two bands, \(B_1\) and \(B_2\), each with N pixels, belonging to the subset \(\varphi \) that contains the selected bands. To find the band most dissimilar to \(B_1\) and \(B_2\), these bands are used to estimate a third band B using Eq. 1.

$$\begin{aligned} B^\prime =a_0+a_1\ B_1+a_2\ B_2 \end{aligned}$$
(1)

where \(B^\prime \) is the linear estimation of B using \(B_1\) and \(B_2\), and \(a_0\), \(a_1\) and \(a_2\) are the parameters that minimize the error of the linear prediction: \( e = \parallel B-B' \parallel \). The parameter vector is \( a=(a_0, a_1, a_2)\), which can be determined using the least squares solution shown in Eq. 2.

$$\begin{aligned} a=(X^T\ X)^{-1} X^T y \end{aligned}$$
(2)

In Eq. 2, X is an N x 3 matrix whose first column is all ones, whose second column contains the N pixels of \(B_1\) and whose third column contains the N pixels of \(B_2\), and y is an N x 1 vector with the pixels of the candidate band B. Since the parameters a already minimize e for each candidate, the candidate band with the largest residual error e is the one that \(B_1\) and \(B_2\) predict worst, i.e. the band most dissimilar to them, and it is chosen as \(B_3\) [7]. This process is repeated iteratively, adding each newly selected band to the predictors, until the desired number of bands is reached. Pseudocode for this procedure is presented in Algorithm 1.

figure a
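The LP procedure described in this section can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `lp_band_selection` and the synthetic data are hypothetical, the pre-processing and whitening steps are omitted, and, following Du and Yang [7], the candidate with the largest least-squares residual is taken as the band least well explained by the current subset.

```python
import numpy as np

def lp_band_selection(data, n_bands):
    """Select `n_bands` column indices of `data` (n_pixels x n_total_bands)."""
    n_pixels = data.shape[0]
    # First band: highest variance (the initialization described in the text).
    first = int(np.argmax(data.var(axis=0)))
    # Second band: largest Euclidean distance from the first band.
    dist = np.linalg.norm(data - data[:, [first]], axis=0)
    dist[first] = -np.inf
    selected = [first, int(np.argmax(dist))]

    while len(selected) < n_bands:
        # Design matrix X = [1, B_1, ..., B_k] built from the selected bands.
        X = np.column_stack([np.ones(n_pixels), data[:, selected]])
        # Least-squares fit of every band at once (Eq. 2 applied column-wise).
        coef, *_ = np.linalg.lstsq(X, data, rcond=None)
        err = np.linalg.norm(data - X @ coef, axis=0)
        err[selected] = -np.inf                 # never re-select a band
        selected.append(int(np.argmax(err)))    # worst-predicted = most dissimilar
    return selected

# Synthetic data (illustration only): three independent bands plus
# three bands that are exact linear combinations of them.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 3))
data = np.column_stack([base, base @ rng.normal(size=(3, 3))])
selected_lp = lp_band_selection(data, 3)
```

On this synthetic example the three selected bands are linearly independent, so together they explain all six bands, which is the behavior expected from a dissimilarity-driven selection.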

2.2 SVD-Based Band Subset Selection

Velez and Jiménez [3] proposed an unsupervised method based on the singular value decomposition (SVD). This method combines the SVD with the rank-revealing QR factorization and yields a subset of bands that retains the meaning of the data without requiring a transformation [3]. The method uses the strongly restricted projection of a matrix A (see Eq. 3).

$$\begin{aligned} A=P \left[ \begin{matrix}I_p \\ 0\end{matrix}\right] \end{aligned}$$
(3)

where A is an n x p matrix with \(p<n\) and \(A^TA=I_p\), and P is a permutation matrix. To compute the permutation matrix, the covariance \(\varSigma _{data}\) of the hyperspectral data is calculated first. Then, QR factorization with pivoting is applied to the matrix \({V_1}^T\), where \(V_1\) is formed by the first p eigenvectors of \(\varSigma _{data}\). The pivot matrix P that results from this factorization is the permutation matrix in Eq. 3. Finally, the first p elements of \(\overline{x}\) are the selected bands [3]. Pseudocode for this procedure is presented in Algorithm 2.

figure b
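The steps above can be sketched with NumPy as follows. This is an illustrative reading of the procedure, not the reference implementation of [3]: the function name `svd_band_selection` and the synthetic data are hypothetical, and the pivoted QR factorization is written out as a greedy pivoted Gram-Schmidt loop (a library routine for column-pivoted QR could be used instead).

```python
import numpy as np

def svd_band_selection(data, p):
    """Return p band indices of `data` (n_pixels x n_bands)."""
    cov = np.cov(data, rowvar=False)          # Sigma_data
    eigval, eigvec = np.linalg.eigh(cov)      # eigenvalues in ascending order
    order = np.argsort(eigval)[::-1][:p]
    V1 = eigvec[:, order]                     # first p eigenvectors (n_bands x p)

    # Column-pivoted QR of V1^T as greedy pivoted Gram-Schmidt: pick the
    # column with the largest residual norm, then deflate all columns.
    R = V1.T.copy()                           # p x n_bands
    chosen = np.zeros(R.shape[1], dtype=bool)
    pivots = []
    for _ in range(p):
        norms = np.linalg.norm(R, axis=0)
        norms[chosen] = -np.inf               # skip bands already selected
        j = int(np.argmax(norms))
        chosen[j] = True
        pivots.append(j)
        q = R[:, j] / np.linalg.norm(R[:, j])
        R -= np.outer(q, q @ R)               # remove the component along q
    return pivots

# Synthetic data (illustration only): three informative bands followed
# by three bands that contain only very small noise.
rng = np.random.default_rng(0)
informative = rng.normal(size=(500, 3)) * np.array([5.0, 4.0, 3.0])
flat = rng.normal(size=(500, 3)) * 0.01
data = np.column_stack([informative, flat])
selected_svd = svd_band_selection(data, 3)
```

In this example the pivots are the three informative bands, since the leading eigenvectors carry almost no weight on the near-constant bands.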

3 Spectral Classification

Classification is a process in which each sample is assigned a class label [8] by applying decision rules, either in the multispectral or in the spatial domain. Classification can be done through supervised or unsupervised approaches. Supervised classification uses prior information to learn the decision rules. In contrast, unsupervised approaches look for patterns in the data using some similarity criterion. In this paper, we used two supervised classification methods: support vector machines (SVM) and neural networks (NN). Both methods were selected for their high performance with spectral data, as documented in the literature.

3.1 Support Vector Machines - SVM

SVMs are a useful technique for data classification. The objective of using an SVM for classification is to find an optimal decision hyperplane that separates unknown data into two or more classes. A kernel can be used to solve the problem when the data are not linearly separable. The most commonly used kernels for hyperspectral data are the polynomial and the radial basis function (RBF) kernels [9].
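For reference, these two kernels are commonly written as shown next, where x and z are two spectra restricted to the selected bands. The symbols \(\gamma \), r and d denote generic kernel hyperparameters; this notation is introduced here only for illustration and does not correspond to values reported in this work.

$$\begin{aligned} K_{poly}(x,z)=\left( \gamma \ x^T z+r\right) ^d, \qquad K_{RBF}(x,z)=\exp \left( -\gamma \parallel x-z \parallel ^2\right) \end{aligned}$$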

3.2 Neural Networks - NN

Neural networks are a learning paradigm inspired by the human brain. These networks are composed of highly interconnected individual units (nodes) that process information. NN models are useful algorithms for cognitive tasks, such as classification [8]. In this work, NN classification was implemented with a network formed by one hidden layer of five neurons (nodes).

4 Experimental Procedure

4.1 Data Set

Animal models are widely used to analyze new drugs and treatments. For CL studies, golden hamsters are recommended due to the similarity of their skin structure to human skin [5, 13]. Diffuse reflectance spectra from healthy skin and CL ulcers were acquired using an Ocean Optics HR4C3337 spectrometer. The acquired spectra were calibrated using white and black diffuse reflectance standards. A total of 39 golden hamsters, 18 females and 21 males, were used. The hamsters were subjected to several conditions of infection and treatment. For this paper, we used only spectral signatures acquired before treatment. Of the 39 golden hamsters, 27 were infected with Leishmania braziliensis (LB), 4 were infected with Leishmania panamensis (LP), and 8 were in the control group (i.e. without CL).
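A calibration against white and black standards is usually a per-band normalization of the raw counts. The sketch below uses the common formula (raw - black) / (white - black); this is an assumption about the calibration step, since the exact procedure is not detailed here, and the function name and all numeric values are hypothetical.

```python
import numpy as np

def calibrate_reflectance(raw, white, black, eps=1e-9):
    """Convert raw counts to relative reflectance, band by band."""
    raw, white, black = (np.asarray(a, dtype=float) for a in (raw, white, black))
    span = white - black
    # Mark bands where both references coincide instead of dividing by ~0.
    span = np.where(np.abs(span) < eps, np.nan, span)
    return (raw - black) / span

# Hypothetical counts for four bands (illustration only).
white = np.array([1000.0, 1200.0, 1100.0, 900.0])
black = np.array([100.0, 100.0, 100.0, 100.0])
raw = np.array([550.0, 650.0, 100.0, 900.0])
reflectance = calibrate_reflectance(raw, white, black)
```

With these values the result ranges from 0 (a sample as dark as the black standard) to 1 (a sample as bright as the white standard).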

Fig. 1. Average spectral signatures of healthy skin, ulcer border and ulcer center from golden hamsters infected with Leishmaniasis

Spectral signatures of each hamster's skin were obtained every fifteen days. The first measurement was taken before the inoculation of CL, and two more measurements were taken during the development of the ulcers. On each date, up to 12 spectra were measured for each area: healthy skin, ulcer border and ulcer center. This data collection allows an exhaustive analysis of the evolution of the disease, from the inoculation process through the development of the ulcers. This protocol was approved by the animal ethics committee of the Universidad de Antioquia.

Figure 1 presents the average signatures from healthy skin, ulcer border and ulcer center between 400 nm and 800 nm. Above 750 nm, the noise in the signatures increases. We can also see that the spectral response from the ulcer center is lower than that from healthy skin, whereas the spectral signature from the ulcer border is very similar to that of healthy skin.

4.2 Experiments

For the evaluation of both BSS algorithms, we used spectral signatures of healthy skin, ulcer border and ulcer center captured from golden hamsters. First, a mean filter with a sliding window of 3 points is applied to each of the captured signatures in order to reduce noise. Since the bands above 750 nm present higher noise than the lower bands, we defined two experiments to analyze the spectral signatures. The first experiment applies the BSS algorithms to spectral signatures between 480 nm and 750 nm, eliminating the upper bands to reduce the noise. The second experiment takes all bands between 480 nm and 800 nm. Both experiments apply the two BSS methods: SVD-based and similarity-based band selection. For both BSS algorithms, we select 10 bands. This number is chosen because the selected bands will be used in the development of a portable system, and commercial filter wheels for 10 filters are very common. Each band subset is then converted into its closest commercial filters, in order to evaluate a realistic configuration for a multispectral system.
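The pre-processing described above can be sketched as three small helpers. Several details are assumptions made only for illustration: the end points of the spectrum are left unfiltered, the range limits are inclusive, and the list of available filter center wavelengths is hypothetical rather than taken from a real catalog.

```python
import numpy as np

def mean_filter3(spectrum):
    """3-point moving average; the two end points are left unchanged."""
    s = np.asarray(spectrum, dtype=float)
    out = s.copy()
    out[1:-1] = (s[:-2] + s[1:-1] + s[2:]) / 3.0
    return out

def crop_range(wavelengths, spectrum, lo_nm, hi_nm):
    """Keep only the bands with lo_nm <= wavelength <= hi_nm."""
    keep = (wavelengths >= lo_nm) & (wavelengths <= hi_nm)
    return wavelengths[keep], spectrum[keep]

def nearest_filters(selected_nm, filter_centers_nm):
    """Map each selected wavelength to the closest available filter center."""
    centers = np.asarray(filter_centers_nm, dtype=float)
    sel = np.asarray(selected_nm, dtype=float)
    idx = np.abs(centers[None, :] - sel[:, None]).argmin(axis=1)
    return centers[idx]

# Toy values (illustration only).
smoothed = mean_filter3([0.0, 3.0, 0.0, 3.0, 0.0])
wl = np.array([470.0, 480.0, 600.0, 750.0, 760.0])
wl_cropped, sp_cropped = crop_range(wl, np.arange(5.0), 480, 750)
filters = nearest_filters([503.0, 748.0], [450, 500, 550, 700, 750, 800])
```

Mapping to the nearest filter explains why two band subsets that differ by a few nanometers can end up with the same physical filter.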

The evaluation of the selected bands is performed using supervised classification. SVM and NN are used to evaluate the capability of the selected bands to discriminate between healthy skin, ulcer border and ulcer center. The parameters of both classifiers are optimized to obtain the highest overall accuracy. For SVM, a radial basis function kernel is used. For NN, a configuration with a hidden layer of 5 neurons provided the best performance. For training, 30 samples are randomly selected for each class. Since border signatures are close to healthy skin signatures, as shown in Fig. 1, we first classify only healthy skin and ulcer center. Then, we perform the classification using the three classes. Each experiment is repeated 100 times to obtain the overall classification accuracy.
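The repeated evaluation protocol can be sketched as follows. Two points are assumptions made for illustration: the samples not drawn for training are used for testing, and a nearest-centroid rule stands in for the SVM and NN classifiers so that the sketch needs only NumPy. All function names and the synthetic spectra are hypothetical.

```python
import numpy as np

def nearest_centroid_fit_predict(X_train, y_train, X_test):
    """Stand-in classifier (NOT the SVM/NN used in this work): nearest class mean."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    d = np.linalg.norm(X_test[:, None, :] - centroids[None, :, :], axis=2)
    return classes[d.argmin(axis=1)]

def repeated_holdout(X, y, n_train_per_class=30, n_repeats=100, seed=0,
                     fit_predict=nearest_centroid_fit_predict):
    """Mean and standard deviation of overall accuracy over random splits."""
    rng = np.random.default_rng(seed)
    accs = []
    for _ in range(n_repeats):
        train = np.zeros(len(y), dtype=bool)
        for c in np.unique(y):
            idx = np.flatnonzero(y == c)
            train[rng.choice(idx, size=n_train_per_class, replace=False)] = True
        y_pred = fit_predict(X[train], y[train], X[~train])
        accs.append(np.mean(y_pred == y[~train]))
    return float(np.mean(accs)), float(np.std(accs))

# Synthetic 10-band spectra for two well-separated classes (illustration only).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.3, 0.02, size=(60, 10)),
               rng.normal(0.6, 0.02, size=(60, 10))])
y = np.repeat([0, 1], 60)
mean_acc, std_acc = repeated_holdout(X, y)
```

Any classifier can be plugged in through the `fit_predict` argument, so the same loop could report a mean accuracy with its standard deviation for each classifier and band subset.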

5 Results

The bands selected by the BSS algorithms using the signatures between 480 nm and 750 nm are presented in Fig. 2. The spectral signature (blue signal) presented in Fig. 2 is the average of all the spectra used in the experiment. We can note that the bands selected by both algorithms are very close. As a result, when we identify the corresponding commercial filters, many of the filters are the same for both BSS approaches. The values of the commercial filters are presented in the table inside Fig. 2.

The bands selected by the BSS algorithms using the signatures between 480 nm and 800 nm are presented in Fig. 3. The values of the commercial filters are also presented in the table inside Fig. 3. Comparing these results with the first experiment, we note that both algorithms select two bands between 750 nm and 800 nm. In these bands (785 nm and 800 nm) we can see an interesting behavior of healthy skin, ulcer border and ulcer center (see Fig. 1) that can be helpful for the discrimination process.

Fig. 2. Selected spectral bands for spectral signatures between 480 nm and 750 nm using SVD (\(*\)) and similarity-based (\(+\)) Band Subset Selection. The table shows the equivalent commercial filters.

Fig. 3. Selected spectral bands for spectral signatures between 480 nm and 800 nm using SVD (\(*\)) and similarity-based (\(+\)) Band Subset Selection. The table shows the equivalent commercial filters.

Table 1. Overall classification accuracy for two-class problem: healthy skin and ulcer center

Once the band subsets are obtained, they are evaluated by classification using SVM and NN. First, a two-class classification is performed, using only healthy skin and ulcer center signatures. A classification baseline is obtained by using all spectral bands (nearly 2000). For the two-class problem with all bands, we obtain an average accuracy of 44.66% (±30.43%) using SVM and 58.06% (±13.96%) using NN. Table 1 shows the overall classification accuracy for the two-class problem using the spectral band subsets. Using the bands selected from 480 nm to 750 nm, the best classification is obtained with the subset selected by the similarity-based approach and the SVM classifier. This configuration obtained an overall accuracy of 95.89\(\%\). However, the result obtained using the band subset selected by the SVD approach is very similar (95.74\(\%\)). The NN classifier obtained lower overall accuracies for both subsets (similarity and SVD). Using the bands selected from 480 nm to 800 nm, the overall accuracies are very close to those of the first experiment. Again, SVM performed better than NN.

For the three-class problem, the baseline accuracy using all the spectral bands was 26.63% (±20.84%) with SVM and 74.06% (±18.89%) with NN. Table 2 shows the overall classification accuracy for the three-class problem using the spectral band subsets. We can note that, for the three-class problem, the overall classification accuracy decreases for all configurations in comparison with the two-class results. The best performance in this case is obtained using the spectral signatures from 480 nm to 800 nm with the band subset selected by the SVD approach and the NN classifier (82.60\(\%\)). This suggests that the two bands selected between 750 nm and 800 nm are relevant for the discrimination between ulcer border and healthy skin, which can also be noted in Fig. 1.

Table 2. Overall classification accuracy for three-class problem: healthy skin, ulcer center and ulcer border

Finally, Table 3 shows the confusion matrix for the best result from the three-class problem (band subset selected by SVD and NN classifier). This confusion matrix shows that the ulcer border is the class most prone to misclassification: its variability is such that, depending on the location, its reflectance may resemble that of healthy skin or that of the ulcer center.

Table 3. Confusion matrix for the best result using the three classes: band subset selected SVD-based algorithm from bands between 480 nm to 800 nm and NN classifier
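For readers who wish to reproduce this kind of analysis, a confusion matrix and the accuracies derived from it can be computed as in the sketch below. The labels are hypothetical and do not reproduce the values of Table 3; the convention of rows as true classes and columns as predicted classes is also an assumption.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows: true class, columns: predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm

# Hypothetical labels: 0 = healthy skin, 1 = ulcer border, 2 = ulcer center.
y_true = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
y_pred = np.array([0, 0, 1, 0, 1, 2, 2, 2, 2])
cm = confusion_matrix(y_true, y_pred, 3)
overall_accuracy = np.trace(cm) / cm.sum()          # correct / total
per_class_recall = np.diag(cm) / cm.sum(axis=1)     # correct / true count per class
```

In this toy example the border class (row 1) is spread over all three columns, which is the pattern that indicates confusion with both neighboring classes.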

6 Conclusions

In this article, we presented the evaluation of two band-selection algorithms: the first based on similarity measures and the second based on SVD. These algorithms were applied to spectral data captured from cutaneous ulcers caused by leishmaniasis in golden hamsters. The results show that both algorithms achieve an appropriate dimensionality reduction of the spectral signatures without losing key information for their subsequent classification. Of the spectral ranges analyzed, the best results for the discrimination of healthy skin, ulcer border and ulcer center are obtained using 480 nm to 800 nm. The ulcer border area is highly variable and represents a challenge for classification, as it tends to be confused with both ulcer center and healthy skin.

Since the selected band subset allows a suitable discrimination between healthy skin and cutaneous ulcers caused by leishmaniasis, it can be used to develop a portable multispectral imaging system that supports the diagnosis and treatment follow-up of CL. As future work, the selected bands can be evaluated using images and combined spectral-spatial methods, which may help to improve the overall classification accuracies.