1 Introduction

Recent advances in remote sensing technology have enabled sensors to acquire hyperspectral images (HSI) in hundreds of contiguous narrow spectral channels spanning a wide range of the electromagnetic spectrum, from visible to infrared. Each pixel in an HSI represents the spectral characteristics of the corresponding spatial location in the scene [29] and is composed of a vector of spectral entries from the available spectral channels. Rich spectral and spatial information leads to extensive applications such as land cover mapping, target detection, classification, mineral detection, and surveillance. However, this rich information and wide applicability come with many challenges connected to high dimensionality and limited available samples [2], together with the Hughes phenomenon [13] and heterogeneity [4].

A large number of spectral channels combined with a limited number of training samples leads to the curse of dimensionality [13]. Therefore, exploiting spatial/structural features along with the spectral features, and the design of the classifier, play a crucial role in HSI classification. Many techniques have been proposed so far for HSI classification. Traditional pixel-wise classifiers treat each pixel independently without taking spatial information into account. K-nearest neighbour (K-NN) classifiers, conditional random fields [32], neural networks [24], and support vector machines (SVM) [9, 18] have been investigated. Among these pixel-wise classifiers, SVM performed better due to its ability to handle high dimensional data. The majority of the above-mentioned classifiers suffer from the curse of dimensionality and limited training data [1]. Moreover, although there is a strong association between adjacent pixels [29], pixel-wise classifiers treat each pixel independently and therefore ignore spatial information. Dimensionality reduction approaches have also been proposed to handle the high dimensionality and limited training samples. Principal component analysis (PCA) [25] and independent component analysis (ICA) [28] are some of the well-known approaches. These approaches reduce the features/dimensions from hundreds to just a few bands and hence result in a loss of spectral information. Band/feature selection [23] is another technique for handling the above-mentioned issues.

Integrating spatial information with pure spectral information to improve HSI classification has received increasing attention from researchers in recent years [3]. It is widely established that complementary spectral and spatial features can result in more effective classification [22]. It is therefore necessary to incorporate the spatial features into a spectral-spatial classifier. Spectral-spatial techniques can broadly be divided into three categories according to when the spatial information is incorporated along with the spectral features: (a) before the classification, (b) during the classification process, or (c) after the classification process. In the first category, spatial features are extracted and integrated with spectral features before the classification process, for example through morphological profiles [7, 8, 15] or through segmentation [22]. Similarly, composite kernel methods concatenate spatial features with spectral features [17, 33]. However, in most cases these features require human knowledge and are mostly handcrafted. In the second category, spatial features are incorporated into the classifier during the classification process, as in statistical learning theory (SLT) [4] and simultaneous subspace pursuit (SSP) [5]. In the third category, spatial features are incorporated after the classification process. The authors in [26] first utilized SVM for pixel-wise spectral classification and watershed segmentation for spatial feature extraction, followed by majority voting that combines the pixel-wise classification result with the watershed segmentation. The authors in [16] utilized logistic regression via variable splitting and augmented Lagrangian (LORSAL) with a multilevel logistic (MLL) prior (LORSAL-MLL). Similarly, the authors in [21] integrate the results of segmentation and stacked autoencoder (SAE) based classification through majority voting.

Fig. 1. Spectral-Spatial Classification stages and Framework.

Recently, deep learning, a recent development in neural networks, has proved its efficiency and efficacy in many fields, particularly in computer vision tasks such as image classification [11], as well as in speech recognition [30] and language processing [19]. Deep learning based architectures have also performed well in HSI classification [31]. However, incorporating spatial features into a deep network is still a persistent issue.

In this paper, a spectral-spatial HSI classification approach based on a deep belief network (DBN) and hyper-segmentation based spatial feature extraction is proposed. Spectral features are extracted through a deep learning based DBN architecture [10], and logistic regression (LR) is applied as the pixel-wise classifier, while spatial features are extracted through structural boundary adjustment based hyper-segmentation [20], which adaptively segments the HSI. The proposed approach belongs to the third category of spectral-spatial HSI classification, in which spectral and spatial information is combined after the classification. The decision to assign the target pixel to a specific class is based jointly on the DBN based pixel-wise classification and the additional spatial features obtained from the segmentation. An accurate segmentation approach for exploiting the spatial features makes this approach more effective.

2 Proposed Methodology

It is widely believed in the HSI research community that incorporating spatial contextual features can significantly improve classification performance [34]. The proposed method first exploits a multi-layer DBN for deep and abstract feature extraction, and LR is utilized for the subsequent pixel-wise classification. For contextual spatial features, adaptive boundary adjustment based hyper-segmentation [20] is employed. In the third phase, a majority voting [14] based process is utilized to integrate the spectral and spatial features for the final spectral-spatial classification. Each phase is depicted in Fig. 1.

Fig. 2. Framework of the DBN based pixel-wise classification.

2.1 Spectral Feature Extraction via DBN

A deep belief network is composed of Restricted Boltzmann Machine (RBM) learning modules, each consisting of an input (visible) layer x and a hidden layer y that learns to distinguish features with higher correlations in the input data, as shown in Fig. 1. For Gaussian visible units, the energy function can be described as:

$$\begin{aligned} E(x,y,\theta )= \sum _{j=1}^m \frac{(x_j -b_j)^2}{2\sigma _j^2} - \sum _{i=1}^n a_i y_i -\sum _{j=1}^m\sum _{i=1}^n w_{ij} \frac{x_j}{\sigma _j}y_i \end{aligned}$$
(1)

The conditional distributions are given by:

$$\begin{aligned} P(y_i=1|x;\theta ) = h \left( \sum _{j=1}^m w_{ij}\frac{x_j}{\sigma _j} + a_i\right) \end{aligned}$$
(2)
$$\begin{aligned} P(x_j|y;\theta ) = V \left( \sigma _j\sum _{i=1}^n w_{ij}y_i + b_j,\ \sigma _j^2\right) \end{aligned}$$
(3)

where \(\sigma \) is the standard deviation (SD) of a Gaussian visible unit, h(.) is the logistic sigmoid function, and V(.) is the Gaussian distribution. A deep belief network mainly comprises stacked restricted Boltzmann machines, so the learning of the RBMs plays an essential role in a DBN. The general block diagram of image classification using a DBN is shown in Fig. 2. In the learning stage, the training dataset is processed in order to obtain the spatial and spectral information from the hyperspectral images. After that, the parameters of the DBN model are adjusted by learning, which includes back propagation for fine tuning. In the classification stage, the learned network is used to classify the test sample set and output the classification results. We use DBN-LR, in which the DBN is used for feature extraction from the spectral data and the classification is performed by logistic regression.
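As a rough illustration of the two RBM conditionals, the following minimal sketch computes the hidden activation probabilities and the means of the Gaussian visible units. It assumes the common Gaussian-Bernoulli parameterization, in which P(y_i = 1 | x) is the sigmoid of the sum over j of w_ij x_j / sigma_j plus a_i, and the mean of x_j given y is b_j plus sigma_j times the sum over i of w_ij y_i. The function names and the toy weights are hypothetical and are not taken from the authors' Theano implementation.

```python
import math

def sigmoid(z):
    """Logistic sigmoid h(z)."""
    return 1.0 / (1.0 + math.exp(-z))

def hidden_probs(x, W, a, sigma):
    """P(y_i = 1 | x) for each hidden unit i; W[i][j] links hidden i to visible j."""
    return [
        sigmoid(sum(W[i][j] * x[j] / sigma[j] for j in range(len(x))) + a[i])
        for i in range(len(a))
    ]

def visible_means(y, W, b, sigma):
    """Mean of the Gaussian P(x_j | y) for each visible unit j."""
    return [
        b[j] + sigma[j] * sum(W[i][j] * y[i] for i in range(len(y)))
        for j in range(len(b))
    ]

# Toy example (assumed values): 3 spectral bands (visible), 2 hidden units.
W = [[0.5, -0.2, 0.1],
     [0.0, 0.3, -0.4]]
a = [0.0, 0.1]          # hidden biases
b = [0.2, 0.0, -0.1]    # visible biases
sigma = [1.0, 1.0, 1.0]
x = [0.4, 0.1, 0.3]     # one pixel's (normalized) spectrum

p_hidden = hidden_probs(x, W, a, sigma)
x_mean = visible_means([1.0, 0.0], W, b, sigma)
```

In a full DBN these two functions would be the building blocks of contrastive-divergence training for each stacked RBM; the training loop itself is omitted here.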

Fig. 3. Framework of the hyper-segmentation process.

Fig. 4. Hyperspectral image datasets.

2.2 Spatial Feature Extraction via Hyper-segmentation

Two spatial constraints must be incorporated during spatial feature extraction: (1) there is a high probability that pixels with the same spectral signature share the same class label, and (2) there is a high probability that neighbouring pixels with similar spectral signatures share the same class label. In order to fulfil these constraints, an effective adaptive boundary adjustment based approach [20] is exploited to segment the HSI. The tri-factor based energy function is given by:

$$\begin{aligned} A(q,P_i)=\sqrt{ |x_q-g_i|^2+\lambda \tilde{n}_i(q) |Grad(q)| } \end{aligned}$$
(4)

where \(x_q\) is the spectral vector at the boundary pixel q, \(g_i\) is the majority vector of region \(P_i\), \(\tilde{n}_i(q)\) is the straightness factor, |Grad(q)| is the local gradient at the target pixel q, and \(\lambda \) weights the straightness and gradient term. The detailed implementation of the spatial segmentation algorithm can be found in [20] (Fig. 3).
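To make Eq. (4) concrete, the sketch below evaluates the energy of a boundary pixel with respect to candidate regions. It assumes, for illustration only, that the pixel is reassigned to the neighbouring region with the lowest energy and that the straightness factor and gradient magnitude are already available as non-negative numbers; the actual adjustment procedure is the one described in [20]. The function names and toy values are hypothetical.

```python
import math

def boundary_energy(x_q, g_i, straightness, grad_mag, lam=1.0):
    """Eq. (4): sqrt(|x_q - g_i|^2 + lambda * n_i(q) * |Grad(q)|)."""
    spectral_dist_sq = sum((xv - gv) ** 2 for xv, gv in zip(x_q, g_i))
    return math.sqrt(spectral_dist_sq + lam * straightness * abs(grad_mag))

def best_region(x_q, candidates, grad_mag, lam=1.0):
    """Return the id of the candidate region with the lowest energy.

    candidates maps a region id to (g_i, straightness) for that region.
    Choosing the minimum-energy region is an assumption of this sketch.
    """
    return min(
        candidates,
        key=lambda rid: boundary_energy(
            x_q, candidates[rid][0], candidates[rid][1], grad_mag, lam
        ),
    )

# Toy boundary pixel with two neighbouring regions (assumed values).
x_q = [1.0, 2.0]
candidates = {1: ([1.0, 0.0], 0.5), 2: ([1.0, 2.0], 1.0)}
energy_1 = boundary_energy(x_q, *candidates[1], grad_mag=2.0, lam=2.0)
chosen = best_region(x_q, candidates, grad_mag=2.0, lam=2.0)
```

Here region 2 wins because its majority vector matches the pixel spectrum exactly, which outweighs its larger straightness penalty.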

Fig. 5. Classification results of the Houston and Pavia University datasets using the proposed method.

2.3 Majority Voting

The individual results obtained from the DBN-LR classifier and from the segmentation based spatial analysis are integrated through majority voting (MV) [14]. In MV, every pixel in a segmentation region is assigned to the most frequent class allocated by the DBN-LR classifier within that region. Hence both spectral and spatial features are taken into account.
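The voting step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the pixel-wise labels and the segment ids are given as flat lists of equal length, and ties are broken in favour of the label encountered first in the region, a choice the paper does not specify. The function name is hypothetical.

```python
from collections import Counter, defaultdict

def majority_vote(pixel_labels, segment_ids):
    """Relabel every pixel with the most frequent pixel-wise label in its segment."""
    votes = defaultdict(Counter)
    for label, seg in zip(pixel_labels, segment_ids):
        votes[seg][label] += 1
    # most_common(1) returns the top label; equal counts keep first-seen order.
    winner = {seg: counts.most_common(1)[0][0] for seg, counts in votes.items()}
    return [winner[seg] for seg in segment_ids]

# Toy example: 7 pixels in 2 segments, with noisy pixel-wise labels.
dbn_lr_labels = [1, 1, 2, 3, 3, 3, 1]
segments      = [0, 0, 0, 1, 1, 1, 1]
fused = majority_vote(dbn_lr_labels, segments)
```

The isolated labels (the 2 in segment 0 and the 1 in segment 1) are overwritten by the dominant class of their region, which is how the spatial context suppresses isolated misclassifications.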

3 Experimental Results and Performance Comparison

To validate the performance of the proposed technique for HSI classification, experiments are conducted on well-known and challenging datasets that are widely used to validate other HSI classification techniques (Fig. 4).

3.1 Hyperspectral Datasets

Two popular datasets, Pavia University and Houston University, are utilized for performance evaluation due to their distinctiveness and difficulty. The Houston University dataset was acquired by the ITRES CASI-1500 sensor in 2012. It consists of 144 spectral channels with a spatial dimension of \(349\times 1905\). This dataset is considered difficult for classification due to small spatial structures and the presence of mixed pixels. It consists of 15 classes. The Pavia University dataset was collected by the ROSIS sensor over Pavia University, Italy. It comprises 103 spectral channels with a spatial dimension of \(610\times 340\). This dataset includes both man-made structures and natural vegetation. It consists of 9 classes. These two datasets are commonly used in the literature to validate the accuracy of proposed HSI classification techniques. Classification performance is estimated using overall accuracy (OA), average accuracy (AA) and the kappa coefficient (\(\kappa \)). OA is the percentage of pixels correctly classified. AA is the mean of the class-specific accuracies over the total number of classes for the specific image. Kappa measures the degree of agreement between the predicted labels and the ground truth, corrected for the agreement expected by chance. Generally, it is considered more robust than OA and AA.
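The three evaluation criteria can be computed from a confusion matrix, as in the minimal sketch below. It assumes integer class labels starting at 0 and uses the standard Cohen's kappa definition; the function names and the toy labels are illustrative only and do not reproduce any result from the paper.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """cm[t][p] counts pixels of true class t predicted as class p."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm

def accuracy_metrics(cm):
    """Return (OA, AA, kappa) for a square confusion matrix."""
    k = len(cm)
    n = sum(sum(row) for row in cm)
    oa = sum(cm[i][i] for i in range(k)) / n
    # Class-specific accuracy, skipping classes with no ground-truth pixels.
    per_class = [cm[i][i] / sum(cm[i]) for i in range(k) if sum(cm[i]) > 0]
    aa = sum(per_class) / len(per_class)
    # Chance agreement from row (true) and column (predicted) marginals.
    pe = sum(sum(cm[i]) * sum(cm[r][i] for r in range(k)) for i in range(k)) / n ** 2
    kappa = (oa - pe) / (1 - pe)  # undefined when pe == 1 (single-class case)
    return oa, aa, kappa

# Toy example with 3 classes and 10 pixels (assumed labels).
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 1, 1, 2, 0]
oa, aa, kappa = accuracy_metrics(confusion_matrix(y_true, y_pred, 3))
```

In this toy case OA (0.8) exceeds AA (0.75) because the smallest class is classified worst, which illustrates why AA and kappa are reported alongside OA for class-imbalanced scenes.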

Table 1. Classification accuracy (%) of each class for the Houston University dataset obtained by the SVM [18], OMP [27], CNN [12] using \(10\%\) training samples
Table 2. Classification accuracy (%) of each class for the Pavia University dataset obtained by the SVM [18], OMP [27], CNN [12] using \(10\%\) training samples

3.2 Spectral-Spatial DBN-HS Classification

We conducted experiments on a Windows 7 system with a 4.0 GHz processor and an NVIDIA GeForce GTX 970 GPU. The code was implemented in Theano. The number of hidden layers, also known as the depth, plays a significant role in classification performance as it characterizes the quality of the learned features. For each dataset, we randomly chose 10% of each class as the training data. For the Pavia University and Houston University datasets, we selected a depth of 2 with 50 hidden units in each hidden layer, as suggested by the experiments in [6]. The performance of the proposed DBN-HS technique is compared with well-known existing techniques such as support vector machine (SVM) [18], orthogonal matching pursuit (OMP) [27], deep belief network with logistic regression (DBN-LR) [6] and a recently developed deep CNN (CNN) [12]. In the case of DBN-LR, only spectral data was considered as input.
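The per-class random selection of training samples can be sketched as below. This is only an illustration: it assumes that the remaining labelled pixels form the test set and that every class contributes at least one training sample, neither of which is stated in the text, and the function name, seed, and toy class sizes are hypothetical.

```python
import random
from collections import defaultdict

def stratified_split(labels, train_fraction=0.1, seed=0):
    """Randomly pick train_fraction of the indices of each class for training."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Assumption: keep at least one training sample per class.
        n_train = max(1, round(train_fraction * len(indices)))
        train.extend(indices[:n_train])
        test.extend(indices[n_train:])
    return sorted(train), sorted(test)

# Toy label vector: three classes with 50, 30 and 5 labelled pixels.
labels = [0] * 50 + [1] * 30 + [2] * 5
train_idx, test_idx = stratified_split(labels, train_fraction=0.1, seed=0)
```

Sampling within each class, rather than over the whole image, keeps small classes represented in the training set.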

Class-level accuracy results for the Pavia University and Houston University datasets, and their comparison with the above-mentioned existing techniques, are shown in Tables 1 and 2. Mixed pixels are the major challenge in the Houston dataset due to its low spatial resolution and small spatial structures. On the Houston University dataset, the proposed technique performed well in classes with small spatial regions, as effective segmentation plays a very important role in delineating those small regions and making them available for classification. The complete HSI classification result of the proposed method is shown in Fig. 5. Each color denotes a specific type of ground cover, using the same color coding as the ground truth image. The results confirm that spectral-spatial classification using contextual feature extraction has a significant effect on classification accuracy, because spatial features help suppress salt-and-pepper noise.

Overall, the experimental results demonstrate a significant improvement in HSI classification from incorporating spatial information and spectral feature selection. The algorithm performed particularly well on the low spatial resolution dataset.

4 Conclusion

In this paper, a new hyperspectral image classification approach, DBN-HS, based on a deep belief network and hyper-segmentation, is proposed, taking both spectral and spatial information into account. DBN based logistic regression (DBN-LR) is used for the extraction of deep spectral features, and hyper-segmentation is utilized for exploiting the spatial features. In the final step, the DBN-LR based spectral features and the hyper-segmentation based spatial features are integrated through majority voting (MV) for efficient spectral-spatial classification of HSI. Hyper-segmentation defines an adaptive neighborhood for each pixel. Experimental results and comparisons with well-known existing methods demonstrate that spectral-spatial classification based on majority voting within the regions obtained by the hyper-segmentation algorithm leads to higher classification accuracy compared to pixel-wise classification. The use of MV to fuse the local spectral information from DBN-LR with the spatial information from hyper-segmentation has a significant effect on the accuracy of the final HSI classification.