1 Introduction

Synthetic Aperture Radar (SAR) has become popular in recent years as a technique for capturing high-resolution microwave images of the Earth's surface. Unlike optical sensors, SAR can acquire an image regardless of weather conditions or time of day. Another major reason to use SAR is that it operates over multiple frequency bands, from the X band to the P band. The penetration capability of the L and P bands allows SAR to capture data even from below ground level. Polarimetric SAR (PolSAR) additionally records information about the target in four polarization states, which makes it an information-rich technique. These are some of the reasons that establish the superiority of SAR over optical data-capturing techniques.

One of the earliest approaches to classifying multifrequency PolSAR data used a DNN (Dynamic Neural Network) [1]. A deep-learning-based multilayer autoencoder network was also proposed [2]; it uses the Kronecker product of the eigenvalues of the coherency matrix to combine information from multiple bands. Recently, an Optimized Wishart Network (OWN) for classification of multifrequency PolSAR data was reported [3].

Superpixel algorithms combined with deep neural networks have gained popularity for capturing the spatial information of a PolSAR image. Hou et al. [4] used the Pauli decomposition of a PolSAR image to generate superpixels and an autoencoder to extract features from the coherency matrix of each PolSAR pixel. The network's predictions were then used to run a KNN (k nearest neighbors) algorithm within each superpixel to determine the class of the whole superpixel. Guo et al. applied a fuzzy clustering algorithm to PolSAR images to generate superpixels [5]. Their method considered only those pixels that are similar to their neighbors and ignored pixels that are in all probability badly conditioned. In another work [6], Cloude decomposition features were used to generate superpixels and a CNN (Convolutional Neural Network) was used to perform the classification. An adaptive nonlocal approach for extracting spatial information was also proposed [7]; it uses a stacked sparse autoencoder to extract robust features.

In this paper we propose a classification algorithm for multifrequency PolSAR images. The paper is organized as follows: Sect. 2 discusses the proposed network architecture; Sect. 3 explains the experiments conducted on the Flevoland dataset; Sect. 4 concludes with our observations and a discussion of the results.

2 Proposed Methodology

Phase information is very useful for PolSAR image classification. Because we use a real-valued neural network, it may not be able to relate amplitude and phase information: it treats the real and imaginary components as separate features. We therefore feed the network decomposition features instead of the raw elements of the coherency matrix. From each frequency band of a PolSAR image we extract a 33-dimensional feature vector, as listed in Table 1. Combining the information of all three bands gives a 99-dimensional feature vector for each PolSAR pixel.
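The band-wise concatenation described above can be sketched as follows. This is only an illustration under stated assumptions: each band's 33 features are assumed to be already available as an array of shape (rows, cols, 33), and the function name `stack_band_features` and the random placeholder data are hypothetical, not taken from the original implementation.

```python
import numpy as np

FEATURES_PER_BAND = 33  # per-band feature count stated in the text


def stack_band_features(band_features):
    """Concatenate per-band feature cubes along the last (feature) axis.

    band_features: list of arrays, each of shape (rows, cols, 33).
    Returns an array of shape (rows, cols, 33 * number_of_bands).
    """
    for cube in band_features:
        if cube.shape[-1] != FEATURES_PER_BAND:
            raise ValueError("each band must contribute 33 features")
    return np.concatenate(band_features, axis=-1)


# Placeholder data standing in for real decomposition features of three bands.
rng = np.random.default_rng(0)
bands = [rng.normal(size=(4, 5, FEATURES_PER_BAND)) for _ in range(3)]

stacked = stack_band_features(bands)              # shape (4, 5, 99)
pixels = stacked.reshape(-1, stacked.shape[-1])   # one 99-dimensional row per pixel
```

Flattening to one row per pixel gives the layout that a fully connected autoencoder would typically consume.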

The Freeman decomposition [8] has three features, which describe the power of single-bounce (odd-bounce), double-bounce and volume scattering of the incident waves. The Huynen decomposition [9] aims to form a single scattering matrix to model the scattering mechanism of the surface. The Cloude decomposition [10] models surface scattering using parameters such as entropy, anisotropy and angles that describe the scattering mechanism. The Krogager decomposition [11] factorizes the scattering matrix into a combination of a sphere, a di-plane and a helix. The Yamaguchi decomposition [12] adds a helix scattering component to the Freeman decomposition to model complicated man-made structures.
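As one concrete example of such decomposition features, the entropy and anisotropy of the Cloude decomposition can be computed from the eigenvalues of a pixel's 3x3 coherency matrix. The sketch below uses the standard definitions (entropy with a base-3 logarithm over the normalized eigenvalues, anisotropy from the two smaller eigenvalues). The function name `entropy_anisotropy` is hypothetical, and the exact feature set of Table 1 is not reproduced here.

```python
import numpy as np


def entropy_anisotropy(T):
    """Return (H, A) for one 3x3 Hermitian coherency matrix T."""
    eigvals = np.linalg.eigvalsh(T)[::-1]       # descending: l1 >= l2 >= l3
    eigvals = np.clip(eigvals, 0.0, None)       # guard against tiny negative values
    total = eigvals.sum()
    if total <= 0:
        return 0.0, 0.0                         # degenerate (all-zero) matrix
    p = eigvals / total                         # pseudo-probabilities
    nz = p[p > 0]                               # skip zeros: 0 * log(0) is taken as 0
    H = float(-(nz * np.log(nz)).sum() / np.log(3))
    denom = eigvals[1] + eigvals[2]
    A = float((eigvals[1] - eigvals[2]) / denom) if denom > 0 else 0.0
    return H, A


H_iso, A_iso = entropy_anisotropy(np.eye(3))               # fully random scattering
H_mix, A_mix = entropy_anisotropy(np.diag([2.0, 1.0, 0.0]))
```

An identity matrix gives maximum entropy and zero anisotropy, while a matrix with a single non-zero eigenvalue gives zero entropy.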

Table 1. List of extracted features

As shown in Fig. 1, the proposed network architecture contains three modules. The first module is a two-layer autoencoder network. Its purpose is to reduce the dimensionality of the input vector by learning an efficient representation of the combined information from the PolSAR frequency bands.

Let \(\mathbf {X_i}\) be the input feature vector of the \(i^{th}\) PolSAR pixel. Let \(\mathbf {W_{11}}, \mathbf {W_{12}}, \mathbf {b_{11}}\) and \(\mathbf {b_{12}}\) be the weights and biases of the two encoder layers. The hidden representation of the input feature vector \(\mathbf {X_i}\) is calculated as \(\mathbf {h_i} = f(\mathbf {W_{12}}^Tf(\mathbf {W_{11}}^T\mathbf {X_i} + \mathbf {b_{11}})+\mathbf {b_{12}})\), where f is the tanh activation function. Let \(\mathbf {W_{21}}, \mathbf {W_{22}}, \mathbf {b_{21}}\) and \(\mathbf {b_{22}}\) be the weights and biases of the two decoder layers. The reconstructed input vector is calculated as \(\mathbf {X_i}^{'} = f(\mathbf {W_{22}}^Tf(\mathbf {W_{21}}^T\mathbf {h_i} + \mathbf {b_{21}})+\mathbf {b_{22}})\). We use the mean squared error to train the autoencoder. The cost function of the first module of the proposed network is:

$$\begin{aligned} J_1&= \beta \left( \Vert \mathbf {W_{11}} \Vert ^2_2 + \Vert \mathbf {W_{12}} \Vert ^2_2 + \Vert \mathbf {W_{21}} \Vert ^2_2 + \Vert \mathbf {W_{22}} \Vert ^2_2\right) \\&\quad +\,\alpha \frac{1}{N} \sum _{i=1}^{N} \Vert \mathbf {X_i} - \mathbf {X_i}^{'}\Vert ^2 +\gamma \sum _{t=1}^{U_1} KL(\rho \Vert \rho _t). \end{aligned} \tag{1}$$

Here \(\beta \) is the regularization parameter, \(\alpha \) weights the reconstruction error, \(\gamma \) weights the sparsity penalty, N is the total number of training samples, \(U_1\) is the size of the reduced-dimensional feature vector \(\mathbf {h_i}\), \(\rho \) is the sparsity parameter (the target average activation of a hidden unit), \(\rho _t\) is the average activation value of the \(t^{th}\) hidden unit and \(KL(\rho \Vert \rho _t)\) is the Kullback-Leibler divergence, which encourages sparsity in the hidden representation \(\mathbf {h_i}\). For our application we set \(U_1=5\). Once the training of the first module is complete, we disconnect the decoder layers.
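The forward pass and the cost of Eq. 1 can be sketched in NumPy as below. This is a minimal illustration, not the authors' implementation. The intermediate layer width `D_MID`, the hyperparameter values, and the rescaling of tanh activations from (-1, 1) to (0, 1) before the KL term are all assumptions, since the text does not specify them. Training (the gradient updates) is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
D_IN, D_MID, U1 = 99, 30, 5  # D_MID is an assumed width; the text fixes only 99 and U1 = 5


def init_layer(n_in, n_out):
    """Small random weights and zero biases for one dense layer."""
    return rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out)


W11, b11 = init_layer(D_IN, D_MID)   # encoder layer 1
W12, b12 = init_layer(D_MID, U1)     # encoder layer 2
W21, b21 = init_layer(U1, D_MID)     # decoder layer 1
W22, b22 = init_layer(D_MID, D_IN)   # decoder layer 2


def encode(X):
    # Rows of X are pixels, so X @ W equals W^T x for every row x.
    return np.tanh(np.tanh(X @ W11 + b11) @ W12 + b12)


def decode(h):
    return np.tanh(np.tanh(h @ W21 + b21) @ W22 + b22)


def kl_bernoulli(rho, rho_hat):
    """KL divergence between Bernoulli(rho) and Bernoulli(rho_hat)."""
    return rho * np.log(rho / rho_hat) + (1 - rho) * np.log((1 - rho) / (1 - rho_hat))


def cost_j1(X, X_rec, h, weights, alpha, beta, gamma, rho):
    """Weight decay + weighted reconstruction error + sparsity penalty (Eq. 1)."""
    weight_decay = sum(np.sum(W ** 2) for W in weights)
    reconstruction = np.mean(np.sum((X - X_rec) ** 2, axis=1))
    # Assumption: map tanh outputs from (-1, 1) to (0, 1) so the KL term is defined.
    rho_hat = np.clip(((h + 1.0) / 2.0).mean(axis=0), 1e-8, 1 - 1e-8)
    sparsity = np.sum(kl_bernoulli(rho, rho_hat))
    return beta * weight_decay + alpha * reconstruction + gamma * sparsity


X = rng.uniform(-1, 1, size=(8, D_IN))   # placeholder inputs scaled to the tanh range
h = encode(X)                            # 5-dimensional hidden representation
X_rec = decode(h)
J1 = cost_j1(X, X_rec, h, [W11, W12, W21, W22],
             alpha=1.0, beta=1e-4, gamma=0.1, rho=0.05)
```

Because the decoder output passes through tanh, this sketch assumes the input features have been scaled to lie within [-1, 1]. After training, only `encode` would be kept, which corresponds to disconnecting the decoder layers.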

Next, we feed the entire PolSAR image to this network to obtain the hidden representations of all its pixels, and we use these hidden representations to generate superpixels. This representation has two advantages: (i) it contains feature information from all bands and (ii) its dimensionality is substantially lower than that of the input feature vector. To generate superpixels we use an algorithm similar to simple linear iterative clustering (SLIC) [14]. Instead of an RGB image, the input is the hidden representation of the PolSAR image obtained from the first module of the proposed network. As in SLIC, the distance between any two pixels is given by Eq. 2:

Fig. 1. Proposed network architecture.

$$D = \frac{m}{s}D_s + D_h, \tag{2}$$

where \(D_s = \sqrt{(x_i-x_j)^2+(y_i - y_j)^2}\) and \(D_h = ||\mathbf {h_i}-\mathbf {h_j} ||_2\). Here m is a parameter controlling the relative weight between \(D_s\) and \(D_h\), and s is the size of the search space [14]. \((x_i, y_i)\) and \((x_j, y_j)\) are the positions of the \(i^{th}\) and \(j^{th}\) PolSAR pixels in the Euclidean plane. This distance measure is used for superpixel generation.
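The distance of Eq. 2 and its use in assigning a pixel to a cluster center can be sketched as follows. The function names `slic_distance` and `nearest_center` are hypothetical, and the sketch compares a pixel against all given centers, whereas SLIC restricts the comparison to centers within a local search window.

```python
import numpy as np


def slic_distance(pos_i, h_i, pos_j, h_j, m, s):
    """Eq. 2: D = (m / s) * D_s + D_h for two pixels."""
    d_s = np.linalg.norm(np.asarray(pos_i, float) - np.asarray(pos_j, float))
    d_h = np.linalg.norm(np.asarray(h_i, float) - np.asarray(h_j, float))
    return (m / s) * d_s + d_h


def nearest_center(pos, h, center_pos, center_h, m, s):
    """Index of the cluster center with the smallest distance D to one pixel."""
    d_s = np.linalg.norm(center_pos - pos, axis=1)   # spatial distance to each center
    d_h = np.linalg.norm(center_h - h, axis=1)       # feature distance to each center
    return int(np.argmin((m / s) * d_s + d_h))


d = slic_distance((0, 0), np.zeros(5), (3, 4), np.array([1.0, 0, 0, 0, 0]), m=10, s=5)

# Two toy centers: one spatially close but dissimilar, one far but identical in features.
center_pos = np.array([[0.0, 1.0], [10.0, 10.0]])
center_h = np.array([[5.0, 5.0], [0.0, 0.0]])
pixel_pos, pixel_h = np.array([0.0, 0.0]), np.array([0.0, 0.0])
feature_driven = nearest_center(pixel_pos, pixel_h, center_pos, center_h, m=1, s=10)
compact = nearest_center(pixel_pos, pixel_h, center_pos, center_h, m=100, s=10)
```

A small m lets feature similarity dominate the assignment, while a large m favors spatially compact superpixels.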

The second module of the proposed architecture combines each pixel with its corresponding superpixel information to construct a robust feature vector. Let \(\mathbf {S_j}\) be the \(j^{th}\) superpixel, with \(\mathbf {h_i} \in \mathbf {S_j}\), and let \(\mathbf {c_j}\) be the cluster center of \(\mathbf {S_j}\). To extract robust features from both pixel and superpixel information, the input \(\mathbf {H_i}\) of the second autoencoder is constructed as \(\mathbf {H_i} = [\mathbf {h_i}; \mathbf {c_j}]\) [7]. Since the dimensionality of the hidden representation \(\mathbf {h_i}\) is low, a single-layer autoencoder is sufficient for an effective reconstruction. Let \(\mathbf {r_i} = f(\mathbf {W_{13}}^T\mathbf {H_i} + \mathbf {b_{13}})\) be the activation value obtained at the hidden layer of the second autoencoder, and let \(\mathbf {H_i^{'}} = f(\mathbf {W_{23}}^T\mathbf {r_i} + \mathbf {b_{23}})\) be the reconstructed input. The cost function for the second autoencoder is then:

$$\begin{aligned} J_2 = \beta \left( \Vert \mathbf {W_{13}} \Vert ^2_2 + \Vert \mathbf {W_{23}} \Vert ^2_2\right) + \alpha \frac{1}{N} \sum _{i=1}^{N} \Vert \mathbf {H_i} - \mathbf {H_i}^{'}\Vert ^2 + \gamma \sum _{t=1}^{U_2} KL(\rho \Vert \rho _t). \end{aligned} \tag{3}$$

Here \(U_2\) is the size of the feature vector \(\mathbf {r_i}\). Once the training of the second module is complete, we again disconnect the decoder layer. The output of the second module contains both pixel and superpixel information. Finally, we use a softmax classifier to obtain the predicted probability distribution over the classes.
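The data flow through the second module and the softmax classifier can be sketched as below. Several details are assumptions made for illustration: the cluster center of a superpixel is taken to be the mean hidden vector of its pixels, the size `U2` and the softmax parameters `W_out` and `b_out` are hypothetical names and values, and the weights are random rather than trained.

```python
import numpy as np


def superpixel_centers(h, labels):
    """Mean hidden vector of each superpixel (assumed definition of c_j)."""
    n_sp = int(labels.max()) + 1
    sums = np.zeros((n_sp, h.shape[1]))
    np.add.at(sums, labels, h)                       # accumulate h_i per superpixel
    counts = np.bincount(labels, minlength=n_sp)
    return sums / np.maximum(counts, 1)[:, None]


def build_H(h, labels):
    """Stack each pixel's h_i with the center c_j of its superpixel: H_i = [h_i; c_j]."""
    return np.hstack([h, superpixel_centers(h, labels)[labels]])


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)             # subtract row maximum for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def classify(H, W13, b13, W_out, b_out):
    r = np.tanh(H @ W13 + b13)                       # encoder of the second module
    return softmax(r @ W_out + b_out)                # class probabilities per pixel


h = np.array([[0.0, 0.0], [2.0, 2.0], [4.0, 6.0]])   # toy hidden vectors (2-dimensional here)
labels = np.array([0, 0, 1])                         # superpixel index of each pixel
H = build_H(h, labels)

rng = np.random.default_rng(0)
U2, N_CLASSES = 3, 15                                # U2 is assumed; 15 classes as in the dataset
W13, b13 = rng.normal(size=(H.shape[1], U2)), np.zeros(U2)
W_out, b_out = rng.normal(size=(U2, N_CLASSES)), np.zeros(N_CLASSES)
probs = classify(H, W13, b13, W_out, b_out)
predicted = probs.argmax(axis=1)                     # most probable class per pixel
```

Pixels in the same superpixel share the second half of their input vector, which is how superpixel context reaches the classifier.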

3 Experiments

With this model in place, we conduct the following experiments. We first describe the dataset chosen for our experiments. We then analyze the performance of the proposed network for different band combinations, and finally for different feature decompositions. All experiments were implemented in Python 3.6 and executed on a 1.60 GHz machine with 8 GB of RAM.

Experiments were conducted on the Flevoland dataset [15], which covers an agricultural tract in the Netherlands; the data were captured by the NASA/Jet Propulsion Laboratory on 15 June 1991. This dataset is often regarded as a benchmark for PolSAR applications. The intensities after Pauli decomposition are used to form the RGB images shown in Fig. 2(a–c). The ground truth shown in Fig. 2(d) identifies a total of 15 land cover classes.
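A Pauli RGB composite of the kind mentioned above can be formed as in the following sketch. It assumes single-look complex scattering-matrix channels as input; for multilooked data, the diagonal elements of the coherency matrix play the same role as the three intensities. The function name `pauli_rgb`, the common red/green/blue channel assignment, and the per-channel display scaling are assumptions, since the text does not describe how the images were rendered.

```python
import numpy as np


def pauli_rgb(s_hh, s_hv, s_vv):
    """Build an RGB composite from the intensities of the Pauli components."""
    double_bounce = np.abs(s_hh - s_vv) ** 2 / 2.0   # red channel
    volume = 2.0 * np.abs(s_hv) ** 2                 # green channel
    surface = np.abs(s_hh + s_vv) ** 2 / 2.0         # blue channel
    rgb = np.stack([double_bounce, volume, surface], axis=-1)
    # Assumed display scaling: divide each channel by its own maximum.
    peak = rgb.reshape(-1, 3).max(axis=0)
    return rgb / np.where(peak > 0, peak, 1.0)


# Two toy pixels: an odd-bounce scatterer (HH = VV) and a double-bounce scatterer (HH = -VV).
s_hh = np.array([[1 + 0j, 1 + 0j]])
s_vv = np.array([[1 + 0j, -1 + 0j]])
s_hv = np.zeros((1, 2), dtype=complex)
img = pauli_rgb(s_hh, s_hv, s_vv)
```

In this toy input the first pixel appears pure blue and the second pure red.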

Fig. 2. Pauli decomposition of the (a) L band, (b) P band and (c) C band of the Flevoland dataset [15]. (d) Ground truth map of the Flevoland dataset.

Fig. 3. (a) Ground truth map. Classification maps of the Flevoland dataset obtained using (b) L band, (c) P band, (d) C band, (e) P and C bands, (f) L and C bands, (g) L and P bands and (h) L, P and C bands.

Table 2. Class-wise accuracies for different band combinations

We evaluated the accuracy of each class for all possible combinations of the data acquired in the three frequency bands. Note that every overall accuracy reported in this paper is the precision. Figure 3 shows the classification maps obtained by the proposed method and Table 2 shows the class-wise accuracies for all possible band combinations. From Table 2 it can be observed that, with only the C band information, the network recognizes Onions and Lucerne with high accuracy but fails on classes such as Beet and Oats. With the C band alone the network also fails to recognize Wheat accurately; this lack of information is compensated by the L band. With the P band, the network fails to accurately identify Onions and Peas but correctly identifies the majority of Rapeseed and Fruit. It can also be observed that the L band individually performs better than the C or P band.
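The evaluation protocol described above can be sketched as follows: enumerate every non-empty subset of the three bands, then compute class-wise and overall accuracies from a confusion matrix. The helper names are hypothetical, the labels are toy values, and the class-wise accuracy is assumed here to be the fraction of ground-truth pixels of a class that are labeled correctly; the exact definition behind Table 2 may differ.

```python
from itertools import combinations

import numpy as np

BANDS = ("L", "P", "C")
# All non-empty band subsets: three single bands, three pairs, one triple.
band_subsets = [c for k in range(1, len(BANDS) + 1) for c in combinations(BANDS, k)]


def confusion_matrix(y_true, y_pred, n_classes):
    """Rows index the ground-truth class, columns the predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm


def classwise_and_overall(cm):
    per_class = np.diag(cm) / np.maximum(cm.sum(axis=1), 1)  # correct / ground-truth count
    overall = np.trace(cm) / cm.sum()                        # correct / all labeled pixels
    return per_class, overall


# Toy labels standing in for the ground truth and a network's predictions.
y_true = np.array([0, 0, 1, 1, 2])
y_pred = np.array([0, 1, 1, 1, 2])
cm = confusion_matrix(y_true, y_pred, n_classes=3)
per_class, overall = classwise_and_overall(cm)
```

Running the classifier once per entry of `band_subsets` yields the seven configurations compared in Fig. 3 and Table 2.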

A total of six PolSAR decomposition methods were applied to the PolSAR data and their features were given as input to the proposed network. The significance of each decomposition technique is evident from Table 3. The Krogager and Freeman decompositions fail to provide enough information to recognize Beet and Onions, whereas the Cloude and Huynen decompositions, respectively, provide sufficient information for these crops.

Table 3. Class-wise accuracies for different features

The proposed method for classifying multifrequency PolSAR images is also compared with other methods from the literature. The overall accuracies are compared in Table 4. ANN [2] and OWN [3] used a small subset of the Flevoland dataset containing 7 classes, whereas Stein-SRC [16] used ground truth with 14 classes. For a fair comparison, we calculated the overall accuracy of the proposed method using ground truth with 7, 14 and 15 classes. Table 4 shows that the proposed method outperforms all three methods [2, 3, 16].

Table 4. Comparison of overall classification accuracy

4 Conclusion

In this paper a classification network for multifrequency PolSAR data is proposed. The proposed network involves three modules. The first module reduces the dimensionality of the input feature vector, and its output is used to generate superpixels. The second module constructs a robust feature vector from each pixel and its corresponding superpixel information. The last module performs classification using a softmax classifier. Combining information from multiple frequency bands is observed to improve the overall classification accuracy. Validated on the Flevoland dataset, the proposed network achieved 99.59% overall classification accuracy. The experimental results show that the proposed network outperforms other methods reported in the literature.