1 Introduction

Camouflage plays an important role in modern warfare: many strategic and tactical targets rely on camouflage or concealment technology to avoid discovery. Detecting and identifying camouflage targets quickly and accurately has therefore become an important research topic in military target detection. To conceal a target or reduce its detectability, camouflage and stealth materials are widely used to reduce the differences in scattering and radiation intensity between target and background in the optical bands. With the rapid development of spectral imaging technology, a hyperspectral image (HSI) [16] contains not only the two-dimensional spatial information of the target but also its one-dimensional spectral information. Because most camouflage is designed to hide the target only in certain bands, and stealth across all wavebands is hard to achieve [11], hyperspectral imaging technology [13] shows great potential for camouflage target detection by exploiting this abundant spectral information.

To improve the reconnaissance of camouflage targets, researchers investigated hyperspectral camouflage target detection early on [3, 5, 6]. For example, Hua et al. [5] employed the constrained energy minimization (CEM) hyperspectral target detection method to extract camouflage targets, and Yang et al. [11] used the spectral angle distance and mathematical morphology to detect camouflage targets in hyperspectral images. The common strategy of these methods is to use prior spectral information about the camouflage target to complete the detection task, which is not practical in real applications. Such prior spectral information is rarely available because it is affected by uncertain environmental factors, such as atmospheric absorption and scattering, changes in illumination, and the spectral response of the sensor. Military secrecy further hinders the acquisition of spectral information about camouflage targets. In this situation, an unsupervised hyperspectral camouflage target detection method is more appropriate.

Motivated by the above, this paper proposes a novel camouflage target detection method for hyperspectral images. Since no prior spectral information about the camouflage target is available, we first cluster the HSI into different background clusters according to their spectral features in order to describe the background accurately. Then, a spectral-based background dictionary is learned for each cluster through principal component analysis (PCA). According to representation theory, when the HSI is represented on those background sub-dictionaries, the background exhibits a block-diagonal structure while the camouflage target is sparse. Based on this observation, we build a block-diagonal-based low-rank and sparse representation model. Solving this model decomposes the HSI into a background part and a sparse part. Since the camouflage target is sparse, we can extract it from the sparse part.

The rest of this paper is organized as follows. Sect. 2 gives a detailed description of the proposed model, the spectral-based background dictionary construction method, and the optimization procedure for the proposed model. The experiments and result analysis are provided in Sect. 3, and Sect. 4 concludes the paper.

2 The Proposed Method

2.1 Background Block-Diagonal Structure for the Hyperspectral Image

Describing the background accurately is important when no prior spectral information about the camouflage target is available. However, because imaging scenes are cluttered, an HSI often contains many categories of materials, so the background is not homogeneous but multi-mode. To guarantee detection accuracy, it is therefore crucial to account for this multi-mode structure in background modeling. Clustering is a promising way to capture it, since clustering gathers similar pixels into a homogeneous cluster and separates dissimilar pixels into different clusters, so that the multi-mode structure is represented by the set of clusters. In this study, we combine clustering with dictionary learning to describe the multi-mode structure of the background within a representation-based detection framework. Clustering the background yields several homogeneous clusters that differ clearly from one another. When the HSI is represented on the concatenation of the dictionaries learned from each cluster, its representation matrix exhibits an obvious block-diagonal structure.

To clarify this point, we first decompose the input HSI \(\varvec{X}\) into a background part and a camouflage target part, which can be formulated as:

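$$\begin{aligned} \begin{aligned} \varvec{X} = \varvec{B}_{bg} + \varvec{C}, \end{aligned} \end{aligned}$$
(1)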

where \(\varvec{B}_{bg}\) is the background part and \(\varvec{C}\) is the camouflage target part. As discussed above, the background can be represented by a suitable dictionary while the camouflage target \(\varvec{C}\) cannot. Thus, we can write \(\varvec{B}_{bg} = \varvec{DZ}\) and reformulate Eq. (1) as:

$$\begin{aligned} \begin{aligned} \varvec{X} = \varvec{DZ} + \varvec{C}, \end{aligned} \end{aligned}$$
(2)

where \(\varvec{D}=[\varvec{D}_1,\varvec{D}_2,\cdots ,\varvec{D}_k]\) contains k background sub-dictionaries, each learned independently from one cluster, \(\varvec{D}_i\) is the i-th sub-dictionary, and \(\varvec{Z}\) is the background representation matrix of the HSI. Suppose the HSI \(\varvec{X}\) can be divided into k clusters. The columns of \(\varvec{X}\) can then be permuted according to the clustering result as \(\varvec{X}=[\varvec{X}_1,\varvec{X}_2,\cdots ,\varvec{X}_k]\), where \(\varvec{X}_i\) is the i-th cluster and each column of \(\varvec{X}_i\) is the spectrum of one pixel. According to [8, 12], when \(\varvec{D}_i\) and \(\varvec{X}_i\) are sampled exactly from independent subspaces, Eq. (2) reveals the subspace membership of the samples. Therefore, after clustering and permuting the original HSI, the background representation matrix \(\varvec{Z}\) exhibits a block-diagonal structure, which means that the background itself has this structural characteristic.
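The block-diagonal claim can be illustrated numerically with a minimal sketch. It assumes two synthetic clusters drawn from two random (hence independent) subspaces and solves Eq. (2) without a target part by ordinary least squares, which is a simplification and not the low-rank model proposed later. All sizes and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
b, dim, n_per = 30, 3, 5           # bands, subspace dimension, pixels per cluster (toy sizes)

# Two sub-dictionaries with random atoms; random subspaces of total
# dimension 6 in a 30-dimensional space are independent with probability 1.
D1 = rng.standard_normal((b, dim))
D2 = rng.standard_normal((b, dim))
D = np.hstack([D1, D2])            # concatenated dictionary [D_1, D_2]

# Pixels already sorted by cluster: X = [X_1, X_2], with X_i in span(D_i).
X1 = D1 @ rng.standard_normal((dim, n_per))
X2 = D2 @ rng.standard_normal((dim, n_per))
X = np.hstack([X1, X2])

# Least-squares solution of X = D Z (no camouflage part in this toy case).
Z, *_ = np.linalg.lstsq(D, X, rcond=None)

# Coefficients that link cluster 1 pixels to D_2 atoms, and vice versa.
off_diag = max(np.abs(Z[:dim, n_per:]).max(), np.abs(Z[dim:, :n_per]).max())
print("largest off-diagonal-block coefficient:", off_diag)
```

In this idealized setting the off-diagonal blocks of Z vanish up to rounding error, so each cluster is represented only by its own sub-dictionary. Real spectra only approximately satisfy the independent-subspace assumption.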

2.2 Block-Diagonal Structure Based Low-Rank and Sparse Representation Model

Based on the above discussion, the proposed Block-Diagonal Structure Based Low-Rank and Sparse Representation (BDSLRSR) model integrates the multi-mode background structure and the sparse camouflage target as follows:

$$\begin{aligned} \begin{aligned}&\underset{\varvec{Z},\varvec{C}}{\min } \quad \mathrm {rank}(\varvec{Z}) + \lambda ||\varvec{C}||_{2,1}, \\&\mathbf {s.t} \quad \varvec{X} = \varvec{D}\varvec{Z} + \varvec{C}, \end{aligned} \end{aligned}$$
(3)

where rank\((\cdot )\) denotes the rank function, the parameter \(\lambda > 0\) balances the two terms, and \(||\cdot ||_{2,1}\) is the \(\ell _{2,1}\) norm, defined as the sum of the \(\ell _2\) norms of the columns of a matrix. \(\varvec{X}=[\varvec{x}_{11},\dots ,\varvec{x}_{1n_1},\dots ,\varvec{x}_{k1},\dots ,\varvec{x}_{kn_k}]\in \mathbb {R}^{b \times n}\) is the 2-D HSI matrix with its columns sorted according to the clustering result. Here the HSI is assumed to have k clusters, \(\varvec{x}_{ij}\ (i=1,\dots ,k;\ j =1,\dots ,n_i)\) is the j-th pixel of the i-th cluster, \(n = n_1+\dots +n_k\) is the total number of pixels, and b is the number of hyperspectral bands. \(\varvec{D}\varvec{Z}\) denotes the background part, \(\varvec{D}\) is the background dictionary formed from the sub-dictionaries learned on each cluster, \(\varvec{Z}\) denotes the block-diagonal background representation coefficients, and \(\varvec{C}\) denotes the remaining part corresponding to the camouflage target. The camouflage target is modeled as sparse in Eq. (3) because the dictionary \(\varvec{D}\) captures background characteristics only and cannot represent the camouflage target well. Moreover, camouflage target pixels are far fewer than background pixels in \(\varvec{X}\), so the camouflage target part is better described as sparse than as low-rank [14]. Consequently, it is reasonable to impose the sparsity constraint on the camouflage target as shown in Eq. (3).
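Written out explicitly for illustration, in the same index notation as the column norm in Eq. (4), the \(\ell _{2,1}\) norm described above is

$$\begin{aligned} ||\varvec{C}||_{2,1} = \sum _{i=1}^{n} ||[\varvec{C}]_{:,i}||_2 = \sum _{i=1}^{n} \sqrt{\sum _{j=1}^{b} ([\varvec{C}]_{j,i})^2}. \end{aligned}$$

For a toy matrix with b = 2 and n = 2 whose columns are \((3,4)^T\) and \((0,0)^T\), this gives \(5 + 0 = 5\). Because the penalty is a sum of column norms, minimizing it tends to drive entire columns (pixels) of \(\varvec{C}\) to zero rather than individual entries, which matches the assumption that only a few pixels belong to the camouflage target.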

After obtaining the sparse matrix \(\varvec{C}\), the label of the i-th pixel is determined as follows:

$$\begin{aligned} \begin{aligned} r(\varvec{x}_i) = ||[\varvec{C}]_{:,i}||_2 =\sqrt{\sum _{j}([\varvec{C}]_{j,i})^2} \;_{<}^{>}\;\delta , \end{aligned} \end{aligned}$$
(4)

where \(||[\varvec{C}]_{:,i}||_2\) denotes the \(\ell _2\) norm of the i-th column of \(\varvec{C}\), \(\delta \) is the segmentation threshold, and if \(r(\varvec{x}_i) > \delta \), \(\varvec{x}_i\) is determined as the camouflage target; otherwise, \(\varvec{x}_i\) is labeled as the background.
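The decision rule in Eq. (4) can be sketched in a few lines of NumPy. The function name `detect_targets` and the toy values of C and δ are hypothetical and chosen only for illustration.

```python
import numpy as np

def detect_targets(C, delta):
    """Apply Eq. (4): score each pixel by the l2 norm of its column in C."""
    C = np.asarray(C, dtype=float)
    r = np.linalg.norm(C, axis=0)      # r(x_i) = ||[C]_{:,i}||_2
    return r, r > delta                # True marks a camouflage-target pixel

# Toy sparse part with b = 2 bands and n = 4 pixels; only pixel 2 is clearly non-zero.
C = np.array([[0.0, 0.0, 3.0, 0.0],
              [0.0, 0.1, 4.0, 0.0]])
r, is_target = detect_targets(C, delta=1.0)
print(r)           # per-pixel scores
print(is_target)   # per-pixel decisions
```

Because the columns of X are permuted by cluster, the scores would need to be mapped back to the original pixel order before being reshaped into an image.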

The main advantages of our model are as follows:

  1. We adopt clustering to describe the background, which exploits background information and characteristics more accurately. This level of detail was not considered in earlier low-rank-based methods, which treated the background as a whole.

  2. The block-diagonal structure represents the multi-mode structure of the background based on the clustering result, and it is more robust than the low-rank structure. The low-rank structure depends on the feature consistency of pixels, so a slight variation may make the background full-rank, whereas the block-diagonal structure depends on the feature dissimilarity between pixels and is therefore more robust to feature variation.

  3. We employ a dictionary learning method to obtain the background dictionary, which extracts background features efficiently. Sect. 2.3 gives a detailed explanation.

2.3 Spectral Feature Based Dictionary Learning

Generally, the background dictionary has a great impact on representation-based unsupervised hyperspectral target detection methods [7, 15]. To construct a robust background dictionary, we use the k-means method [1] to divide the hyperspectral data into k clusters, so that each cluster roughly represents one background material. In this way, the multi-mode characteristics of the background are well captured by selecting a reasonable k (k should be larger than the true number of ground materials to ensure that the k clusters cover all of them).

After clustering, every camouflage target pixel is also assigned to one of the clusters. We then adopt PCA for dictionary learning. The most significant components in PCA carry the major information of the data, and in a given cluster the major information comes from the background pixels. We therefore discard the less significant components after PCA to eliminate the negative effect of anomalies on the learned dictionary. Applying this PCA learning step to each cluster yields the background dictionary \(\varvec{D}\).
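The clustering and dictionary learning steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the k-means routine is a plain Lloyd iteration with random initialization, the "PCA" step takes the leading left singular vectors of each cluster without mean-centering (the text does not specify centering), and all function names are hypothetical.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means on the columns of X (b x n). Returns labels of shape (n,)."""
    rng = np.random.default_rng(seed)
    centers = X[:, rng.choice(X.shape[1], size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared distance from every pixel to every center, shape (k, n).
        d2 = np.stack([((X - centers[:, [i]]) ** 2).sum(axis=0) for i in range(k)])
        labels = d2.argmin(axis=0)
        for i in range(k):
            if np.any(labels == i):            # keep the old center if a cluster empties
                centers[:, i] = X[:, labels == i].mean(axis=1)
    return labels

def pca_dictionary(Xi, n_components):
    """Leading principal directions of one cluster Xi (b x n_i), used as atoms.

    Assumption: no mean-centering, so the atoms span the spectra themselves.
    """
    U, _, _ = np.linalg.svd(Xi, full_matrices=False)
    return U[:, :n_components]                 # less significant components are dropped

def build_background_dictionary(X, k, n_components):
    """Cluster the pixels, learn one sub-dictionary per cluster, and sort X by cluster."""
    labels = kmeans(X, k)
    order = np.argsort(labels, kind="stable")  # permutation that groups pixels by cluster
    D = np.hstack([pca_dictionary(X[:, labels == i], n_components)
                   for i in range(k) if np.any(labels == i)])
    return D, X[:, order], order
```

With the settings reported for the datasets in Sect. 3.2, the call would be `build_background_dictionary(X, k=8, n_components=20)`, where X holds one pixel spectrum per column. The returned `order` can be used to map detection scores back to the original pixel positions.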

The advantages of the spectral feature based dictionary learning technique are as follows:

  1. By using clusters to represent the background, both the diversity and the multi-mode structure of the background are described explicitly. Moreover, the low-rank property of the background is enhanced, which helps to increase the separability of camouflage targets and background.

  2. The PCA learning scheme enables us to learn a clean background dictionary by discarding the less significant principal components.

The entire flow of the proposed method is shown in Fig. 1. The method contains two main modules: dictionary learning, and block-diagonal structure based low-rank and sparse representation. Given an HSI, we first divide it into different clusters with the k-means method. Then, a robust background dictionary \(\varvec{D}\) is learned by applying PCA to each cluster. With the background dictionary \(\varvec{D}\) and the HSI \(\varvec{X}\) re-ordered to correspond to \(\varvec{D}\), the block-diagonal structure based low-rank and sparse representation model in Eq. (3) can be built. Solving this model gives the sparse matrix \(\varvec{C}\) containing the camouflage target, and the targets are then extracted from this sparse matrix by Eq. (4).

Fig. 1. Framework of the proposed method.

2.4 Optimization Procedure

This section describes how to solve the BDSLRSR model. The rank minimization in Eq. (3) is non-convex and NP-hard. A common way to handle this difficulty is to relax Eq. (3) into the following convex problem:

$$\begin{aligned} \begin{aligned}&\underset{\varvec{Z},\varvec{C}}{\min } \quad ||\varvec{Z} ||_{*} + \lambda ||\varvec{C}||_{2,1}, \\&\mathbf {s.t} \quad \varvec{X} = \varvec{D}\varvec{Z} + \varvec{C}. \end{aligned} \end{aligned}$$
(5)

where the nuclear norm \(||\cdot ||_{*}\) is utilized to replace the original rank regularization. It has been shown that the solution of Eq. (3) is equal to that of Eq. (5) when some mild conditions hold [8].

In our study, we employ the standard alternating direction method of multipliers (ADMM) to solve the problem in Eq. (5). Specifically, we first introduce an auxiliary variable \(\varvec{J}\) and reformulate Eq. (5) as follows:

$$\begin{aligned} \begin{aligned}&\underset{\varvec{Z},\varvec{C},\varvec{J}}{\min } \quad ||\varvec{J} ||_{*} + \lambda ||\varvec{C}||_{2,1}, \\&\mathbf {s.t} \quad \varvec{X} = \varvec{D}\varvec{Z} + \varvec{C}, \varvec{Z} = \varvec{J}. \end{aligned} \end{aligned}$$
(6)

Then, we can obtain the following augmented Lagrangian function:

$$\begin{aligned} \begin{aligned} L =&||\varvec{J} ||_{*} + \lambda ||\varvec{C} ||_{2,1} + \mathbf {tr}(\varvec{Y}_1^T(\varvec{X}-\varvec{D}\varvec{Z}-\varvec{C})) \\&+\mathbf {tr}(\varvec{Y}_2^T(\varvec{Z}-\varvec{J})) + \dfrac{\mu }{2}(||\varvec{X}-\varvec{D}\varvec{Z}-\varvec{C}||_{F}^{2}\\&+||\varvec{Z}-\varvec{J}||_F^2) \end{aligned} \end{aligned}$$
(7)

where \(\varvec{Y}_1\) and \(\varvec{Y}_2\) are Lagrange multipliers and \(\mu > 0\) is the penalty coefficient. Following [8], the steps for minimizing Eq. (7) are summarized in Algorithm 1. The detailed derivation of each step can be found in [8].

Algorithm 1.
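Since the listing of Algorithm 1 is not reproduced here, the following is a minimal sketch of one common way to minimize Eq. (7) by alternating updates. Each update is the closed-form minimizer of Eq. (7) with respect to one variable: singular value thresholding for \(\varvec{J}\), a linear solve for \(\varvec{Z}\), and column-wise shrinkage for \(\varvec{C}\). The function names, the default values of `lam`, `mu`, `rho`, `mu_max`, `tol` and `max_iter`, and the stopping rule are assumptions for illustration and may differ from the authors' Algorithm 1.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def l21_shrink(Q, tau):
    """Column-wise shrinkage: proximal operator of tau * l2,1 norm."""
    norms = np.linalg.norm(Q, axis=0)
    scale = np.maximum(norms - tau, 0.0) / np.maximum(norms, 1e-12)
    return Q * scale

def solve_bdslrsr(X, D, lam=1.0, mu=1e-2, rho=1.1, mu_max=1e6,
                  tol=1e-6, max_iter=500):
    """Alternating minimization of Eq. (7); returns (Z, C). Defaults are assumed values."""
    m, n = D.shape[1], X.shape[1]
    Z = np.zeros((m, n))
    J = np.zeros((m, n))
    C = np.zeros(X.shape)
    Y1 = np.zeros(X.shape)                     # multiplier for X = DZ + C
    Y2 = np.zeros((m, n))                      # multiplier for Z = J
    DtD_I = D.T @ D + np.eye(m)
    for _ in range(max_iter):
        # J-step: min ||J||_* + (mu/2) ||J - (Z + Y2/mu)||_F^2
        J = svt(Z + Y2 / mu, 1.0 / mu)
        # Z-step: set the gradient of Eq. (7) with respect to Z to zero
        Z = np.linalg.solve(DtD_I, D.T @ (X - C) + J + (D.T @ Y1 - Y2) / mu)
        # C-step: min lam ||C||_{2,1} + (mu/2) ||C - (X - DZ + Y1/mu)||_F^2
        C = l21_shrink(X - D @ Z + Y1 / mu, lam / mu)
        # Multiplier and penalty updates
        R1 = X - D @ Z - C
        R2 = Z - J
        Y1 = Y1 + mu * R1
        Y2 = Y2 + mu * R2
        mu = min(rho * mu, mu_max)
        if max(np.abs(R1).max(), np.abs(R2).max()) < tol:
            break
    return Z, C
```

The returned C is the sparse part from which per-pixel scores are computed with Eq. (4). The matrix `D.T @ D + I` does not change across iterations, so for large dictionaries it could be factorized once instead of being passed to `np.linalg.solve` at every step.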

3 Experiments and Discussion

3.1 Comparison Methods and Evaluation Index

To evaluate the proposed method objectively, we compare it with four state-of-the-art unsupervised hyperspectral target detection methods: the Reed-Xiaoli (RX) detector [9], the Sparse Representation-based (SR) unsupervised target detector, the Cluster-Based Anomaly Detector (CBAD) [2], and the Low-Rank and Sparse Representation (LRASR) based detection method [10]. The proposed method is referred to as the Block-Diagonal Structure Based Low-Rank and Sparse Representation (BDSLRSR) method. The results are evaluated using receiver operating characteristic (ROC) curves and the area under these curves (AUC) [4].
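As an illustration of the evaluation index, the sketch below computes an ROC curve and its AUC from per-pixel detection scores and a binary ground-truth mask by sweeping the decision threshold. The function name `roc_auc` and the toy inputs are hypothetical, and the sketch assumes that both target and background pixels are present.

```python
import numpy as np

def roc_auc(scores, labels):
    """Return (fpr, tpr, auc) for detection scores and boolean target labels."""
    scores = np.asarray(scores, dtype=float).ravel()
    labels = np.asarray(labels, dtype=bool).ravel()
    order = np.argsort(-scores, kind="stable")     # descending score = decreasing threshold
    s, y = scores[order], labels[order]
    tp = np.cumsum(y)                              # true positives at each threshold
    fp = np.cumsum(~y)                             # false positives at each threshold
    last = np.r_[np.diff(s) != 0, True]            # one ROC point per distinct score
    tpr = np.r_[0.0, tp[last] / labels.sum()]
    fpr = np.r_[0.0, fp[last] / (~labels).sum()]
    # Trapezoidal area under the (fpr, tpr) curve.
    auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
    return fpr, tpr, auc

# Toy example: four pixels, two of which are targets.
fpr, tpr, auc = roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
print(auc)
```

In the setting of this paper, `scores` would be the values \(r(\varvec{x}_i)\) from Eq. (4) and `labels` the flattened ground-truth map, both in the same pixel order.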

3.2 Hyperspectral Datasets

In this study, we used the hyperspectral imager shown in Fig. 2(a), purchased from Zolix in Beijing, to collect the hyperspectral datasets. The specific parameters of this imager are listed in Table 1. The three kinds of camouflage nets shown in Fig. 2(b)–(d) were used as camouflage targets by placing them in surroundings of the same color.

Table 1. Parameters of hyperspectral imager

Fig. 2. Data acquisition equipment. (a) Hyperspectral Imager, (b) Woodland camouflage net, (c) Desert camouflage net, (d) Digital camouflage net.

The first dataset was collected on the fourth floor of the School of Computer Science, Northwestern Polytechnical University. The camouflage targets were a woodland camouflage net (left) and a desert camouflage net (right). The data contain \(174 \times 411\) pixels with 160 spectral bands, as shown in Fig. 3(a). The ground truth of the targets is shown in Fig. 3(b). The parameters for this dataset are as follows: the number of clusters is 8 and the number of leading principal components is 20.

The second dataset was collected at the BaiLu Tableland, Xian, Shaanxi Province. The camouflage targets were a woodland camouflage net (left) and a digital camouflage net (right). The image size is \(174 \times 682\) pixels with 160 spectral bands, as shown in Fig. 4(a). The ground truth of the targets is shown in Fig. 4(b). The parameters for this dataset are as follows: the number of clusters is 8 and the number of leading principal components is 20.

The third dataset was also collected at the BaiLu Tableland, Xian. A roadside car covered with a woodland camouflage net served as the camouflage target. The image size is \(87 \times 181\) pixels with 160 spectral bands, as shown in Fig. 5(a). Figure 5(b) shows the ground truth of the target. The parameters for this dataset are as follows: the number of clusters is 8 and the number of leading principal components is 20.

Fig. 3. First dataset. (a) Pseudo-RGB of the scene, (b) Ground truth of the camouflage targets.

Fig. 4. Second dataset. (a) Pseudo-RGB of the scene, (b) Ground truth of the camouflage targets.

Fig. 5. Third dataset. (a) Pseudo-RGB of the scene, (b) Ground truth of the camouflage targets.

Fig. 6. Two-dimensional plots of the detection results obtained by different methods for the first dataset. (a) RX, (b) SR, (c) CBAD, (d) LRASR, (e) BDSLRSR. (Color figure online)

Fig. 7. Two-dimensional plots of the detection results obtained by different methods for the second dataset. (a) RX, (b) SR, (c) CBAD, (d) LRASR, (e) BDSLRSR. (Color figure online)

Fig. 8. Two-dimensional plots of the detection results obtained by different methods for the third dataset. (a) RX, (b) SR, (c) CBAD, (d) LRASR, (e) BDSLRSR. (Color figure online)

Fig. 9. Detection accuracy evaluation for the first dataset. (a) ROC curves. (b) AUC values.

3.3 Result Analysis

The two-dimensional plots of the detection results for the three datasets are shown in Figs. 6, 7 and 8. For better visualization, the results are displayed as colormap images, in which background pixels appear in blue. The ground-truth locations of the targets are shown in Figs. 3(b), 4(b) and 5(b). Compared with the other methods, the proposed method achieves better results in both background representation and background suppression. A small number of man-made objects, such as the buildings in Fig. 8 and the telegraph poles in Fig. 7, are easily mistaken for targets of interest when detection relies on spectral information alone. The proposed method effectively suppresses such objects and reduces their interference by considering the overall block-diagonal structure of the background. As a result, the background and the camouflage targets are effectively separated by solving the block-diagonal structure based low-rank and sparse representation model. Furthermore, the ROC curves and AUC values in Figs. 9 and 10 show that the proposed method achieves better detection results.

Fig. 10. Detection accuracy evaluation for the second dataset. (a) ROC curves. (b) AUC values.

4 Conclusion

This paper describes a new camouflage target detection method for hyperspectral images. To represent the background more accurately, a spectral-based clustering strategy is employed to capture the multi-mode structure of the background. Then, the dictionary of each cluster is obtained with PCA, and the whole background dictionary consists of the sub-dictionaries learned from each cluster. Next, we incorporate the block-diagonal structure and the background dictionary into a low-rank and sparse representation model. After solving this model, camouflage targets are extracted from the sparse part. Compared with traditional cluster-based and low-rank methods, the proposed method achieves better detection results because it simultaneously considers the low-rank, multi-mode, and block-diagonal structure properties of the background.