
1 Introduction

Image fusion is a form of information fusion applied to images. It combines images of the same target or scene acquired by different sensors, or by the same sensor in different imaging modes or at different imaging times. Multi-focus image fusion is a branch of image fusion. The fused image reflects the information of multiple source images and thus gives a more comprehensive description of the target and the scene, making it more suitable for visual perception or computer processing. Multi-focus image fusion has become a representative topic, and many algorithms have been developed for fields such as remote sensing, medical imaging and surveillance [14]. Conventionally, multi-focus image fusion algorithms are divided into transform domain algorithms and spatial domain algorithms [15]. Since many new algorithms have been proposed recently, we divide the existing fusion algorithms into three categories: multi-scale transform methods, sparse representation (SR) and low-rank representation based fusion methods, and deep learning based fusion methods.

Multi-scale transform (MST) methods are the most commonly used, including the discrete wavelet transform (DWT) [9], contourlet transform (CT) [25], shift-invariant shearlet transform [24] and curvelet transform (CVT) [5]. The basic idea is to transform the source images to obtain coefficient representations, fuse the coefficients according to a certain fusion rule, and finally obtain the fused image through the inverse transformation. All these methods share a “decomposition-fusion-reconstruction” framework. They represent structural information well, but can only extract limited directional information and cannot accurately extract complete contours [26].

In recent years, methods based on sparse representation and low-rank representation have also shown significant performance in image fusion. Yin et al. [27] proposed a novel multi-focus image fusion approach whose key point is a maximum weighted multi-norm fusion rule used to reconstruct the fused image from the sparse coefficients and the joint dictionary. A method based on saliency detection in the sparse domain [16] also achieves remarkable results. Yang et al. [26] combined robust sparse representation with an adaptive PCNN, which is also effective. Liu et al. [20] combined the multi-scale transform with sparse representation for image fusion, overcoming the inherent defects of both MST- and SR-based fusion methods. Besides the above methods, Li et al. [10] proposed a multi-focus image fusion method based on dictionary learning and low-rank representation which achieves better performance in both global and local structure. Li et al. also obtained significant results for noisy image fusion using low-rank representation [12].

With the development of deep learning, deep features have been used as saliency features to fuse images. Liu et al. [19] proposed a convolutional sparse representation (CSR)-based image fusion method. The CSR model was introduced by Zeiler et al. [28] in their deconvolutional networks for feature learning; thus, although CSR differs from deep learning methods, the features extracted by CSR are still deep features. Liu et al. [18] also applied a CNN model to image fusion, using it to generate the activity level measurement and the fusion rule. Li et al. [13] proposed an effective image fusion method that uses the fixed VGG-19 [23] to generate a single image containing all the features of the infrared and visible images. However, training a deep model is time consuming and complicated, and it places high demands on hardware.

In this paper, we propose a novel and effective multi-focus fusion method based on the PCA filters of PCANet [4], a very simple deep learning model. The main contribution of this paper is to use PCANet to extract image features and the nuclear norm to construct an effective feature space for image fusion. In particular, the training time of PCANet is shorter than that of other CNN-based networks, while the extracted features play the same role. The experimental results demonstrate that the proposed method achieves state-of-the-art fusion performance in terms of both objective assessment and visual quality.

The rest of this paper is organized as follows. In Sect. 2, we give a brief introduction to related work. In Sect. 3, the proposed multi-focus image fusion method is presented in detail. Section 4 presents the experimental results. Finally, Sect. 5 concludes the paper and puts forward future work.

2 Related Work

PCANet [4] is a very simple deep learning network which contains three parts: cascaded principal component analysis (two stages), binary hashing and block-wise histograms (output layer). In this architecture, PCA is employed to learn multi-stage filter banks, so the network can be designed and trained extremely easily and efficiently. In this paper, we only use the PCA filters to extract image features; binary hashing and block-wise histograms are not used. In the two stages of PCANet [4], the number of filters in the first stage \(L_1\) is set to 8, and the number of filters in the second stage \(L_2\) is the same. Therefore, we obtain 8 image features in the first stage and 64 image features in the second stage. Considering time efficiency and the non-redundancy of the data, we only utilize the image features of the first stage to fuse images. The first stage of PCANet is therefore explained as follows.

For each input image of size \(m \times n\), we take overlapping \(k_{1} \times k_{2}\) patches and collect them together, i.e., \(x_{i,1},x_{i,2},\cdots ,x_{i,mn}\in R^{k_1 k_2}\), where \(x_{i,j}\) denotes the \(j\)-th vectorized patch of image \(I_i\). Subtracting the patch mean from each patch, we obtain \(X_i = [\bar{x}_{i,1},\bar{x}_{i,2},\cdots ,\bar{x}_{i,mn}]\), where \(\bar{x}_{i,j}\) is a mean-centralized patch. Repeating this for all input images \(\{I_i\}_{i=1}^N\) (N is the number of input images) and putting the results together, we get

$$\begin{aligned} X = [\bar{X}_1,\bar{X}_2,\cdots ,\bar{X}_N]\in R^{k_1 k_2 \times Nmn} \end{aligned}$$
(1)

Let \(L_i\) denote the number of filters in layer i, so \(L_1\) is the number of filters in layer 1. PCA minimizes the reconstruction error, i.e.,

$$\begin{aligned} \min _{V\in R^{k_1 k_2 \times L_1}}||X-VV^TX||_F^2, \quad s.t.\ V^TV=I_{L_1} \end{aligned}$$
(2)

where \(I_{L_1}\) is the identity matrix of size \(L_1\times L_1\) and V is the matrix whose columns are the eigenvectors corresponding to the \(L_1\) largest eigenvalues of \(XX^T\). Therefore, the PCA filters are expressed as

$$\begin{aligned} W_l^1 = map_{k_1,k_2}(q_l(XX^T)) \in R^{k_1 \times k_2},\quad l=1,2,\cdots ,L_1 \end{aligned}$$
(3)

where \(map_{k_1,k_2}(v)\) is a function that maps the vector v to a matrix \(W\in R^{k_1 \times k_2}\), and \(q_l (XX^T)\) is the \(l\)-th principal eigenvector of \(XX^T\). The leading principal eigenvectors capture the main variations of all the mean-centralized training patches, so the PCA filters can be used to extract image features.
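To make the first-stage filter learning concrete, the following Python sketch (using NumPy) implements Eqs. (1)-(3). The function and variable names are our own and not taken from the PCANet release; it is a minimal sketch, not the authors' implementation.

```python
import numpy as np

def learn_pca_filters(images, k1=7, k2=7, L1=8):
    """Minimal sketch of first-stage PCA filter learning (Eqs. 1-3).

    images : list of 2-D float arrays (grayscale training images)
    k1, k2 : patch size; L1 : number of filters (8 in the paper).
    """
    cols = []
    for img in images:
        m, n = img.shape
        # collect all overlapping k1 x k2 patches, vectorized as columns
        for y in range(m - k1 + 1):
            for x in range(n - k2 + 1):
                patch = img[y:y + k1, x:x + k2].reshape(-1)
                cols.append(patch - patch.mean())   # remove the patch mean
    X = np.stack(cols, axis=1)                      # shape: k1*k2 x (number of patches)

    # Eq. (2): the optimal V consists of the L1 leading eigenvectors of X X^T
    eigvals, eigvecs = np.linalg.eigh(X @ X.T)
    order = np.argsort(eigvals)[::-1][:L1]
    # Eq. (3): reshape each leading eigenvector back into a k1 x k2 filter
    return [eigvecs[:, i].reshape(k1, k2) for i in order]
```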

3 The Proposed Fusion Method

3.1 Image Features

Considering the generalization performance of the filters, we train the two-stage PCANet on ImageNet [22], which contains 1000 categories, to obtain the PCA filters. We randomly select 5 images from each of the 1000 categories, so 5000 images in total are used to train the PCA filters; all of them are resized to \(256\times 256\) and color images are converted to grayscale. Training the PCA filters is implemented in Matlab R2016b on a 3.60 GHz Intel(R) Core(TM) CPU with 64 GB RAM. We use the filters of the first stage to extract image features; for example, Fig. 1 shows the features extracted from a source image with the fixed PCA filters.
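As an illustration of how the fixed filters are applied, a short sketch is given below. It assumes the feature maps are obtained by 2-D convolution of the grayscale source image with each learned filter, which follows standard PCANet usage but is not copied from the authors' code; the function name is ours.

```python
from scipy.signal import convolve2d

def extract_features(image, filters):
    """Convolve a grayscale source image with the fixed first-stage PCA
    filters to obtain the feature maps illustrated in Fig. 1 (sketch)."""
    # 'same' output size with symmetric padding keeps feature maps aligned
    # with the source image for the later fusion steps
    return [convolve2d(image, f, mode='same', boundary='symm') for f in filters]
```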

Fig. 1.

(a) is an original image and (b)–(i) are image features obtained by the PCA filters of the first stage of the PCANet.

In Fig. 1, (a) is a near-focused source image and (b)–(i) are image features extracted by the PCA filters, such as texture features, vertical textures, horizontal textures, point textures and diagonal textures. Evidently, the first few feature maps capture the image from a global perspective, while the subsequent ones become more specific to local saliency. Image fusion merges images of the same scene with different contents into one image; in other words, the saliency features of the different images are extracted and combined into a single image. In [3], to obtain better performance, the authors remove the first three principal components in the Eigenface method. Accordingly, the feature extraction before fusion should reflect the saliency of the source image as much as possible. We therefore conduct an experiment with four cases: using all 8 image features, discarding the first image feature, discarding the second image feature, and discarding both of them.

The experimental results show that it is better to discard the first two feature maps, so we only utilize the last six feature maps to build the activity level map. This experiment is described in detail in Sect. 4.2.

3.2 Proposed Fusion Method

In this section we will introduce the proposed fusion method in detail. The framework of the proposed fusion method is shown in Fig. 2.

Fig. 2.

The framework of the proposed method.

As shown in Fig. 2, the input images are denoted by \(Source_1\) and \(Source_2\) respectively. The features of \(Source_1\) and \(Source_2\) are extracted with the pre-trained PCA filters, and we only take the last six image features to calculate the activity level map.

In [11], the authors apply the nuclear norm to image features. In this paper, we use the nuclear norm [17], i.e., the sum of the singular values of a matrix, to process the image features and obtain the activity level maps. This procedure is shown in Fig. 3.

Fig. 3.

The procedure of the processing strategy for image features.

Fig. 4.

The construction of focus score maps through the sliding window technique.

As shown in Fig. 3, the border of each feature map is zero-padded and the six feature maps are concatenated into a 6-channel image feature, which is then processed with the nuclear norm. Taking each pixel as the center, a multi-channel block is taken from the same position and reshaped into a two-dimensional matrix. The sum of the singular values of this matrix is calculated and used to replace the original pixel value. The activity level maps are therefore composed of these sums of singular values, as formalized in Eq. 4.

$$\begin{aligned} M_k(x,y)= ||R(F_k^{1:c}(x-t:x+t,\ y-t:y+t))||_* \end{aligned}$$
(4)

where \(R(*)\) is the reshape operation, (x, y) is the position of the pixel, c is the number of channels, k is the index of the pre-registered source image, \((2t+1)\times (2t+1)\) is the block size, \(M_k (*)\) is the activity level map and \(F_k^{1:c} (*)\) denotes the c feature maps of the \(k\)-th source image.
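The following sketch illustrates Eq. (4). It assumes the six selected feature maps are stacked into an H x W x 6 array and that a 5 x 5 window (t = 2) with zero padding is used, as in our experiments; the helper name is ours.

```python
import numpy as np

def activity_level_map(features, t=2):
    """Eq. (4): nuclear norm of the reshaped multi-channel block around
    each pixel. `features` is an H x W x c array of the c feature maps."""
    H, W, c = features.shape
    padded = np.pad(features, ((t, t), (t, t), (0, 0)), mode='constant')  # zero padding
    M = np.zeros((H, W))
    for y in range(H):
        for x in range(W):
            block = padded[y:y + 2 * t + 1, x:x + 2 * t + 1, :]
            # reshape the multi-channel block into a 2-D matrix and
            # take the sum of its singular values (the nuclear norm)
            M[y, x] = np.linalg.norm(block.reshape(-1, c), ord='nuc')
    return M
```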

Next, the activity level map is processed to obtain the decision map. Following [21], we process the activity level maps as shown in Fig. 4.

In Fig. 4, for each pair of corresponding blocks \(f_1\) and \(f_2\) centered at the same pixel, we calculate the sum of all coefficients in each block, denoted as \(s_1\) and \(s_2\). If \(s_1>s_2\), the corresponding pixel is set to 1; otherwise, it is set to 0. We thereby obtain two complementary focus score maps, denoted as \(FSM_1\) and \(FSM_2\). The steps are shown in Eqs. 5 and 6.

$$\begin{aligned} FSM_1(x,y)= {\left\{ \begin{array}{ll} 1, &{} s_1>s_2 \\ 0, &{} otherwise \end{array}\right. } \end{aligned}$$
(5)
$$\begin{aligned} s_i=sum(f_i),i \in \{1,2\} \end{aligned}$$
(6)

where \(sum(*)\) is the sum of all coefficients in \(f_i\).
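A minimal sketch of Eqs. (5)-(6) is given below. It assumes a 3 x 3 sliding window (as in Sect. 4.1) and uses local means, which order the same way as local sums over a fixed window; the names are chosen for illustration only.

```python
from scipy.ndimage import uniform_filter

def focus_score_maps(M1, M2, win=3):
    """Eqs. (5)-(6): compare local sums of the two activity level maps."""
    s1 = uniform_filter(M1, size=win)   # local mean ~ local sum / win**2
    s2 = uniform_filter(M2, size=win)
    FSM1 = (s1 > s2).astype(float)      # 1 where source 1 is better focused
    FSM2 = 1.0 - FSM1                   # the two maps are complementary
    return FSM1, FSM2
```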

As the focus score maps usually contain some small holes surrounded by focused regions, we apply a simple post-processing approach to remove them. We use the small region removal strategy of [18], with the area threshold universally set to \({0.1\times H \times W}\), where H and W are the height and width of each source image. Subsequently, we apply morphological closing and opening operations to the focus score maps. Finally, following [21], we combine the two focus score maps into one decision map: for pixels where the two focus score maps are complementary, we take the value of the first focus score map; otherwise, the value is set to 0.5. The final decision map, denoted as \(D_{final}\), is computed as shown in Eq. 7.

$$\begin{aligned} D_{final}(x,y)= {\left\{ \begin{array}{ll} FSM_1(x,y), &{} FSM_1(x,y)+FSM_2(x,y)=1 \\ 0.5, &{} otherwise \end{array}\right. } \end{aligned}$$
(7)

Finally, we obtain the \(D_{final}\) shown in Fig. 5 (a).
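The post-processing can be sketched as follows. This is only an approximation under stated assumptions: the exact small-region handling of [18] may differ, and we simply remove connected components smaller than \(0.1\times H\times W\), apply closing and opening, and then combine the cleaned maps via Eq. (7); function names are ours.

```python
import numpy as np
from scipy.ndimage import label, binary_closing, binary_opening

def decision_map(FSM1, FSM2):
    """Sketch of the post-processing in Sect. 3.2 and Eq. (7)."""
    H, W = FSM1.shape
    area_thr = 0.1 * H * W

    def clean(fsm):
        mask = fsm > 0.5
        labels, num = label(mask)                 # connected components
        for i in range(1, num + 1):
            if (labels == i).sum() < area_thr:    # drop small focused regions
                mask[labels == i] = False
        return binary_opening(binary_closing(mask)).astype(float)

    f1, f2 = clean(FSM1), clean(FSM2)
    # Eq. (7): keep FSM1 where the cleaned maps are still complementary,
    # otherwise fall back to an averaging weight of 0.5
    return np.where(f1 + f2 == 1.0, f1, 0.5)
```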

Fig. 5.

(a) is \(D_{final}\) and (b) is fused image.

3.3 Fusion Method

With the final decision map \(D_{final}\), the fused image F is calculated by

$$\begin{aligned} F = D_{final}Source_1 + (1-D_{final})Source_2. \end{aligned}$$
(8)

The fused image is shown in Fig. 5(b).

The algorithm is described in Table 1.

Table 1. Algorithm flow chart
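For readability, the flow summarized in Table 1 can also be written as a short end-to-end sketch built from the hypothetical helper functions sketched in Sects. 2 and 3 (learn_pca_filters, extract_features, activity_level_map, focus_score_maps, decision_map); it is illustrative, not the released implementation.

```python
import numpy as np

def fuse(source1, source2, filters):
    """End-to-end sketch of the proposed fusion pipeline."""
    feats1 = np.dstack(extract_features(source1, filters)[2:])  # keep last six maps
    feats2 = np.dstack(extract_features(source2, filters)[2:])
    M1 = activity_level_map(feats1, t=2)          # 5x5 nuclear-norm window
    M2 = activity_level_map(feats2, t=2)
    FSM1, FSM2 = focus_score_maps(M1, M2, win=3)  # Eqs. (5)-(6)
    D = decision_map(FSM1, FSM2)                  # post-processing + Eq. (7)
    return D * source1 + (1.0 - D) * source2      # Eq. (8)
```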

4 Experiments

In this section, we describe the source images and the experimental environment, give a detailed account of the feature selection experiment mentioned in Sect. 3.1, and present a subjective and objective comparison between the proposed method and existing methods.

4.1 Experimental Settings

As introduced in Sect. 3.1, our image sets, denoted SET1 and SET2, come from the references [29] and [10]. There are 15 pairs of source images in SET1 and 20 pairs in SET2; part of them are shown in Figs. 6 and 7.

Fig. 6.

Four pairs of source images from SET1.

Fig. 7.

Four pairs of source images from SET2.

Secondly, we compare the proposed method with several existing fusion methods, including: convolutional sparse representation fusion method (CSR) [19], multi-focus image fusion with dense SIFT (DSIFT) [21], multi-focus image fusion with a deep convolutional neural network (CNN) [18], infrared and visible image fusion using a deep learning framework (VGG) [13], discrete cosine harmonic wavelet transform fusion method (DCHWT) [7] and cross bilateral filter fusion method (CBF) [8].

In order to evaluate the proposed method and the existing methods objectively, we choose several image quality metrics: average gradient (AG), entropy (EN), mutual information (MI) [1], FMI_gradient [6] and the sum of the correlations of differences (SCD) [2].

In our experiment, the sliding window size is \(5\times 5\) in nuclear norm used for feature processing and the step is one pixel. The sliding window size is \(3\times 3\) in the construction of focus score maps and the step is one pixel as well.

The fusion algorithm is implemented in Matlab R2016a on a 3.00 GHz Intel(R) Core(TM) CPU with 4 GB RAM.

4.2 Feature Selection Experiment

In this section, we describe the experiment mentioned in Sect. 3.1. On the two image sets (SET1 and SET2), according to the number of image features used, our method is divided into four cases: (1) all features are used, (2) the first image feature is discarded, (3) the second image feature is discarded and (4) the first two image features are discarded. We compare the four cases using multiple evaluation indexes and take the average value over the images of each set as the final result, as shown in Tables 2 and 3.

Table 2. The AG, EN, MI and FMI_gradient average values of the four feature-selection cases for SET1.
Table 3. The AG, EN, MI and SCD average values of the four feature-selection cases for SET2.

In Tables 2 and 3, the best results are bolded. The two tables show that the distribution of the best results over the four cases is the same for each data set, but the values of the fourth case are better than or equal to those of the first three cases. Therefore, only the last six image features are selected, and the first two are abandoned.

4.3 Image Fusion Results

We use the fifteen pairs of source images in SET1 to test the comparison methods and the proposed method. The fused results are shown in Fig. 8, where we choose one pair of source images as an example. The values of AG, EN, MI and FMI_gradient for the fifteen fused images are shown in Tables 4 and 5.

Fig. 8.

Examples of fused results. (a) Source image a; (b) Source image b; (c) CSR; (d) DSIFT; (e) CNN; (f) VGG; (g) DCHWT; (h) CBF; (i) The proposed method.

Fig. 9.

Examples of fused results. (a) Source image a; (b) Source image b; (c) CSR; (d) DSIFT; (e) CNN; (f) VGG; (g) DCHWT; (h) CBF; (i) The proposed method.

Table 4. The AG, EN, MI and FMI_gradient values of the compared methods and the proposed method for 10 pairs source images from SET1.
Table 5. The AG, EN, MI and FMI_gradient values of the compared methods and the proposed method for another 5 pairs source images from SET1.
Table 6. The AG, EN, MI and SCD values of the compared methods and the proposed method for 10 pairs source images from SET2.
Table 7. The AG, EN, MI and SCD values of the compared methods and the proposed method for another 10 pairs source images from SET2.

As shown in Fig. 8, the proposed method achieves almost the same fusion performance as the other classical and novel fusion methods in terms of the human visual system. Therefore we mainly discuss the fusion performance with the quality metrics, as shown in Tables 4 and 5.

In Tables 4 and 5, the best results are bolded and the second-best results are marked in red. In most cases, the proposed method achieves good indicator values.

We also make the same comparison on SET2, which contains 20 pairs of source images. The fused results are shown in Fig. 9, where we again choose one pair of source images as an example. The values of AG, EN, MI and SCD for the twenty fused images are shown in Tables 6 and 7.

As shown in Fig. 9, from the human visual perspective there is almost no significant difference between the fusion results of these methods. Therefore we evaluate the fusion results objectively, as shown in Tables 6 and 7.

In Tables 6 and 7, the best results are bolded and the second-best results are marked in red. In most cases, the proposed method achieves good indicator values as well.

5 Conclusion

In this paper, we propose a novel fusion method based on PCANet. First of all, we utilize the PCA filters to extract image features of source images, and then we apply the nuclear norm to process the image features in order to get activity level maps. Through a series of post-processing operations on activity level maps, the decision map is obtained. Finally, the fused image is obtained by utilizing a weighted fusion rule. The experimental results demonstrate that the proposed method can obtain state-of-the-art fusion performance in terms of both objective assessment and visual quality.