Multimedia Tools and Applications, Volume 77, Issue 17, pp 22407–22431

Infrared and visible image fusion based on NSCT and stacked sparse autoencoders

  • Xiaoqing Luo
  • Xinyi Li
  • Pengfei Wang
  • Shuhan Qi
  • Jian Guan
  • Zhancheng Zhang


To integrate the infrared object into the fused image effectively, a novel infrared (IR) and visible (VI) image fusion method based on the nonsubsampled contourlet transform (NSCT) and stacked sparse autoencoders (SSAE) is proposed. First, the IR and VI images are each decomposed into low-frequency and high-frequency subbands by NSCT. Second, an SSAE is applied to the low-frequency subband of the IR image to calculate the object reliabilities (OR) of the low-frequency subband coefficients. Then, an adaptive multi-strategy fusion rule based on OR is designed for fusing the low-frequency subbands, and a choose-max rule on the absolute values of the high-frequency subband coefficients is employed for fusing the high-frequency subbands. Experimental results show that the proposed method outperforms conventional methods in highlighting infrared objects while preserving the background information of the VI image.
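The choose-max rule for the high-frequency subbands can be illustrated with a minimal NumPy sketch. It assumes the NSCT decomposition has already been computed elsewhere and that each subband is available as a 2-D array of equal shape. The function name `choose_max_abs`, the tie-breaking choice, and the toy arrays are hypothetical and are not taken from the paper. The OR-based low-frequency rule is not sketched here because the abstract does not specify it.

```python
import numpy as np

def choose_max_abs(ir_band, vi_band):
    """Per coefficient, keep the input whose absolute value is larger."""
    ir_band = np.asarray(ir_band, dtype=float)
    vi_band = np.asarray(vi_band, dtype=float)
    if ir_band.shape != vi_band.shape:
        raise ValueError("subbands must have the same shape")
    # Ties go to the IR coefficient (an arbitrary choice for this sketch).
    return np.where(np.abs(ir_band) >= np.abs(vi_band), ir_band, vi_band)

# Toy 2x2 arrays standing in for one pair of high-frequency subbands.
ir_high = np.array([[0.5, -2.0], [0.1, 0.0]])
vi_high = np.array([[-1.5, 1.0], [0.1, -0.3]])
fused_high = choose_max_abs(ir_high, vi_high)
print(fused_high)
```

In a full pipeline this function would be applied to every pair of corresponding high-frequency subbands before the inverse transform. The sign of each selected coefficient is kept, since only the comparison uses absolute values.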


Image fusion; Stacked sparse autoencoders; Nonsubsampled contourlet transform; Infrared images



This work was supported by the National Natural Science Foundation of P. R. China under grant no. 61772237, the Provincial research grants no. BK20151358 and BK20151202, the Suzhou science and technology project under grant SYG201702, the Fundamental Research Funds for the Central Universities under grant JUSRP51618B, and the Equipment Development and Ministry of Education union fund under grant 6141A02033312.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. School of IoT Engineering, Jiangnan University, Wuxi, China
  2. Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence, Jiangnan University, Wuxi, China
  3. Computer Application Research Center, Harbin Institute of Technology, Shenzhen, China
  4. School of EIE, Suzhou University of Science and Technology, Suzhou, China
