Abstract
In this paper we propose a novel deep spatial transformer convolutional neural network (Spatial Net) framework for detecting salient and abnormal regions in images. The proposed method is general and has three main parts: (1) context information in the image is captured by convolutional neural networks (CNNs), which automatically learn high-level features; (2) to better adapt the CNN model to the saliency task, we redesign the feature sub-network, following the spatial transformer network, so that it outputs a 6-dimensional parameter vector defining an affine transformation. Several local features, which effectively capture edge pixels of salient regions, are also extracted and embedded into the model to reduce the effect of highlighted background regions; (3) finally, regions of interest are detected through a linear combination of the global and local feature information. Experimental results demonstrate that Spatial Net achieves superior detection performance over state-of-the-art algorithms on two popular datasets, while requiring less memory and computation.
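The spatial transformer component described in part (2) can be illustrated with a minimal sketch. The sketch below is not the authors' implementation; it follows the standard spatial transformer design of Jaderberg et al. (reference above), in which a small localization CNN regresses the 6 parameters of a 2x3 affine matrix that is then used to resample the input. All layer sizes and the class name `SpatialTransformer` are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpatialTransformer(nn.Module):
    """Illustrative spatial-transformer block: a localization network
    regresses a 6-dimensional vector, reshaped into a 2x3 affine
    matrix, which warps the input feature map via grid sampling."""

    def __init__(self, in_channels=3):
        super().__init__()
        # Localization network: convolutions + pooling down to a feature map.
        self.localization = nn.Sequential(
            nn.Conv2d(in_channels, 8, kernel_size=7),
            nn.MaxPool2d(2), nn.ReLU(True),
            nn.Conv2d(8, 10, kernel_size=5),
            nn.MaxPool2d(2), nn.ReLU(True),
        )
        # Regressor producing the 6 affine parameters (for 64x64 inputs,
        # the localization output is 10 channels of 12x12).
        self.fc_loc = nn.Sequential(
            nn.Linear(10 * 12 * 12, 32),
            nn.ReLU(True),
            nn.Linear(32, 6),
        )
        # Initialize the regressor to the identity transform so that
        # training starts from an unwarped image.
        self.fc_loc[2].weight.data.zero_()
        self.fc_loc[2].bias.data.copy_(
            torch.tensor([1, 0, 0, 0, 1, 0], dtype=torch.float))

    def forward(self, x):
        xs = self.localization(x).flatten(1)
        theta = self.fc_loc(xs).view(-1, 2, 3)  # batch of 2x3 affine matrices
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)


x = torch.randn(1, 3, 64, 64)
out = SpatialTransformer()(x)
print(out.shape)  # torch.Size([1, 3, 64, 64])
```

Because the regressor is initialized to the identity, the module initially passes features through unchanged and learns the warp that best serves the downstream saliency loss.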
Acknowledgements
The authors acknowledge the National Natural Science Foundation of Shaanxi Province (Grant No. 2016JM6023).
Zhang, X., Gao, T. & Gao, D. A new deep spatial transformer convolutional neural network for image saliency detection. Des Autom Embed Syst 22, 243–256 (2018). https://doi.org/10.1007/s10617-018-9209-0