Abstract
The background prior has been widely used in salient object detection models with promising results. These methods assume that the image boundary is entirely background and then extract the salient object with color-feature-based methods. However, this assumption may be inaccurate when the salient object is partially cropped by the image boundary. Moreover, using only the color feature is insufficient. We present a novel salient object detection model based on background selection and multiple features. Firstly, we present a simple but effective method to pick out more reliable background seeds. Secondly, we utilize multi-feature enhanced graph-based manifold ranking to obtain the saliency maps. Finally, we present salient object segmentation via the computed saliency map. Qualitative and quantitative evaluation results on three widely used datasets demonstrate significant advantages of our technique compared with many state-of-the-art models.
1 Introduction
Salient object detection aims to detect the most salient, attention-grabbing object in a scene; its value lies mainly in applications such as object detection and recognition [1, 2], image and video compression [3, 4], and content-aware image retargeting [5, 6], to name a few. Numerous salient object detection models have therefore been developed in recent years [7, 8]. These models can be categorized as either bottom-up or top-down approaches. Bottom-up saliency models are based on pre-assumed priors (e.g., the contrast prior, central-bias prior, and background prior), whereas top-down models usually use high-level information to guide the detection. We focus only on bottom-up models in this work.
For bottom-up salient object detection models, the priors play a critical role. The most widely used is the contrast prior, measured in either a local [9–11] or a global [12–14] fashion. Motivated by early primate vision, Itti et al. [11] regard visual attention as a local center-surround difference and present a pioneering saliency model based on multi-scale image features. Goferman et al. [9] take advantage of multiple cues, including local low-level features, high-level features, and global considerations, to segment out salient objects along with their contexts. In [10], Jiang et al. utilize shape information to find regions of distinct color by computing the difference between the color histogram of a region and those of its adjacent regions. Due to the lack of higher-level information about the object, these local-contrast-based models tend to produce higher saliency values near edges instead of uniformly highlighting the whole salient object.
Global-contrast-based methods, by contrast, take holistic rarity over the complete image into account. The model of Achanta et al. [12] works on a per-pixel basis by computing color dissimilarities to the mean image color and achieves globally consistent results; Gaussian blur is used to decrease the influence of noise and high-frequency patterns. Cheng et al. [13] define a regional-contrast-based method by generating 3D histograms and using segmentation, which evaluates not only global contrast differences but also spatial coherence. The model in [14] measures global-contrast-based saliency using spatially weighted feature dissimilarities. However, global-contrast-based methods may highlight background regions as salient because they do not account for spatial relationships inside the image.
The central-bias prior is based on the well-known fact that, when taking photos, humans often frame the objects of interest near the center of the image. Judd et al. [15] accordingly present a saliency model that computes the distance between each pixel and the coordinate center of the image, yielding a better prediction of the salient object than many previous saliency models. Later, both Goferman et al. [9] and Jiang et al. [10] enhance the intermediate saliency map with a weight implemented via a 2D Gaussian falloff positioned at the center of the image. This prior usually improves saliency performance on most natural images. However, the central-bias prior does not hold when the photographer faces a large scene or the objects of interest are not located near the center of the image.
Besides the above two commonly used priors, several recent models also utilize the background prior, i.e., the image boundary should be treated as background. Wei et al. [16] propose a saliency measure called geodesic saliency, which uses two priors about common backgrounds in natural images, namely the boundary and connectivity priors, to help remove background clutter and in turn obtain better salient object detection. Later, Yang et al. [17] utilize this background prior and graph-based manifold ranking to detect the salient object, with promising results. However, they assume that all four image sides are background, which is not always true when the image is cropped. Recently, Zhu et al. [18] propose a reliable background measure, called boundary connectivity, and a principled optimization framework to integrate multiple low-level cues; they do not treat all the image boundaries as background, but their method is rather complicated. Unlike these methods, our model not only adaptively treats the image boundaries as background or non-background, but is also very easy to implement. Figure 1 gives an overview of our framework.
The contributions of this paper are three-fold:
-
We adaptively treat the four image boundaries as background: a simple but effective method classifies boundary pixels as background or non-background.
-
We use not only color information, but also variance and histogram features in multiple color spaces (Lab and RGB) to enhance the detection performance.
-
We present a simple but effective salient object segmentation method based on the computed saliency map.
The rest of this paper is organized as follows. In Sect. 2, we give a detailed description of graph-based manifold ranking and then present our proposed model. In Sect. 3, we provide qualitative and quantitative comparisons with previous methods. In Sect. 4, we present an application of salient object detection: salient object segmentation. Finally, we conclude with a short summary and discussion in Sect. 5.
2 Robust Salient Object Detection
In 2004, Zhou et al. [19, 20] propose a graph-based manifold ranking model, a method that exploits the intrinsic manifold structure of data and can be regarded as a kind of semi-supervised learning. We present a robust salient object detection method via adaptive background selection and multi-feature enhancement. We first give a brief introduction to graph-based manifold ranking and then present the details of our proposed method.
2.1 Graph-Based Manifold Ranking
Given a set of n data points \(X=\{x_1,x_2,...,x_q,...,x_n\}\) with each \(x_i\in R^{m}\), the first q points \(\{x_1,x_2,...,x_q\}\) are labeled as queries and the remaining points \(\{x_{q+1},...,x_n\}\) are unlabelled. The ranking algorithm aims to rank the remaining points according to their relevance to the labelled queries. Let \(f\!:X\rightarrow R^n\) denote a ranking function that assigns each data point \(x_i\) a ranking value \(f_i\); we can treat f as a vector \(f=[f_1,f_2,...,f_n]^T\). We also define an indicator vector \(y=[y_1,y_2,...,y_n]^T\), in which \(y_i=1\) if \(x_i\) is a query and \(y_i=0\) otherwise.
Next, we define a graph \(G=(V,E)\) on these data points, where the nodes V are the data points in X and the edges E are weighted by an affinity matrix \(W=[w_{ij}]_{n\times n}\). Given G, the degree matrix is \(D=diag\{d_{11},d_{22},...,d_{nn}\}\), where \(d_{ii}=\sum _{j=1}^{n}w_{ij}\).
According to Zhou et al. [20], the cost function associated with the ranking function f is defined as

\(Q(f)=\frac{1}{2}\left( \sum _{i,j=1}^{n}w_{ij}\left\| \frac{f_i}{\sqrt{d_{ii}}}-\frac{f_j}{\sqrt{d_{jj}}}\right\| ^2+\mu \sum _{i=1}^{n}\left\| f_i-y_i\right\| ^2\right) \)   (1)
where the regularization parameter \(\mu >0\) controls the balance between the first term (smoothness constraint) and the second term (fitting constraint, which contains labelled as well as unlabelled data). The optimal ranking \(f^*\) is then computed by solving the following optimization problem:

\(f^*=\arg \min _{f}Q(f)\)   (2)
The trade-off between these two competing constraints is captured by the positive parameter \(\mu \), usually set to 0.99 to put more emphasis on label consistency. The solution of Eq. (2) can be written as

\(f^*=(I-\alpha S)^{-1}y\)   (3)
where I is the identity matrix, \(S=D^{-\frac{1}{2}}WD^{-\frac{1}{2}}\) is the symmetrically normalized affinity matrix, and \(\alpha =1/(1+\mu )\). The detailed derivation can be found in [20].
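The closed-form solution above can be sketched in a few lines of NumPy; the following is an illustrative implementation (function and variable names are ours, not from the paper), with \(\alpha =1/(1+\mu )\) and \(\mu =0.99\) as stated in the text:

```python
import numpy as np

def manifold_rank(W, y, alpha=1.0 / (1.0 + 0.99)):
    """Closed-form manifold ranking f* = (I - alpha*S)^(-1) y with
    S = D^(-1/2) W D^(-1/2), following Zhou et al. [20].
    Default alpha = 1/(1+mu) with mu = 0.99, per the text."""
    d = W.sum(axis=1)                       # node degrees d_ii
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))  # D^(-1/2)
    S = D_inv_sqrt @ W @ D_inv_sqrt         # normalized affinity matrix
    return np.linalg.solve(np.eye(len(y)) - alpha * S, y)

# Toy chain graph of four nodes; node 0 is the only query.
W = np.array([[0., 1., 0., 0.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [0., 0., 1., 0.]])
y = np.array([1., 0., 0., 0.])
f = manifold_rank(W, y)
```

Ranking values decay with graph distance from the query, which is exactly the diffusion behaviour the saliency ranking relies on.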
This ranking algorithm indicates that the salient object detection model should consist of two parts: graph construction and ranking with queries. In Sect. 2.2, we present our multi-features enhanced graph construction and then in Sect. 2.3, we give the details of our adaptive background selection and saliency ranking.
2.2 Multi-Features Enhanced Graph Construction
To better exploit the intrinsic relationship between data points, two aspects must be treated carefully in graph construction: the graph structure and the edge weights. We over-segment the input image into small homogeneous regions using the SLIC algorithm [21] and regard each superpixel as a node of the graph G.
For the graph structure, we take into account the local smoothness cue (i.e., neighboring superpixels are more likely to belong to the same object) and follow two rules. First, each node is connected not only to its directly adjacent nodes but also to the nodes sharing common boundaries with its neighboring nodes. Second, the nodes on the four image sides are all connected to each other. Figure 2 gives an illustration of the graph construction.
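The two rules above can be sketched as follows, assuming a SLIC-style superpixel label map as input; this is an illustrative implementation, not the authors' code:

```python
import numpy as np

def build_graph(labels):
    """Adjacency matrix following the two rules:
    (1) connect each node to its neighbours and to the nodes sharing a
        boundary with those neighbours (2-hop closure);
    (2) connect all nodes touching the four image sides to each other."""
    n = int(labels.max()) + 1
    A = np.zeros((n, n), dtype=int)
    # direct adjacency: label changes between 4-connected pixels
    for a, b in zip(labels[:, :-1].ravel(), labels[:, 1:].ravel()):
        if a != b:
            A[a, b] = A[b, a] = 1
    for a, b in zip(labels[:-1, :].ravel(), labels[1:, :].ravel()):
        if a != b:
            A[a, b] = A[b, a] = 1
    # rule 1: add neighbours-of-neighbours
    A = ((A + A @ A) > 0).astype(int)
    # rule 2: border superpixels form a clique
    border = np.unique(np.concatenate(
        [labels[0, :], labels[-1, :], labels[:, 0], labels[:, -1]]))
    A[np.ix_(border, border)] = 1
    np.fill_diagonal(A, 0)
    return A

# A 3x3 grid of superpixels on a 6x6 label map; label 4 is interior.
labels = np.repeat(np.repeat(np.arange(9).reshape(3, 3), 2, 0), 2, 1)
A = build_graph(labels)
```

On this toy map the interior node 4 reaches all eight other nodes (four direct neighbours plus four via the 2-hop rule), while corner nodes 0 and 8 are linked only through the border clique.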
After modelling the graph structure, the core problem is how to obtain the edge weight between any pair of nodes given the input data. Color information has been shown to be effective in saliency detection [7, 12], so most models adopt only color information to generate the edge weights. However, other features can also be utilized to improve performance. We employ three features: color, variance, and histogram. We define the edge weight as follows:
where \(r_i\) and \(r_j\) denote superpixel regions i and j, respectively. \(c_c(r_i,r_j)\), \(c_v(r_i,r_j)\) and \(c_h(r_i,r_j)\) represent the color, variance and histogram feature differences between regions \(r_i\) and \(r_j\), respectively. \(\sigma _c\), \(\sigma _v\) and \(\sigma _h\) are parameters controlling the strength of the corresponding weight, set to 5, 2 and 2 in all experiments. The color feature is defined as

\(c_c(r_i,r_j)=\left\| c_c(r_i)-c_c(r_j)\right\| \)
where \(c_c(r_i)\) and \(c_c(r_j)\) denote the mean colors of regions \(r_i\) and \(r_j\), respectively, in the Lab color space.
Generally speaking, the color distributions of different image regions are independent of each other and have different variances, so we should also take advantage of the variance information. We define the variance feature difference as
where \(\sigma _v(r_i)\) and \(\sigma _v(r_j)\) are the computed regional variances, \(n(r_i)\) and \(n(r_j)\) are the numbers of pixels in regions \(r_i\) and \(r_j\), respectively, and \(\epsilon \) is a small number to avoid arithmetic errors. Note that we also take the region size into account. This is performed in the RGB color space.
For the histogram feature, we utilize the \(\chi ^2\) distance instead of the simple Euclidean distance to measure the disparity between two histograms, as suggested in [22]. The histogram feature is defined by

\(c_h(r_i,r_j)=\sum _{k=1}^{d}\frac{\left( h_k(r_i)-h_k(r_j)\right) ^2}{h_k(r_i)+h_k(r_j)}\)
where \(h_k(r_i)\) denotes the k-th component of the color histogram of region \(r_i\) and d denotes the number of components in the histogram. We take \(d=256\) in this work for simplicity; however, d can be made much smaller to improve computational efficiency. This is also performed in the RGB color space.
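A minimal sketch of the three feature differences and their combination into one edge weight follows. The exact combination formula and the region-size weighting of the variance term are not reproduced above, so the exponential form below (with the stated \(\sigma \) values as defaults) and the simplified variance difference are assumptions, as are the function names:

```python
import numpy as np

def chi2_dist(h1, h2, eps=1e-10):
    """Chi-squared distance between two colour histograms."""
    return np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

def edge_weight(ci, cj, vi, vj, hist_i, hist_j,
                sigma_c=5.0, sigma_v=2.0, sigma_h=2.0):
    """Combine the three feature differences into one weight; the
    exponential combination is an assumed form."""
    c_c = np.linalg.norm(ci - cj)   # mean Lab colour difference
    c_v = abs(vi - vj)              # variance difference (simplified: no size weighting)
    c_h = chi2_dist(hist_i, hist_j) # colour-histogram difference
    return np.exp(-c_c / sigma_c - c_v / sigma_v - c_h / sigma_h)

h = np.ones(8) / 8
w_same = edge_weight(np.zeros(3), np.zeros(3), 0.5, 0.5, h, h)
w_diff = edge_weight(np.zeros(3), np.full(3, 20.0), 0.5, 3.0, h,
                     np.r_[np.ones(4) / 4, np.zeros(4)])
```

Identical regions get the maximal weight of 1, and the weight decays smoothly as any of the three feature differences grows.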
All three features are normalized to [0, 1] (see Footnote 1). Keeping all other parameters unchanged, we add the feature(s) when computing the saliency map and give a comparative example in Fig. 3. We can see that the two additional features both improve the saliency detection performance.
2.3 Saliency Ranking via Adaptive Background Selection
Most background-prior-based models treat all four image sides as background, assuming that photographers do not crop salient objects along the view frame. However, this is not always true. Figure 4 shows such a case and a visual comparison between our model and [17]. When the salient object touches the image border, the detection result of [17] is no longer robust, whereas our proposed method handles this case. See Algorithms 1 and 2 for our adaptive background selection and saliency ranking, respectively.
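Algorithm 1 is not reproduced here. Purely as a hypothetical illustration, one simple way to drop unreliable boundary seeds is to discard boundary superpixels whose mean colour is an outlier relative to the other boundary superpixels; the sketch below shows this idea and is not the authors' actual procedure (the function name and threshold rule are ours):

```python
import numpy as np

def select_background_seeds(boundary_colors, k=1.5):
    """Keep boundary superpixels whose mean colour lies within k standard
    deviations (of distance-to-mean) of the average boundary colour;
    far-off outliers are assumed to belong to a cropped salient object.
    HYPOTHETICAL sketch, not the paper's Algorithm 1."""
    mu = boundary_colors.mean(axis=0)
    dist = np.linalg.norm(boundary_colors - mu, axis=1)
    return dist <= dist.mean() + k * dist.std()

# Nine dark boundary superpixels and one bright outlier (a cropped object).
colors = np.vstack([np.zeros((9, 3)), np.full((1, 3), 100.0)])
seeds = select_background_seeds(colors)
```

The returned boolean mask keeps the nine consistent boundary regions as background seeds and rejects the outlier, which would otherwise wrongly suppress the cropped object.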
3 Experiments
In this section, we extensively evaluate our model with quantitative and qualitative evaluations on three widely used datasets: SOD [23], ECSSD [24] and ASD [12].
We compare our approach with twenty state-of-the-art salient object detection models on these three datasets. The twenty models are: CA [9], CB [10], CHM [25], FES [26], FT [12], GMR [17], GS [16], HDCT [27], HS [24], MC [28], MSS [29], PCA [30], SF [31], SVO [32], SWD [14], BM [33], LRMR [34], GB [35], SR [36], IT [11].
3.1 Quantitative Evaluation
For quantitative evaluation, we use three commonly used metrics: the PR (precision-recall) curve, the F-Measure, and the MAE (mean absolute error).
The PR curve is based on the overlapping area between the pixel-wise annotation and the saliency prediction, while the F-Measure jointly considers recall and precision. We also introduce the mean absolute error (MAE) because PR curves are limited in that they only consider whether the object saliency is higher than the background saliency. MAE is the average per-pixel difference between the pixel-wise annotation and the computed saliency map; it directly measures how close a saliency map is to the ground truth and is therefore complementary to PR curves.
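The three metrics can be sketched as follows; the binarisation threshold of twice the mean saliency (used for a fixed-threshold F-Measure here) and \(\beta ^2=0.3\) are assumptions borrowed from common practice, not necessarily this paper's exact protocol:

```python
import numpy as np

def evaluate(sal, gt, beta2=0.3):
    """Precision, recall and F-measure at one binarisation threshold,
    plus MAE of the raw saliency map against the binary ground truth."""
    mae = np.abs(sal - gt).mean()
    pred = sal >= 2 * sal.mean()          # assumed adaptive threshold
    tp = np.logical_and(pred, gt > 0.5).sum()
    precision = tp / max(pred.sum(), 1)
    recall = tp / max((gt > 0.5).sum(), 1)
    f = (1 + beta2) * precision * recall / max(beta2 * precision + recall, 1e-10)
    return precision, recall, f, mae

# Sanity check: a perfect prediction scores perfectly on every metric.
gt = np.zeros((4, 4)); gt[1:3, 1:3] = 1.0
p, r, f, mae = evaluate(gt.copy(), gt)
```

Sweeping the threshold over [0, 255] instead of fixing it yields the full PR curve.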
Figures 5, 6 and 7 show the PR curves, F-Measures and MAEs of our model and all compared models on the three datasets. The PR curve of the proposed method outperforms those of all other methods on the SOD dataset, and on the ECSSD and ASD datasets our model is among the best-performing models. For the F-Measure, our model achieves the best performance on all datasets. For MAE, our model has the smallest value on all three datasets, indicating that our saliency maps are closest to the ground-truth masks.
3.2 Qualitative Evaluation
For qualitative evaluation, the results of applying the various algorithms to representative images from SOD, ECSSD and ASD are shown in Fig. 8. The proposed algorithm uniformly highlights the salient regions and preserves finer object boundaries than all other methods. It is also worth pointing out that our algorithm performs well when the background is cluttered.
4 Salient Object Segmentation
In [37], Cheng et al. propose an iterative version of GrabCut, named SaliencyCut, to cut out the salient object. However, their work relies on a predefined fixed threshold and is somewhat time-consuming. We instead use an adaptive threshold to segment the salient object. We first define the average saliency value as

\(\bar{S}=\frac{1}{m\times n}\sum _{x=1}^{m}\sum _{y=1}^{n}S(x,y)\)
where m and n denote the numbers of image rows and columns, respectively. The salient object mask is then given by

\(Mask(x,y)=\begin{cases}1,&S(x,y)>\bar{S}\\0,&\mathrm{otherwise}\end{cases}\)
The final segmented salient object is defined as

\(O=I.*Mask\)
where \(.*\) denotes pixel-wise multiplication. See Fig. 9 for some segmentation examples.
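The adaptive-threshold segmentation of this section amounts to a mask-and-multiply, sketched below (function and variable names are illustrative):

```python
import numpy as np

def segment(image, sal):
    """Keep pixels whose saliency exceeds the mean saliency of the map
    (the adaptive threshold); the result is the pixel-wise product of
    the image and the binary mask."""
    mask = (sal > sal.mean()).astype(image.dtype)
    return image * mask[..., None] if image.ndim == 3 else image * mask

image = np.ones((4, 4, 3))
sal = np.zeros((4, 4)); sal[:2, :] = 1.0   # top half salient
obj = segment(image, sal)
```

Only the top half of the toy image survives the mask; the rest is zeroed out, which is exactly the pixel-wise multiplication described above.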
5 Conclusion
In this paper, we address the salient object detection problem using a semi-supervised method. We tackle the failure case when the salient object touches the image border by adaptive background selection. We also take more features into account to better exploit the intrinsic relationship between image pixels. We evaluate our model on large datasets and demonstrate promising results with comparisons to twenty state-of-the-art methods. Finally, we present a simple but effective salient object segmentation method.
Notes
- 1.
For different channels in LAB and RGB color spaces, we perform the calculation separately and add the results together to get the corresponding feature descriptor.
References
Rutishauser, U., Walther, D., Koch, C., Perona, P.: Is bottom-up attention useful for object recognition?. In: IEEE CVPR, pp. II-37–II-44 (2004)
Ren, Z., Gao, S., Chia, L.-T., Tsang, I.: Region-based saliency detection and its application in object recognition. IEEE TCSVT 24(5), 769–779 (2013)
Guo, C., Zhang, L.: A novel multi resolution spatiotemporal saliency detection model and its applications in image and video compression. IEEE TIP 19(1), 185–198 (2010)
Itti, L.: Automatic foveation for video compression using a neurobiological model of visual attention. IEEE TIP 13(10), 1304–1318 (2004)
Ding, Y., Xiao, J., Yu, J.: Importance filtering for image retargeting. In: IEEE CVPR, pp. 89–96 (2011)
Sun, J., Ling, H.: Scale and object aware image retargeting for thumbnail browsing. In: ICCV, pp. 1511–1518 (2011)
Borji, A., Sihite, D.N., Itti, L.: Salient object detection: a benchmark. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part II. LNCS, vol. 7573, pp. 414–429. Springer, Heidelberg (2012)
Borji, A., Cheng, M.M., Jiang, H.Z., Li, J.: Salient object detection: a survey. CoRR abs/1411.5878 (2014)
Goferman, S., Zelnik-Manor, L., Tal, A.: Context-aware saliency detection. IEEE Trans. Patt. Anal. Mach. Intell. 32(10), 1915–1925 (2012)
Jiang, H., Wang, J., Yuan, Z., Liu, T., Zheng, N., Li, S.: Automatic salient object segmentation based on context and shape prior. In: BMVC, pp. 110.1–110.12 (2011)
Itti, L., Koch, C., Niebur, E.: A model of saliency-based visual attention for rapid scene analysis. IEEE Trans. Patt. Anal. Mach. Intell. 20(11), 1254–1259 (1998)
Achanta, R., Hemami, S., Estrada, F., Susstrunk, S.: Frequency-tuned salient region detection. In: IEEE CVPR, pp. 1597–1604 (2009)
Cheng, M., Zhang, G.X., Mitra, N.J., Huang, X., Hu, S.M.: Global contrast based salient region detection. In: IEEE CVPR, pp. 409–416 (2011)
Duan, L., Wu, C., Miao, J., Qing, L., Fu, Y.: Visual saliency detection by spatially weighted dissimilarity. In: IEEE CVPR, pp. 473–480 (2011)
Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: International Conference on Computer Vision, pp. 2106–2113 (2009)
Wei, Y., Wen, F., Zhu, W., Sun, J.: Geodesic saliency using background priors. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part III. LNCS, vol. 7574, pp. 29–42. Springer, Heidelberg (2012)
Yang, C., Zhang, L., Lu, H., Ruan, X., Yang, M.H.: Saliency detection via graph-based manifold ranking. In: IEEE CVPR, pp. 3166–3173 (2013)
Zhu, W., Liang, S., Wei, Y., Sun, J.: Saliency optimization from robust background detection. In: IEEE CVPR, pp. 2814–2821 (2014)
Zhou, D., Weston, J., Gretton, A., Bousquet, O., Scholkopf, B.: Ranking on data manifolds. In: NIPS (2004)
Zhou, D., Bousquet, O., Lal, T.N., Weston, J., Scholkopf, B.: Learning with local and global consistency. In: NIPS, pp. 321–328 (2004)
Achanta, R., Shaji, A., Smith, K., Lucchi, A., Fua, P., Susstrunk, S.: SLIC superpixels compared to state-of-the-art superpixel methods. IEEE Trans. Patt. Anal. Mach. Intell. 34(11), 2274–2282 (2012)
Gorisse, D., Cord, M., Precioso, F.: Locality-sensitive hashing for chi2 distance. IEEE Trans. Patt. Anal. Mach. Intell. 34(2), 402–409 (2012)
Movahedi, V., Elder, J.: Design and perceptual validation of performance measures for salient object segmentation. In: IEEE CVPRW, pp. 49–56 (2010)
Yan, Q., Xu, L., Shi, J., Jia, J.: Hierarchical saliency detection. In: IEEE CVPR, pp. 1155–1162 (2013)
Li, X., Li, Y., Shen, C., Dick, A., Hengel, A.: Contextual hypergraph modelling for salient object detection. In: IEEE ICCV, pp. 3328–3335 (2013)
Rezazadegan Tavakoli, H., Rahtu, E., Heikkilä, J.: Fast and efficient saliency detection using sparse sampling and kernel density estimation. In: Heyden, A., Kahl, F. (eds.) SCIA 2011. LNCS, vol. 6688, pp. 666–675. Springer, Heidelberg (2011)
Kim, J., Han, D., Tai, Y.W., Kim, J.: Salient region detection via high-dimensional color transform. In: IEEE CVPR, pp. 883–890 (2014)
Jiang, B., Zhang, L., Lu, H., Yang, C.: Saliency detection via absorbing markov chain. In: IEEE ICCV, pp. 1665–1672 (2013)
Achanta, R., Susstrunk, S.: Saliency detection using maximum symmetric surround. In: IEEE ICIP, pp. 2653–2656 (2010)
Margolin, R., Tal, A., Manor, L.: What makes a patch distinct?. In: IEEE CVPR, pp. 1139–1146 (2013)
Perazzi, F., Krahenbuhl, P., Pritch, Y., Hornung, A.: Saliency filters: contrast based filtering for salient region detection. In: IEEE CVPR, pp. 733–740 (2012)
Chang, K., Liu, T., Chen, H., Lai, S.: Fusing generic objectness and visual saliency for salient object detection. In: IEEE ICCV, pp. 914–921 (2011)
Xie, Y., Lu, L.: Visual saliency detection based on bayesian model. In: IEEE ICIP, pp. 645–648 (2011)
Shen, X., Wu, Y.: A unified approach to salient object detection via low rank matrix recovery. In: IEEE CVPR, pp. 853–860 (2012)
Harel, J., Koch, C., Perona, P.: Graph-based visual saliency. In: NIPS, pp. 545–552 (2006)
Hou, X., Zhang, L.: Saliency detection: a spectral residual approach. In: IEEE CVPR, pp. 1–8 (2007)
Cheng, M.M., Mitra, N.J., Huang, X.L., Torr, P.H.S., Hu, S.M.: Salient object detection and segmentation. doi:10.1109/TPAMI.2014.2345401
Acknowledgments
The authors would like to thank the editor and anonymous reviewers for their valuable suggestions, which helped a lot to improve the manuscript. This work was supported in part by the Research Committee at University of Macau under Grant MYRG2014-00139-FST.
© 2015 Springer International Publishing Switzerland
Li, H., Wu, W., Wu, E. (2015). Robust Salient Object Detection and Segmentation. In: Zhang, YJ. (eds) Image and Graphics. Lecture Notes in Computer Science(), vol 9219. Springer, Cham. https://doi.org/10.1007/978-3-319-21969-1_24
Print ISBN: 978-3-319-21968-4
Online ISBN: 978-3-319-21969-1