Robust Salient Object Detection and Segmentation

Li, Hong; Wu, Wen; Wu, Enhua

doi:10.1007/978-3-319-21969-1_24

Robust Salient Object Detection and Segmentation

Hong Li¹⁴,
Wen Wu¹⁴ &
Enhua Wu¹⁴

Conference paper

2240 Accesses
3 Citations

Part of the book series: Lecture Notes in Computer Science ((LNIP,volume 9219))

Abstract

Background prior has been widely used in many salient object detection models with promising results. These methods assume that the image boundary is all background. Then, color feature based methods are used to extract the salient object. However, such assumption may be inaccurate when the salient object is partially cropped by the image boundary. Besides, using only color feature is also insufficient. We present a novel salient object detection model based on background selection and multi-features. Firstly, we present a simple but effective method to pick out more reliable background seeds. Secondly, we utilize multi-features enhanced graph-based manifold ranking to get the saliency maps. Finally, we also present the salient object segmentation via computed saliency map. Qualitative and quantitative evaluation results on three widely used data sets demonstrate significant appeal and advantages of our technique compared with many state-of-the art models.

You have full access to this open access chapter, Download conference paper PDF

1 Introduction

Salient object detection aims to detect the most salient attention-grabbing object in a scene and the great value of it mainly lies in kinds of applications such as object detection and recognition [1, 2], image and video compression [3, 4], object content aware image retargeting [5, 6], to name a few. Therefore, numerous salient object detection models have been developed in recent years [7, 8]. All these models can be categorized as either bottom-up or top-down approaches. Bottom-up saliency models are based on some pre-assumed priors (e.g., contrast prior, central bias prior, background prior and so on). On the other side, top-down models usually use high-level information to guide the detection. We only focus on bottom-up models in this work.

For bottom-up salient object detection models, the priors play a critical role. The most widely used is the contrast prior and often measured with respect to local [9–11] or the global fashion [12–14]. Motivated by the early primate vision, Itti et al. [11] regard the visual attention as the local center-surround difference and present a pioneer saliency model based on multi-scales image features. Goferman et al. [9] take advantage of multi-clues including local low-level features, high-level features, and global considerations to segment out the salient objects along with their contexts. In [10], Jiang et al. utilize the shape information to find the regions of distinct color by computing the difference between the color histogram of a region and its adjacent regions. Due to the lack of higher-level information about the object, all these local contrast based models tend to produce higher saliency values near edges instead of uniformly highlighting the whole salient object.

On the other side, global contrast based methods take holistic rarity over the complete image into account. The model of Achanta et al. [12] works on a per-pixel basis through computing color dissimilarities to the mean image color and achieves globally consistent results. They also use Gaussian blur to decrease the influence of noise and high frequency patterns. Cheng et al. [13] define a regional contrast-based method by generating 3D histograms and using segmentation, which evaluates not only global contrast differences but also spatial coherence. Model [14] measures global contrast-based saliency based on spatially weighted feature dissimilarities. However, global contrast-based methods may highlight background regions as salient because they do not account for any spatial relationship inside the image.

The central-bias prior is based on a well-known fact that when humans take photos they often frame the interested objects near the center of the image. So Judd et al. [15] present a saliency model via computing the distance between each pixel and the coordinate center of the image. Their model presents a better prediction of the salient object than many previous saliency models. Later, both Goferman et al. [9] and Jiang et al. [10] enhance the intermediate saliency map with weight implemented via a 2D Gaussian fallof positioned at the center of the image. This prior usually improves the saliency performance of the most the natural images. However, the certral-bias prior is not always true when the photographer faces a big scene or the objects of interest cannot be located near the center of image.

Besides above two commonly used priors, several recent models also utilize the background prior, i.e., image boundary should be treated as background, to perform saliency detection. Wei et al. [16] propose a novel saliency measure called geodesic saliency, which use two priors about common backgrounds in natural images, namely boundary and connectivity priors, to help removing background clutters and in turn lead to better salient object detection. Later, Yang et al. [17] utilize this background prior and graph-based manifold ranking to detect the salient object and get promising results. However, they assume that all the four image sides are background. This is not always true when the image is cropped. Recently, Zhu et al. [18] propose a novel and reliable background measure, called boundary connectivity, and a principled optimization framework to integrate multiple low level cues. They do not treat all the image boundaries as background.However, their method is too complicated. Unlike all these aforementioned methods, our model not only adaptively treat the image boundaries as background or non-background, but also is very easy to be implemented. Figure 1 gives an overview of our framework.

The contributions of this paper are three-fold:

We adaptively treat the four image boundaries as background. A simple but effective method is proposed to adaptively treat the boundary pixels as background and non-background pixels.
We not only use the color information, but also utilize the variance and histogram features in multi color spaces (LAB and RGB) to enhance the detection performance.
We present a simple but effective salient object segmentation via computed saliency map.

The rest of this paper is organized as follows. In Sect. 2, we first give a detailed description of graph-based manifold ranking, and then present our proposed model. Then, in Sect. 3, we provide a qualitative and quantitative comparison with previous methods. We will present application of salient object detection: salient object segmentation in Sect. 4. Finally, we conclude with a short summary and discussion in Sect. 5.

2 Robust Salient Object Detection

In 2004, Zhou et al. [19, 20] propose a graph-based manifold ranking model, a method that can exploit the intrinsic manifold structure of data. It can be regarded as a kind of semi-supervised learning problem. We present a robust salient object detection method via adaptive background selection and multi-features enhancement. We first give a brief introduction to the graph-based manifold ranking, and then present the details of our proposed method.

2.1 Graph-Based Manifold Ranking

Given a set of n data points $X=\{x_1,x_2,...,x_q,...,x_n\}$, with each data $x_i\in R^{m}$, the first q points $\{x_1,x_2,...,x_q\}$ are labeled as the queries and the rest points $\{x_{q+1},...,x_n\}$ are unlabelled. The ranking algorithm aims to rank the remaining points according to their relevances to the labelled queries. Let $f\!\!\!:X\rightarrow R^n$ denotes a ranking function which assigns to each data point $x_i$ a ranking value $f_i$. We can treat f as a vector $f=[f_1,f_2,...,f_n]^T$. We can also define an indication vector $y=[y_1,y_2,...,y_n]^T$, in which $y_i=1$ if $x_i$ is a query, and $y_i=0$ otherwise.

Next, we define a graph $G=(V,E)$ on these data points, where the nodes V are dataset X and the edges E are weighted by an affinity matrix $W=[w_{ij}]_{n\times n}$. Give G, the degree matrix is denoted as $D=diag\{d_{11},d_{22},...,d_{nn}\}$, where $d_{ii}=\sum _{j=1}^{n}w_{ij}$.

According to Zhou et al. [20], cost function associated with the ranking function f is defined to be

(1)

where the regularization parameter $\mu >0$ controls the balance of the first term (smoothness constraint) and the second term (fitting constraint, containing labelled as well as unlabelled data.). Then the optimal ranking $f^*$ of queries is computed by solving the following optimization problem:

$$\begin{aligned} f^*=\arg \underset{f}{\min }\,Q(f) \end{aligned}$$

(2)

The trade-off between these two competing constraints is captured by a positive parameter $\mu $ and usually set to be 0.99 to put more emphasis on the label consistency. The solution of Eq. (2) can be denoted as

$$\begin{aligned} f^*=(I-\alpha S)^{-1}y \end{aligned}$$

(3)

where I is an identity matrix, and $S=D^{-\frac{1}{2}}WD^{-\frac{1}{2}}$ is the normalized Laplacian matrix, $\alpha =1/(1+\mu )$. The detailed derivation can be found in [20].

This ranking algorithm indicates that the salient object detection model should consist of two parts: graph construction and ranking with queries. In Sect. 2.2, we present our multi-features enhanced graph construction and then in Sect. 2.3, we give the details of our adaptive background selection and saliency ranking.

2.2 Multi-Features Enhanced Graph Construction

To better exploit the intrinsic relationship between data points, there are two aspects should be carefully treated in graph construction: graph structure and edge weights. We over-segment input image into small homogeneous regions using SLIC algorithm [21] and regard each superpixel as a node in the graph G.

For graph structure, we take into account the local smoothness cue (i.e., local neighboring superpixels are more likely to belong to the same object) and follow two rules. Firstly, each node is not only connected with its direct adjacent neighboring nodes, but also is connected with those nodes sharing common boundaries with its neighboring nodes. Secondly, the nodes on the four image sides should be connected together. Figure 2 gives an illustration of graph construction.

After modelling the graph structure, the very core problem is how to get the edge weight between any pairwise nodes given input data. The color information has been shown to be effective in saliency detection [7, 12]. So most models only adopt color information to generate the edge weights. However, there are other features can be utilized to improve the performance. We employ more features: color, variance and histogram feature. We denote the edge weight as following

$$\begin{aligned} w_{ij}=e^{-(\frac{c_c(r_i,r_j)}{\sigma _c^2} + \frac{c_v(r_i,r_j)}{\sigma _v^2} + \frac{c_h(r_i,r_j)}{\sigma _h^2})} \end{aligned}$$

(4)

where $r_i$ and $r_j$ denote the superpixel region i and j respectively. $c_c(r_i,r_j)$, $c_v(r_i,r_j)$ and $c_h(r_i,r_j)$ represent the corresponding color, variance and histogram feature difference between region $r_i$ and $r_j$ respectively. $\sigma _c$, $\sigma _v$ and $\sigma _h$ are feature parameters controlling the strength of the corresponding weight and we take 5, 2 and 2 in all experiments. The color feature is defined as

(5)

where $c_c(r_i)$ and $c_c(r_j)$ denote the mean of region $r_i$ and $r_j$ respectively in Lab color space.

Generally speaking, the color distributions of image regions are independent of each other with different variances, so we should also take advantage of the variance information. We define the variance feature difference as

(6)

where $\sigma _v(r_i)$ and $\sigma _v(r_j)$ are corresponding computed regional variance. $n(r_i)$ and $n(r_j)$ are number of pixels in the regions $r_i$ and $r_j$ respectively. $\epsilon $ is a small number to avoid arithmetic error. Note that we also take the region size into account. This is performed in RGB color space.

For histogram feature, we utilize $\chi ^2$ distance instead of simple Euclidean distance to define the disparity between two histograms as suggested in [22]. The histogram feature is defined by

$$\begin{aligned} c_h(r_i,r_j)=\frac{1}{2}\sum _{k=1}^d\frac{(h_k(r_i)-h_k(r_j))^2}{(h_k(r_i)+h_k(r_j))^2} \end{aligned}$$

(7)

where $h_k(r_i)$ denotes the k-th component of the color histogram of region $r_i$, d denotes the number of component in the histogram, we take $d=256$ in this work for simpleness, however, d can be much more smaller in order to improve the computational efficiency. This is also performed in RGB color space.

All these three features are normalized to [0, 1]^{Footnote 1}. We keep all other parameters unchanged, and add feature(s) to compute the saliency map and give an comparative example in Fig. 3. We can see that these two additional features can both improve the saliency detection performance.

2.3 Saliency Ranking via Adaptive Background Selection

Most background prior based models treat all the four image sides as background by assuming that photographers will not crop salient objects along the view frame. However, this is not always true. Figure 4 shows a special case and the visual comparison of our model and [17]. We can see that when the salient object touches the image border, the detection result of [17] is not so robust anymore. While our proposed method can handle this drawback. See Algorithm 1 and Algorithm 2 for our adaptive background selection and saliency ranking respectively.

3 Experiments

In this section, we extensively evaluate our model and make quantitative evaluation and qualitative evaluation on three widely used datasets SOD [23], ECSSD [24] and ASD [12].

We compare our approaches with twenty state-of-the-art salient object models on these three widely used datasets. These twenty models are: CA [9], CB [10], CHM [25], FES [26], FT [12], GMR [17], GS [16], HDCT [27], HS [24], MC [28], MSS [29], PCA [30], SF [31], SVO [32], SWD [14], BM [33], LRMR [34], GB [35], SR [36], IT [11].

3.1 Quantitative Evaluation

For quantitative evaluation, we evaluate the performance using three commonly used metrics including the PR (precision-recall) curve, F-Measure and MAE (mean absolute error).

PR curve is based on the overlapping area between pixel-wise annotation and saliency prediction. F-Measure, jointly considers recall and precision. We also introduce the mean absolute error (MAE) into the evaluation because the PR curves are limited in that they only consider whether the object saliency is higher than the background saliency. MAE is the average per-pixel difference between the pixel-wise annotation and the computed saliency map. It directly measures how close a saliency map is to the ground truth and is more meaningful and complementary to PR curves.

Figures 5, 6 and 7 show the PR curves, F-Measures and MAEs of all compared and our models on these three data sets. We note that the PR curve of proposed method outperforms PR curves of all other methods on SOD dataset. On ECSSD and ASD data sets, our model is among the best performance models. For F-Measure, our model gets the best performance on all data sets. And for MAE, our model has the smallest value on all these three data set and this indicates that our saliency maps are closest to the ground truth masks.

3.2 Qualitative Evaluation

For qualitative evaluation, the results of applying the various algorithms to representative images from SOD, ECSSD and ASD are shown in Fig. 8. We note that the proposed algorithm uniformly highlights the salient regions and preserves finer object boundaries than all other methods. It is also worth pointing out that our algorithm performs well when the background is cluttered.

4 Salient Object Segmentation

In [37], Cheng propose an iterative version of GrabCut, named SaliencyCut, to cut out the salient object. However, their work is based on predefined fixed threshold and is a little bit time consuming. We just use the adaptive threshold to segment the salient object. We first define the average saliency value as

$$\begin{aligned} sal_{mean}=\frac{1}{mn}\sum _{i=1}^m\sum _{j=1}^nS(i,j) \end{aligned}$$

(8)

where m and n denote image rows and columns respectively. Then the salient object mask is denoted as

$$\begin{aligned} Sal_{mask}(i,j)=\left\{ \begin{aligned} 1&,\quad S(i,j)>= sal_{mean}&\\ 0&,\quad S(i,j)< sal_{mean}&\\ \end{aligned} \right. \end{aligned}$$

(9)

The final segmented salient object is defined as

$$\begin{aligned} S_{obj}=I.*Sal_{mask} \end{aligned}$$

(10)

where $.*$ denotes pixel-wise multiplication. See Fig. 9 for some segmentation examples.

5 Conclusion

In this paper, we address the salient object detection problem using a semi-supervised method. We tackle the failure case when the salient object touches the image border by adaptive background selection. We also take more features into account to better exploit the intrinsic relationship between image pixels. We evaluate our model on large datasets and demonstrate promising results with comparisons to twenty state-of-the-art methods. Finally, we present a simple but effective salient object segmentation method.

Notes

1.
For different channels in LAB and RGB color spaces, we perform the calculation separately and add the results together to get the corresponding feature descriptor.

References

Rutishauser, U., Walther, D., Koch, C., Perona, P.: Is bottom-up attention useful for object recognition?. In: IEEE CVPR, pp. II-37–II-44 (2004)
Google Scholar
Ren, Z., Gao, S., Chia, L.-T., Tsang, I.: Region-based saliency detection and its application in object recognition. IEEE TCSVT 24(5), 769–779 (2013)
Google Scholar
Guo, C., Zhang, L.: A novel multi resolution spatiotemporal saliency detection model and its applications in image and video compression. IEEE TIP 19(1), 185–198 (2010)
MATH Google Scholar
Itti, L.: Automatic foveation for video compression using a neurobiological model of visual attention. IEEE TIP 13(10), 1304–1318 (2004)
Google Scholar
Ding, Y., Xiao, J., Yu, J.: Importance filtering for image retargeting. In: IEEE CVPR, pp. 89–96 (2011)
Google Scholar
Sun, J., Ling, H.: Scale and object aware image retargeting for thumbnail browsing. In: ICCV, pp. 1511–1518 (2011)
Google Scholar
Borji, A., Sihite, D.N., Itti, L.: Salient object detection: a benchmark. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part II. LNCS, vol. 7573, pp. 414–429. Springer, Heidelberg (2012)
Chapter Google Scholar
Borji, A., Cheng, M.M., Jiang, H. Z., Li, J.: Salient object detection: a survey. CORR, abs/1411.5878 (2014)
Google Scholar
Goferman, S., Zelnik-Manor, L., Tal, A.: Context-aware saliency detection. IEEE Trans. Patt. Anal. Mach. Intell. 32(10), 1915–1925 (2012)
Article Google Scholar
Jiang, H., Wang, J., Yuan, Z., Liu, T., Zheng, N., Li, S.: Automatic salient object segmentation based on context and shape prior. In: BMVC, pp. 110.1–110.12 (2011)
Google Scholar
Itti, L., Koch, C., Niebur, E.: A model of saliency-based visual attention for rapid scene analysis. IEEE Trans. Patt. Anal. Mach. Intell. 20(11), 1254–1259 (1998)
Article Google Scholar
Achanta, R., Hemami, S., Estrada, F., Susstrunk, S.: Frequency Tuned salient region detection. In: IEEE CVPR, pp. 1597–1604 (2009)
Google Scholar
Cheng, M., Zhang, G.X., Mitra, N.J. Huang, X., Hu, S.M.: Global contrast based salient region detection. In: IEEE CVPR, pp. 409–416 (2011)
Google Scholar
Duan, L., Wu, C., Miao, J., Qing, L., Fu, Y.: Visual saliency detection by spatially weighted dissimilarity. In: IEEE CVPR, pp. 473–480 (2011)
Google Scholar
Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: International Conference on Computer Vision, pp. 2106–2113 (2009)
Google Scholar
Wei, Y., Wen, F., Zhu, W., Sun, J.: Geodesic saliency using background priors. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part III. LNCS, vol. 7574, pp. 29–42. Springer, Heidelberg (2012)
Chapter Google Scholar
Yang, C., Zhang, L., Lu, H., Ruan, X., Yang, M.H.: Saliency detection via graph-based manifold ranking. In: IEEE CVPR, pp. 3166–3173 (2013)
Google Scholar
Zhu, W., Liang, S., Wei, Y., Sun, J.: Saliency optimization from robust background detection. In: IEEE CVPR, pp. 2814–2821 (2014)
Google Scholar
Zhou, D., Weston, J., Gretton, A., Bousquet, O., Scholkopf, B.: Ranking on data manifolds. In: NIPS (2004)
Google Scholar
Zhou, D., Bousquet, O., Lal, T.N., Weston, J., Scholkopf, B.: Learning with local and global consistency. In: NIPS, pp. 321–328 (2004)
Google Scholar
Achanta, R., Shaji, A., Smith, K., Lucchi, A., Fua, P., Susstrunk, S.: SLIC superpixels compared to state-of-the-art superpixel methods. IEEE Trans. Patt. Anal. Mach. Intell. 34(11), 2274–2282 (2012)
Article Google Scholar
Gorisse, D., Cord, M., Precioso, F.: Locality-sensitive hashing for chi2 distance. IEEE Trans. Patt. Anal. Mach. Intell. 34(2), 402–409 (2012)
Article Google Scholar
Movahedi, V., Elder, J.: Design and perceptual validation of performance measures for salient object segmentation. In: IEEE CVPRW, pp. 49–56 (2010)
Google Scholar
Yan, Q., Xu, L., Shi, J., Jia, J.: Hierarchical saliency detection. In: IEEE CVPR, pp. 1155–1162 (2013)
Google Scholar
Li, X., Li, Y., Shen, C., Dick, A., Hengel, A.: Contextual hypergraph modelling for salient object detection. In: IEEE ICCV, pp. 3328–3335 (2013)
Google Scholar
Rezazadegan Tavakoli, H., Rahtu, E., Heikkilä, J.: Fast and efficient saliency detection using sparse sampling and kernel density estimation. In: Heyden, A., Kahl, F. (eds.) SCIA 2011. LNCS, vol. 6688, pp. 666–675. Springer, Heidelberg (2011)
Chapter Google Scholar
Kim, J., Han, D., Tai, Y.W., Kim, J.: Salient region detection via high-dimensional color transform. In: IEEE CVPR, pp. 883–890 (2014)
Google Scholar
Jiang, B., Zhang, L., Lu, H., Yang, C.: Saliency detection via absorbing markov chain. In: IEEE ICCV, pp. 1665–1672 (2013)
Google Scholar
Achanta, R., Susstrunk, S.: Saliency detection using maximum symmetric surround. In: IEEE ICIP, pp. 2653–2656 (2010)
Google Scholar
Margolin, R., Tal, A., Manor, L.: What makes a patch distinct?. In: IEEE CVPR, pp. 1139–1146 (2013)
Google Scholar
Perazzi, F., Krahenbuhl, P., Pritch, Y., Hornung, A.: Saliency filters: contrast based filtering for salient region detection. In: IEEE CVPR, pp. 733–740 (2012)
Google Scholar
Chang, K., Liu, T, Chen, H., Lai, S.: Fusing generic objectness and visual saliency for salient object detection. In: IEEE ICCV, pp. 914–921 (2011)
Google Scholar
Xie, Y., Lu, L.: Visual saliency detection based on bayesian model. In: IEEE ICIP, pp. 645–648 (2011)
Google Scholar
Shen, X., Wu, Y.: A unified approach to salient object detection via low rank matrix recovery. In: IEEE CVPR, pp. 853–860 (2012)
Google Scholar
Harel, J., Koch, C., Perona, P.: Graph-based visual saliency. In: NIPS, pp. 545–552 (2006)
Google Scholar
Hou, X., Zhang, L.: Saliency detection: a spectral residual approach. In: IEEE CVPR, pp. 1–8 (2007)
Google Scholar
Cheng, M.M., Mitra, N.J., Huang, X.L., Torr, P.H.S., Hu, S.M.: Salient object detection and segmentation. doi:10.1109/TPAMI.2014.2345401

Download references

Acknowledgments

The authors would like to thank the editor and anonymous reviews for their valued suggestions which helped a lot to improve the manuscript. This work was supported in part by the Research Committee at University of Macau under Grant MYRG2014-00139-FST.

Author information

Authors and Affiliations

Department of Computer and Information Science, University of Macau, Macau, 999078, China
Hong Li, Wen Wu & Enhua Wu

Authors

Hong Li
View author publications
You can also search for this author in PubMed Google Scholar
Wen Wu
View author publications
You can also search for this author in PubMed Google Scholar
Enhua Wu
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Hong Li .

Editor information

Editors and Affiliations

Department of Electronic Engineering, Tsinghua University, Beijing, China
Yu-Jin Zhang

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Li, H., Wu, W., Wu, E. (2015). Robust Salient Object Detection and Segmentation. In: Zhang, YJ. (eds) Image and Graphics. Lecture Notes in Computer Science(), vol 9219. Springer, Cham. https://doi.org/10.1007/978-3-319-21969-1_24

Download citation

DOI: https://doi.org/10.1007/978-3-319-21969-1_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-21968-4
Online ISBN: 978-3-319-21969-1
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Societies and partnerships

The International Association for Pattern Recognition (opens in a new tab)