RGB-D Saliency Detection by Multi-stream Late Fusion Network

Chen, Hao; Li, Youfu; Su, Dan

doi:10.1007/978-3-319-68345-4_41

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10528)

Included in the following conference series:

International Conference on Computer Vision Systems

Abstract

In this paper, we address saliency detection on RGB-D image pairs with a multi-stream late fusion network. With the prevalence of RGB-D sensors, leveraging additional depth information to facilitate the saliency detection task has drawn increasing attention. However, the key challenge of how to fuse RGB and depth data in an optimal manner remains under-studied. Conventional approaches simply regard depth information as an undifferentiated channel and model RGB-D saliency detection by applying existing RGB saliency detection models directly. This paradigm cannot capture representations specific to the depth modality and is ill-suited to fusing multi-modal information. We address this problem by proposing a simple yet principled late fusion strategy carried out in conjunction with convolutional neural networks (CNNs). The proposed network is able to learn discriminative representations and exploit the complementarity between the RGB and depth modalities. Comprehensive experiments on two public datasets demonstrate the benefits of the proposed RGB-D saliency detection network.
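
To make the late fusion idea concrete, the toy sketch below processes RGB and depth in separate streams and combines only their per-pixel outputs. It is not the network from the paper: the functions `rgb_stream` and `depth_stream`, the fusion weights, and the bias are all hypothetical stand-ins chosen for illustration, and a real system would replace each stream with a trained CNN.

```python
import math


def sigmoid(x):
    """Squash a raw score into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def rgb_stream(rgb):
    # Hypothetical stand-in for a CNN over the RGB image:
    # scores each pixel by its mean channel intensity.
    return [[sum(px) / 3.0 for px in row] for row in rgb]


def depth_stream(depth):
    # Hypothetical stand-in for a CNN over the depth map:
    # nearer pixels (smaller normalised depth) score higher.
    return [[1.0 - d for d in row] for row in depth]


def late_fusion(rgb, depth, w_rgb=1.0, w_depth=1.0, bias=-1.0):
    """Fuse the two per-stream score maps pixel by pixel.

    Each modality is processed independently first; only the stream
    outputs are combined (weighted sum, then sigmoid). The weights and
    bias are illustrative constants, not learned values.
    """
    r = rgb_stream(rgb)
    d = depth_stream(depth)
    return [
        [sigmoid(w_rgb * rv + w_depth * dv + bias) for rv, dv in zip(r_row, d_row)]
        for r_row, d_row in zip(r, d)
    ]


# A 1x2 toy image: a bright, near pixel next to a dark, far pixel.
rgb = [[(0.9, 0.8, 0.7), (0.1, 0.1, 0.1)]]
depth = [[0.2, 0.9]]
saliency = late_fusion(rgb, depth)
```

In this sketch the bright, near pixel receives the higher fused score. The contrast with the conventional approach described above is that depth is never appended to the RGB input as a fourth channel; each modality keeps its own stream until the final combination step.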

Acknowledgments

This work is funded by the Research Grants Council of Hong Kong (CityU 11205015) and the National Natural Science Foundation of China (NSFC) (61673329).

Author information

Authors and Affiliations

Department of Mechanical and Biomedical Engineering, City University of Hong Kong, Tat Chee Avenue, Kowloon Tong, Hong Kong SAR
Hao Chen, Youfu Li & Dan Su
City University of Hong Kong, Shenzhen Research Institute, Shenzhen, China
Youfu Li

Corresponding author

Correspondence to Youfu Li.

Editor information

Editors and Affiliations

Hong Kong University of Science and Technology, Hong Kong, China
Ming Liu
Harbin Institute of Technology, Shenzhen, China
Haoyao Chen
Technische Universität Wien, Vienna, Austria
Markus Vincze

About this paper

Cite this paper

Chen, H., Li, Y., Su, D. (2017). RGB-D Saliency Detection by Multi-stream Late Fusion Network. In: Liu, M., Chen, H., Vincze, M. (eds) Computer Vision Systems. ICVS 2017. Lecture Notes in Computer Science, vol 10528. Springer, Cham. https://doi.org/10.1007/978-3-319-68345-4_41

DOI: https://doi.org/10.1007/978-3-319-68345-4_41
Published: 11 October 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-68344-7
Online ISBN: 978-3-319-68345-4
eBook Packages: Computer Science, Computer Science (R0)
