Video Saliency Detection by 3D Convolutional Neural Networks

Ding, Guanqun; Fang, Yuming

doi:10.1007/978-981-10-8108-8_23

Part of the book series: Communications in Computer and Information Science (CCIS, volume 815)

Included in the following conference series:

International Forum on Digital TV and Wireless Multimedia Communications

Abstract

Unlike salient object detection in still images, video saliency detection faces the key challenge of extracting and combining spatial and temporal features. In this paper, we present a novel and effective approach to salient object detection in video sequences based on 3D convolutional neural networks. First, we design a 3D convolutional network (Conv3DNet) that takes three video frames as input and learns spatiotemporal features for video sequences. Then, we design a 3D deconvolutional network (Deconv3DNet) that combines these spatiotemporal features to predict the final saliency map. Experimental results show that the proposed model outperforms state-of-the-art video saliency detection methods in video saliency prediction.
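The abstract gives no layer-level details. The following is therefore only a minimal, hypothetical sketch of the two operations it names:

- a 3D convolution that slides one kernel jointly over the time, height and width of a three-frame clip (the Conv3DNet idea);
- a 3D transposed convolution that scatters features back to a full-resolution map (the Deconv3DNet idea). We assume "deconvolution" here means transposed convolution, as in common deep-learning frameworks.

Several elements are illustrative assumptions and not the authors' architecture, which uses learned, multi-channel, multi-layer networks. These include the function names `conv3d` and `conv_transpose3d`, the single-channel 4x4 toy frames, the fixed averaging kernels and the final sigmoid.

```python
import math
from typing import List

# A single-channel volume indexed as vol[t][y][x] (time, height, width).
Volume = List[List[List[float]]]


def zeros(t: int, h: int, w: int) -> Volume:
    return [[[0.0] * w for _ in range(h)] for _ in range(t)]


def conv3d(vol: Volume, ker: Volume, stride: int = 1) -> Volume:
    """Naive 'valid' 3D convolution, implemented as cross-correlation
    (as most deep-learning libraries do)."""
    T, H, W = len(vol), len(vol[0]), len(vol[0][0])
    kt, kh, kw = len(ker), len(ker[0]), len(ker[0][0])
    out = zeros((T - kt) // stride + 1, (H - kh) // stride + 1, (W - kw) // stride + 1)
    for t in range(len(out)):
        for y in range(len(out[0])):
            for x in range(len(out[0][0])):
                acc = 0.0
                # Sum over a kt x kh x kw window spanning frames and pixels.
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += vol[t * stride + dt][y * stride + dy][x * stride + dx] * ker[dt][dy][dx]
                out[t][y][x] = acc
    return out


def conv_transpose3d(vol: Volume, ker: Volume, stride: int = 1) -> Volume:
    """Naive 3D transposed convolution: every input voxel adds a scaled copy
    of the kernel into the (larger) output volume."""
    T, H, W = len(vol), len(vol[0]), len(vol[0][0])
    kt, kh, kw = len(ker), len(ker[0]), len(ker[0][0])
    out = zeros((T - 1) * stride + kt, (H - 1) * stride + kh, (W - 1) * stride + kw)
    for t in range(T):
        for y in range(H):
            for x in range(W):
                v = vol[t][y][x]
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            out[t * stride + dt][y * stride + dy][x * stride + dx] += v * ker[dt][dy][dx]
    return out


# Hypothetical toy input: three 4x4 frames with a bright 2x2 blob that
# moves one pixel to the right per frame.
frames = [
    [[1.0 if (1 <= y <= 2 and t <= x <= t + 1) else 0.0 for x in range(4)] for y in range(4)]
    for t in range(3)
]

# Encoder (Conv3DNet idea): a 3x2x2 averaging kernel whose depth equals the
# clip length, so it fuses all three frames into one spatiotemporal feature map.
enc_kernel = [[[1.0 / 12] * 2 for _ in range(2)] for _ in range(3)]
features = conv3d(frames, enc_kernel)            # shape 1x3x3

# Decoder (Deconv3DNet idea): a 1x2x2 transposed convolution restores 4x4.
dec_kernel = [[[0.25] * 2 for _ in range(2)]]
logits = conv_transpose3d(features, dec_kernel)  # shape 1x4x4

# Squash to (0, 1) so the result can be read as a saliency map (an assumption).
saliency = [[[1.0 / (1.0 + math.exp(-v)) for v in row] for row in logits[0]]]
for row in saliency[0]:
    print(" ".join(f"{v:.3f}" for v in row))
```

The temporal kernel depth (3) equals the number of input frames, so the encoder collapses the clip into a single temporal slice. Each feature therefore mixes appearance and motion across all three frames. The transposed convolution then restores the 4x4 spatial size that the valid convolution had reduced to 3x3. A trained network would learn its kernels rather than use fixed averages.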

Acknowledgments

This work was supported by NSFC (No. 61571212), and NSF of Jiangxi Province in China (No. 20071BBE50068, 20171BCB23048, 20161ACB21014, GJJ160420).

Author information

Authors and Affiliations

School of Information Technology, Jiangxi University of Finance and Economics, Nanchang, 330013, China
Guanqun Ding & Yuming Fang

Corresponding author

Correspondence to Yuming Fang.

Editor information

Editors and Affiliations

Shanghai Jiao Tong University, Shanghai, China
Guangtao Zhai
Shanghai Jiao Tong University, Shanghai, China
Jun Zhou
Jiao Tong University, Shanghai, China
Xiaokang Yang

About this paper

Cite this paper

Ding, G., Fang, Y. (2018). Video Saliency Detection by 3D Convolutional Neural Networks. In: Zhai, G., Zhou, J., Yang, X. (eds) Digital TV and Wireless Multimedia Communication. IFTC 2017. Communications in Computer and Information Science, vol 815. Springer, Singapore. https://doi.org/10.1007/978-981-10-8108-8_23

DOI: https://doi.org/10.1007/978-981-10-8108-8_23
Published: 03 February 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-8107-1
Online ISBN: 978-981-10-8108-8
eBook Packages: Computer Science, Computer Science (R0)