Abstract
In contrast to salient object detection for still images, video saliency detection faces a key challenge: how to extract and combine spatial and temporal features. In this paper, we present a novel and effective approach to salient object detection in video sequences based on 3D convolutional neural networks. First, we design a 3D convolutional network (Conv3DNet) that takes three video frames as input to learn spatiotemporal features of video sequences. Then, we design a 3D deconvolutional network (Deconv3DNet) that combines these spatiotemporal features to predict the final saliency map. Experimental results show that the proposed model outperforms state-of-the-art video saliency detection methods in video saliency prediction.
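To make the encoder-decoder idea concrete, the following minimal sketch traces tensor shapes through a stack of 3D convolutions followed by 3D transposed convolutions. Only the three-frame input and the Conv3DNet/Deconv3DNet split come from the abstract; the 64×64 frame size, the layer count, and every kernel, stride, and padding value are hypothetical choices made for illustration and are not the configuration used in the paper.

```python
from typing import List, Tuple

Shape = Tuple[int, int, int]  # (frames, height, width)
Layer = Tuple[Shape, Shape, Shape]  # (kernel, stride, padding)


def conv3d_out_shape(in_shape: Shape, kernel: Shape,
                     stride: Shape = (1, 1, 1),
                     padding: Shape = (0, 0, 0)) -> Shape:
    """Output size of a 3D convolution along each axis (dilation 1)."""
    return tuple((n + 2 * p - k) // s + 1
                 for n, k, s, p in zip(in_shape, kernel, stride, padding))


def deconv3d_out_shape(in_shape: Shape, kernel: Shape,
                       stride: Shape = (1, 1, 1),
                       padding: Shape = (0, 0, 0)) -> Shape:
    """Output size of a 3D transposed convolution (no output padding)."""
    return tuple((n - 1) * s - 2 * p + k
                 for n, k, s, p in zip(in_shape, kernel, stride, padding))


# Hypothetical encoder: two layers halve the spatial size while keeping
# all three frames, then one layer collapses the temporal axis to 1.
ENCODER: List[Layer] = [
    ((3, 3, 3), (1, 2, 2), (1, 1, 1)),
    ((3, 3, 3), (1, 2, 2), (1, 1, 1)),
    ((3, 1, 1), (1, 1, 1), (0, 0, 0)),
]

# Hypothetical decoder: two transposed convolutions restore the
# spatial resolution of the input frames.
DECODER: List[Layer] = [
    ((1, 4, 4), (1, 2, 2), (0, 1, 1)),
    ((1, 4, 4), (1, 2, 2), (0, 1, 1)),
]


def trace_shapes(clip_shape: Shape) -> List[Shape]:
    """Return the shape after every layer, starting with the input clip."""
    shapes = [clip_shape]
    current = clip_shape
    for kernel, stride, padding in ENCODER:
        current = conv3d_out_shape(current, kernel, stride, padding)
        shapes.append(current)
    for kernel, stride, padding in DECODER:
        current = deconv3d_out_shape(current, kernel, stride, padding)
        shapes.append(current)
    return shapes


if __name__ == "__main__":
    # A clip of three frames, each assumed to be 64x64 pixels.
    for shape in trace_shapes((3, 64, 64)):
        print(shape)
```

Under these assumed settings, the three input frames are reduced to a single spatiotemporal feature volume and then upsampled to one map at the input resolution, which mirrors the role the abstract assigns to the deconvolutional stage.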
Acknowledgments
This work was supported by NSFC (No. 61571212), and NSF of Jiangxi Province in China (No. 20071BBE50068, 20171BCB23048, 20161ACB21014, GJJ160420).
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Ding, G., Fang, Y. (2018). Video Saliency Detection by 3D Convolutional Neural Networks. In: Zhai, G., Zhou, J., Yang, X. (eds) Digital TV and Wireless Multimedia Communication. IFTC 2017. Communications in Computer and Information Science, vol 815. Springer, Singapore. https://doi.org/10.1007/978-981-10-8108-8_23
DOI: https://doi.org/10.1007/978-981-10-8108-8_23
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-8107-1
Online ISBN: 978-981-10-8108-8
eBook Packages: Computer Science, Computer Science (R0)