Semantic Aware Attention Based Deep Object Co-segmentation

  • Hong Chen
  • Yifei Huang
  • Hideki Nakayama
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11364)


Object co-segmentation is the task of segmenting the same objects from multiple images. In this paper, we propose an attention-based object co-segmentation model that uses a novel attention mechanism in the bottleneck layer of a deep neural network to select semantically related features. Furthermore, we take advantage of the attention learner and propose an algorithm that segments multiple input images in linear time complexity. Experimental results demonstrate that our model achieves state-of-the-art performance on multiple datasets, with a significant reduction in computational time.
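
The abstract does not spell out how the linear-time claim is achieved, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that each image is summarized by a per-channel descriptor taken from the bottleneck layer, that a sigmoid turns a descriptor into channel attention weights, and that a group of images shares the average of those weights. Every function name here is hypothetical and chosen for illustration.

```python
import math


def channel_attention(descriptor):
    """Map a per-channel descriptor to weights in (0, 1) with a sigmoid.

    Assumption: the real model learns this mapping; a plain sigmoid stands in.
    """
    return [1.0 / (1.0 + math.exp(-v)) for v in descriptor]


def group_attention(descriptors):
    """Average the attention vectors of all images: one pass over N images."""
    attentions = [channel_attention(d) for d in descriptors]
    n = len(attentions)
    channels = len(attentions[0])
    return [sum(a[c] for a in attentions) / n for c in range(channels)]


def apply_attention(descriptor, attention):
    """Re-weight each channel of one image by the shared attention weight."""
    return [v * w for v, w in zip(descriptor, attention)]


def cosegment_group(descriptors):
    """Two linear passes: build the shared attention, then modulate each image."""
    shared = group_attention(descriptors)
    return [apply_attention(d, shared) for d in descriptors]


def pairwise_cost(n):
    """Attention computations if every ordered image pair is processed."""
    return n * (n - 1)


def shared_cost(n):
    """Per-image passes when one attention vector is shared by the group."""
    return 2 * n
```

Under these assumptions, comparing every ordered pair of images grows quadratically with the group size, while sharing one attention vector needs only two passes over the group, which is where a linear cost would come from.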


Keywords: Co-segmentation, Attention, Deep learning



This work was supported by JSPS KAKENHI Grant Number 16H05872.

Supplementary material

Supplementary material 1: 484519_1_En_27_MOESM1_ESM.pdf (PDF, 17.1 MB)


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. The University of Tokyo, Tokyo, Japan
