Joint motion boundary detection and CNN-based feature visualization for video object segmentation

Abstract

This paper presents a video object segmentation method that jointly uses a motion boundary map and convolutional neural network (CNN)-based class-level maps to co-segment the frames of a video. The key characteristic of the proposed approach is the combination of these two sources of information to create initial object and background regions, which are then employed within the co-segmentation energy function. The motion boundary map detects the areas that contain object movement, and the CNN-based class saliency map identifies the regions that contribute most to a correct classification by the network. The proposed approach can be applied to unconstrained natural videos that include changes in an object’s appearance, a rapidly moving background, non-rigid object deformation, rapid camera motion and even a static object. Experimental results on two challenging datasets (DAVIS and SegTrack v2) demonstrate the competitive performance of the proposed method compared with state-of-the-art approaches.
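
The idea of fusing a motion cue with a class saliency cue to seed object and background regions can be illustrated with a small sketch. The abstract does not state the fusion rule, so everything below is an assumption made for illustration: the function names `normalize` and `initial_regions`, the plain averaging of the two maps, and the thresholds `hi_thr` and `lo_thr` are hypothetical and are not taken from the paper.

```python
from typing import List

Map = List[List[float]]


def normalize(m: Map) -> Map:
    """Rescale a 2-D map to [0, 1]; a constant map becomes all zeros."""
    flat = [v for row in m for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0 for _ in row] for row in m]
    return [[(v - lo) / (hi - lo) for v in row] for row in m]


def initial_regions(motion: Map, saliency: Map,
                    hi_thr: float = 0.6, lo_thr: float = 0.2) -> List[List[int]]:
    """Label each pixel: 1 = object seed, 0 = background seed, -1 = undecided.

    Hypothetical fusion rule (not from the paper): average the two
    normalized maps and threshold the result.
    """
    m, s = normalize(motion), normalize(saliency)
    labels = []
    for m_row, s_row in zip(m, s):
        row = []
        for mv, sv in zip(m_row, s_row):
            score = 0.5 * (mv + sv)  # assumed fusion: plain average
            if score >= hi_thr:
                row.append(1)    # both cues agree strongly: object seed
            elif score <= lo_thr:
                row.append(0)    # both cues are weak: background seed
            else:
                row.append(-1)   # left for the energy minimization to decide
        labels.append(row)
    return labels


# Toy 2x2 example with made-up values.
motion_map = [[0.0, 0.1], [0.9, 1.0]]
saliency_map = [[0.0, 0.8], [0.2, 1.0]]
seeds = initial_regions(motion_map, saliency_map)
```

In this toy setup a pixel becomes a seed only when the averaged evidence is clearly high or clearly low; pixels where the two cues disagree stay undecided, which is one plausible way to leave ambiguous areas (for example a static but salient object) to a later co-segmentation step.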

Notes

  1. https://github.com/zkamranian/Video-Object-Segmentation

Author information

Corresponding author

Correspondence to Ahmad Reza Naghsh Nilchi.

Ethics declarations

Conflict of interest

All the authors declare that they have no conflict of interest regarding the publication of this paper.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

About this article

Cite this article

Kamranian, Z., Naghsh Nilchi, A.R., Sadeghian, H. et al. Joint motion boundary detection and CNN-based feature visualization for video object segmentation. Neural Comput & Applic 32, 4073–4091 (2020). https://doi.org/10.1007/s00521-019-04448-7

Keywords

  • Video object segmentation
  • Class saliency map
  • Co-segmentation
  • Convolutional neural network
  • Feature visualization
  • Motion boundary