Joint motion boundary detection and CNN-based feature visualization for video object segmentation

  • Zahra Kamranian
  • Ahmad Reza Naghsh Nilchi (email author)
  • Hamid Sadeghian
  • Federico Tombari
  • Nassir Navab
Original Article

Abstract

This paper presents a video object segmentation method that jointly uses motion boundaries and convolutional neural network (CNN)-based class-level maps to co-segment the frames of a video. The key characteristic of the proposed approach is that it combines these two sources of information to create initial object and background regions, which are then employed within the co-segmentation energy function. The motion boundary map detects the areas that contain object movement, and the CNN-based class saliency map identifies the regions that contribute most to the network's correct classification. The proposed approach can be applied to unconstrained natural videos that include changes in an object's appearance, a rapidly moving background, non-rigid object deformation, rapid camera motion and even a static object. Experimental results on two challenging datasets (DAVIS and SegTrack v2) demonstrate that the proposed method is competitive with state-of-the-art approaches.
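
The abstract does not state how the two maps are combined, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes both maps are already available as 2-D arrays of the same size, averages them after min-max normalization, and thresholds the result into object and background seeds. The function names, the averaging rule, and the thresholds `hi` and `lo` are all hypothetical choices made for illustration.

```python
import numpy as np


def normalize(m):
    """Scale a map to [0, 1]; a constant map becomes all zeros."""
    m = np.asarray(m, dtype=float)
    span = m.max() - m.min()
    if span == 0:
        return np.zeros_like(m)
    return (m - m.min()) / span


def initial_regions(motion_boundary, class_saliency, hi=0.6, lo=0.2):
    """Return (object_seed, background_seed) boolean masks.

    Assumption: the two cues are fused by a plain average and split
    with fixed thresholds. The paper may use a different rule.
    """
    combined = 0.5 * (normalize(motion_boundary) + normalize(class_saliency))
    object_seed = combined >= hi      # strong evidence from both cues
    background_seed = combined <= lo  # weak evidence from both cues
    return object_seed, background_seed


# Toy 3x3 maps standing in for one frame (made-up values).
motion = np.array([[0.0, 0.1, 0.0],
                   [0.2, 0.9, 0.8],
                   [0.0, 0.7, 1.0]])
saliency = np.array([[0.0, 0.0, 0.1],
                     [0.1, 1.0, 0.9],
                     [0.0, 0.6, 0.8]])

obj, bg = initial_regions(motion, saliency)
print(obj.astype(int))
print(bg.astype(int))
```

Pixels that fall between the two thresholds are left unlabeled in this sketch; in a co-segmentation setting such pixels would be resolved by the energy minimization rather than by the seeds.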

Keywords

Video object segmentation; Class saliency map; Co-segmentation; Convolutional neural network; Feature visualization; Motion boundary

Notes

Compliance with ethical standards

Conflict of interest

All the authors declare that they have no conflict of interest regarding the publication of this paper.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2019

Authors and Affiliations

  • Zahra Kamranian (1)
  • Ahmad Reza Naghsh Nilchi (1), email author
  • Hamid Sadeghian (2)
  • Federico Tombari (3)
  • Nassir Navab (3)

  1. Department of Artificial Intelligence, Faculty of Computer Engineering, University of Isfahan, Isfahan, Iran
  2. Faculty of Engineering, University of Isfahan, Isfahan, Iran
  3. Computer Aided Medical Procedures and Augmented Reality, Technische Universität München, Munich, Germany
