VideoMatch: Matching Based Video Object Segmentation

  • Yuan-Ting Hu
  • Jia-Bin Huang
  • Alexander G. Schwing
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11212)

Abstract

Video object segmentation is challenging yet important in a wide variety of applications for video analysis. Recent works formulate video object segmentation as a prediction task using deep nets to achieve appealing state-of-the-art performance. Due to the formulation as a prediction task, most of these methods require fine-tuning during test time, such that the deep nets memorize the appearance of the objects of interest in the given video. However, fine-tuning is time-consuming and computationally expensive, hence the algorithms are far from real time. To address this issue, we develop a novel matching based algorithm for video object segmentation. In contrast to memorization based classification techniques, the proposed approach learns to match extracted features to a provided template without memorizing the appearance of the objects. We validate the effectiveness and the robustness of the proposed method on the challenging DAVIS-16, DAVIS-17, Youtube-Objects and JumpCut datasets. Extensive results show that our method achieves comparable performance without fine-tuning and is much more favorable in terms of computational time.
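As a rough illustration of what matching features to a provided template can mean, the sketch below labels each feature of a new frame by comparing it with foreground and background features taken from an annotated first frame. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`cosine`, `top_k_mean`, `match_segment`), the use of cosine similarity, the top-k averaging, and the toy 2-D feature vectors are all hypothetical choices made for this example, and the features are assumed to be already extracted.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def top_k_mean(scores, k):
    """Average of the k largest scores (fewer if the list is shorter)."""
    best = sorted(scores, reverse=True)[:k]
    return sum(best) / len(best) if best else float("-inf")

def match_segment(template_feats, template_mask, frame_feats, k=2):
    """Label each frame feature as foreground (1) or background (0).

    template_feats: feature vectors from the annotated template frame
    template_mask:  1 (foreground) or 0 (background), one per template feature
    frame_feats:    feature vectors from the frame to segment
    All names and the scoring rule are assumptions made for illustration.
    """
    fg = [f for f, m in zip(template_feats, template_mask) if m == 1]
    bg = [f for f, m in zip(template_feats, template_mask) if m == 0]
    labels = []
    for x in frame_feats:
        # Score the feature against both template sets; no weights are updated.
        fg_score = top_k_mean([cosine(x, t) for t in fg], k)
        bg_score = top_k_mean([cosine(x, t) for t in bg], k)
        labels.append(1 if fg_score > bg_score else 0)
    return labels

# Toy example: two foreground and two background template features.
template = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
mask = [1, 1, 0, 0]
frame = [[0.8, 0.2], [0.2, 0.8]]
print(match_segment(template, mask, frame))  # prints [1, 0]
```

Because the labels come from similarity scores against the template instead of from weights adapted to the video, nothing in this sketch needs to be fine-tuned at test time, which is the property the abstract emphasizes.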

Notes

Acknowledgments

This material is based upon work supported in part by the National Science Foundation under Grants No. 1718221 and 1755785, and by Samsung and 3M. We thank NVIDIA for providing the GPUs used for this research.

Supplementary material

Supplementary material 1 (pdf 9256 KB)

Supplementary material 2 (mp4 32099 KB)

Supplementary material 3 (mp4 30529 KB)

Supplementary material 4 (mp4 7359 KB)

Supplementary material 5 (mp4 15973 KB)

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Yuan-Ting Hu (1), email author
  • Jia-Bin Huang (2)
  • Alexander G. Schwing (1)
  1. University of Illinois at Urbana-Champaign, Champaign, USA
  2. Virginia Tech, Blacksburg, USA
