Advertisement

Design Pseudo Ground Truth with Motion Cue for Unsupervised Video Object Segmentation

  • Ye WangEmail author
  • Jongmoo Choi
  • Yueru Chen
  • Qin Huang
  • Siyang Li
  • Ming-Sui Lee
  • C.-C. Jay Kuo
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11364)

Abstract

One major technique debt in video object segmentation is to label the object masks for training instances. As a result, we propose to prepare inexpensive, yet high quality pseudo ground truth corrected with motion cue for video object segmentation training. Our method conducts semantic segmentation using instance segmentation networks and, then, selects the segmented object of interest as the pseudo ground truth based on the motion information. Afterwards, the pseudo ground truth is exploited to finetune the pretrained objectness network to facilitate object segmentation in the remaining frames of the video. We show that the pseudo ground truth could effectively improve the segmentation performance. This straightforward unsupervised video object segmentation method is more efficient than existing methods. Experimental results on DAVIS and FBMS show that the proposed method outperforms state-of-the-art unsupervised segmentation methods on various benchmark datasets. And the category-agnostic pseudo ground truth has great potential to extend to multiple arbitrary object tracking.

Keywords

Pseudo ground truth Unsupervised Video object segmentation 

References

  1. 1.
    Abadi, M., et al.: TensorFlow: a system for large-scale machine learning. In: OSDI, vol. 16, pp. 265–283 (2016)Google Scholar
  2. 2.
    Caelles, S., Maninis, K.K., Pont-Tuset, J., Leal-Taixé, L., Cremers, D., Van Gool, L.: One-shot video object segmentation. In: CVPR 2017. IEEE (2017)Google Scholar
  3. 3.
    Chen, L.C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. arXiv preprint arXiv:1606.00915 (2016)
  4. 4.
    Cheng, J., Tsai, Y.H., Wang, S., Yang, M.H.: SegFlow: joint learning for video object segmentation and optical flow. In: 2017 IEEE International Conference on Computer Vision (ICCV), pp. 686–695. IEEE (2017)Google Scholar
  5. 5.
    Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2009, pp. 248–255. IEEE (2009)Google Scholar
  6. 6.
    Everingham, M., Van Gool, L., Williams, C.K., Winn, J., Zisserman, A.: The pascal visual object classes (VOC) challenge. Int. J. Comput. Vis. 88(2), 303–338 (2010)CrossRefGoogle Scholar
  7. 7.
    Faktor, A., Irani, M.: Video segmentation by non-local consensus voting. In: BMVC, vol. 2, p. 8 (2014)Google Scholar
  8. 8.
    Hariharan, B., Arbeláez, P., Girshick, R., Malik, J.: Simultaneous detection and segmentation. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8695, pp. 297–312. Springer, Cham (2014).  https://doi.org/10.1007/978-3-319-10584-0_20CrossRefGoogle Scholar
  9. 9.
    He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: 2017 IEEE International Conference on Computer Vision (ICCV), pp. 2980–2988. IEEE (2017)Google Scholar
  10. 10.
    Huang, Q., Xia, C., Li, S., Wang, Y., Song, Y., Kuo, C.C.J.: Unsupervised clustering guided semantic segmentation. In: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1489–1498. IEEE (2018)Google Scholar
  11. 11.
    Huang, Q., et al.: Semantic segmentation with reverse attention. arXiv preprint arXiv:1707.06426 (2017)
  12. 12.
    Jain, S.D., Xiong, B., Grauman, K.: FusionSeg: learning to combine motion and appearance for fully automatic segmention of generic objects in videos. arXiv preprint arXiv:1701.053842(3), 6 (2017)
  13. 13.
    Keuper, M., Andres, B., Brox, T.: Motion trajectory segmentation via minimum cost multicuts. In: 2015 IEEE International Conference on Computer Vision (ICCV), pp. 3271–3279. IEEE (2015)Google Scholar
  14. 14.
    Khoreva, A., Benenson, R., Ilg, E., Brox, T., Schiele, B.: Lucid data dreaming for object tracking. arXiv preprint arXiv:1703.09554 (2017)
  15. 15.
    Koh, Y.J., Kim, C.S.: Primary object segmentation in videos based on region augmentation and reduction. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, vol. 1, p. 6 (2017)Google Scholar
  16. 16.
    Li, F., Kim, T., Humayun, A., Tsai, D., Rehg, J.M.: Video segmentation by tracking many figure-ground segments. In: 2013 IEEE International Conference on Computer Vision (ICCV), pp. 2192–2199. IEEE (2013)Google Scholar
  17. 17.
    Li, S., Seybold, B., Vorobyov, A., Fathi, A., Huang, Q., Kuo, C.C.J.: Instance embedding transfer to unsupervised video object segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6526–6535 (2018)Google Scholar
  18. 18.
    Li, S., Seybold, B., Vorobyov, A., Lei, X., Kuo, C.-C.J.: Unsupervised video object segmentation with motion-based bilateral networks. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11207, pp. 215–231. Springer, Cham (2018).  https://doi.org/10.1007/978-3-030-01219-9_13CrossRefGoogle Scholar
  19. 19.
    Li, X., et al.: Video object segmentation with re-identification. arXiv preprint arXiv:1708.00197 (2017)
  20. 20.
    Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, David, Pajdla, Tomas, Schiele, Bernt, Tuytelaars, Tinne (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014).  https://doi.org/10.1007/978-3-319-10602-1_48CrossRefGoogle Scholar
  21. 21.
    Liu, C., et al.: Beyond pixels: exploring new representations and applications for motion analysis. Ph.D. thesis, Massachusetts Institute of Technology (2009)Google Scholar
  22. 22.
    Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440 (2015)Google Scholar
  23. 23.
    Märki, N., Perazzi, F., Wang, O., Sorkine-Hornung, A.: Bilateral space video segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 743–751 (2016)Google Scholar
  24. 24.
    Ochs, P., Malik, J., Brox, T.: Segmentation of moving objects by long term video analysis. IEEE Trans. Pattern Anal. Mach. Intell. 36(6), 1187–1200 (2014)CrossRefGoogle Scholar
  25. 25.
    Papazoglou, A., Ferrari, V.: Fast object segmentation in unconstrained video. In: 2013 IEEE International Conference on Computer Vision (ICCV), pp. 1777–1784. IEEE (2013)Google Scholar
  26. 26.
    Pathak, D., Girshick, R., Dollár, P., Darrell, T., Hariharan, B.: Learning features by watching objects move. In: Proceedings of the CVPR, vol. 2 (2017)Google Scholar
  27. 27.
    Perazzi, F., Pont-Tuset, J., McWilliams, B., Van Gool, L., Gross, M., Sorkine-Hornung, A.: A benchmark dataset and evaluation methodology for video object segmentation. In: Computer Vision and Pattern Recognition (2016)Google Scholar
  28. 28.
    Perazzi, F., Khoreva, A., Benenson, R., Schiele, B., Sorkine-Hornung, A.: Learning video object segmentation from static images. In: Computer Vision and Pattern Recognition (2017)Google Scholar
  29. 29.
    Port, R.F., Van Gelder, T.: Mind as Motion: Explorations in the Dynamics of Cognition. MIT press, Cambridge (1995)Google Scholar
  30. 30.
    Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems, pp. 91–99 (2015)Google Scholar
  31. 31.
    Taylor, B., Karasev, V., Soatto, S.: Causal video object segmentation from persistence of occlusions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4268–4276 (2015)Google Scholar
  32. 32.
    Tokmakov, P., Alahari, K., Schmid, C.: Learning motion patterns in videos. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 531–539. IEEE (2017)Google Scholar
  33. 33.
    Tokmakov, P., Alahari, K., Schmid, C.: Learning video object segmentation with visual memory. arXiv preprint arXiv:1704.05737 (2017)
  34. 34.
    Tsai, Y.H., Yang, M.H., Black, M.J.: Video segmentation via object flow. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3899–3908 (2016)Google Scholar
  35. 35.
    Voigtlaender, P., Leibe, B.: Online adaptation of convolutional neural networks for video object segmentation. arXiv preprint arXiv:1706.09364 (2017)
  36. 36.
    Wu, Z., Shen, C., Hengel, A.v.d.: Wider or deeper: revisiting the resnet model for visual recognition. arXiv preprint arXiv:1611.10080 (2016)

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. 1.University of Southern CaliforniaLos AngelesUSA
  2. 2.National Taiwan UniversityTaipeiTaiwan

Personalised recommendations