PReMVOS: Proposal-Generation, Refinement and Merging for Video Object Segmentation

  • Jonathon Luiten
  • Paul Voigtlaender
  • Bastian Leibe
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11364)

Abstract

We address semi-supervised video object segmentation, the task of automatically generating accurate and consistent pixel masks for objects in a video sequence, given the first-frame ground-truth annotations. Towards this goal, we present the PReMVOS algorithm (Proposal-generation, Refinement and Merging for Video Object Segmentation). Our method separates the problem into two steps: it first generates a set of accurate object segmentation mask proposals for each video frame, and then selects and merges these proposals into accurate and temporally consistent pixel-wise object tracks over the video sequence. The selection and merging step is designed specifically to tackle the challenges of segmenting multiple objects across a video sequence. Our approach surpasses all previous state-of-the-art results on the DAVIS 2017 video object segmentation benchmark, with a \(\mathcal{J}\)&\(\mathcal{F}\) mean score of 71.6 on the test-dev dataset, and achieves first place in both the DAVIS 2018 Video Object Segmentation Challenge and the YouTube-VOS 1st Large-scale Video Object Segmentation Challenge.
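To make the reported metric concrete, the following is a minimal sketch of the region similarity \(\mathcal{J}\), which is the intersection-over-union (Jaccard index) between a predicted mask and the ground-truth mask, and of the \(\mathcal{J}\)&\(\mathcal{F}\) mean, taken here as the average of the two scores. The function names, the list-of-rows mask representation, and the handling of two empty masks are assumptions made for illustration. The contour accuracy \(\mathcal{F}\) is not computed here and is passed in as a number, and this is not the official benchmark evaluation code.

```python
from typing import Sequence

# A binary mask as rows of 0/1 values (representation assumed for this sketch).
Mask = Sequence[Sequence[int]]


def jaccard(pred: Mask, gt: Mask) -> float:
    """Region similarity J: intersection-over-union of two binary masks."""
    inter = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def j_and_f_mean(j_mean: float, f_mean: float) -> float:
    """Average of the region (J) and contour (F) scores."""
    return (j_mean + f_mean) / 2.0


if __name__ == "__main__":
    pred = [[1, 1, 0],
            [0, 1, 0]]
    gt = [[1, 0, 0],
          [0, 1, 1]]
    # 2 pixels overlap, 4 pixels are covered by either mask.
    print(jaccard(pred, gt))  # 0.5
```

In a multi-object setting, a score of this kind would be computed per object and per frame and then averaged, which is why temporally consistent tracks matter for the final number.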

Notes

Acknowledgements

This project was funded, in part, by ERC Consolidator Grant DeeViSe (ERC-2017-COG-773161).

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Jonathon Luiten (1)
  • Paul Voigtlaender (1)
  • Bastian Leibe (1)
  1. Computer Vision Group, RWTH Aachen University, Aachen, Germany
