Advertisement

Adaptive Online Learning for Video Object Segmentation

  • Li Wei
  • Chunyan XuEmail author
  • Tong Zhang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11935)

Abstract

In this work, we address the problem of video object segmentation (VOS), namely segmenting specific objects throughout a video sequence when given only an annotated first frame. Previous VOS methods based on deep neural networks often solves this problem by fine-tuning the segmentation model in the first frame of the test video sequence, which is time-consuming and can not be well adapted to the current target video. In this paper, we proposed the adaptive online learning for video object segmentation (AOL-VOS), which adaptively optimizes the network parameters and hyperparameters of segmentation model for better predicting the segmentation results. Specifically, we first pre-train the segmentation model with the static video frames and then learn the effective adaptation strategy on the training set by optimizing both network parameters and hyperparameters. In the testing process, we learn how to online adapt the learned segmentation model to the specific testing video sequence and the corresponding future video frames, where the confidence patterns is employed to constrain/guide the implementation of adaptive learning process by fusing both object appearance and motion cue information. Comprehensive evaluations on Davis 16 and SegTrack V2 datasets well demonstrate the significant superiority of our proposed AOL-VOS over other state-of-the-arts for video object segmentation task.

Keywords

Online-learning Video object segmentation Adaptation 

Notes

Acknowledgment

This work was supported by the National Natural Science Foundation of China (Nos. 61972204, 61906094, U1713208), Tencent AI Lab Rhino-Bird Focused Research Program (No. JR201922), the Fundamental Research Funds for the Central Universities (Nos. 30918011321 and 30919011232).

References

  1. 1.
    Venetianer, P.L., et al.: Video surveillance system employing video primitives. Google Patents, US Patent 9,892,606 (2018)Google Scholar
  2. 2.
    Serrano, A., Sitzmann, V., Ruiz-Borau, J., Wetzstein, G., Gutierrez, D., Masiz, B.: Movie editing and cognitive event segmentation in virtual reality video. ACM Trans. Graph. 36, 47 (2017)CrossRefGoogle Scholar
  3. 3.
    Li, W., Mahadevan, V., Vasconcelos, N.: Anomaly detection and localization in crowded scenes. IEEE Trans. Pattern Anal. Mach. Intell. 36, 18–32 (2014)CrossRefGoogle Scholar
  4. 4.
    Zhang, Z., Fidler, S., Urtasun, R.: Instance-level segmentation for autonomous driving with deep densely connected MRFs. In: CVPR, pp. 669–677 (2016)Google Scholar
  5. 5.
    Teichmann, M., Weber, M., Zoellner, M., Cipolla, R., Urtasun, R.: MultiNet: real-time joint semantic reasoning for autonomous driving. In: 2018 IEEE Intelligent Vehicles Symposium (IV), pp. 1013–1020 (2018)Google Scholar
  6. 6.
    Mayer, N., Saikia, T., Keuper, M., Dosovitskiy, A., Brox, T.: FlowNet 2.0: evolution of optical flow estimation with deep networks. In: CVPR, pp. 1647–1655 (2017)Google Scholar
  7. 7.
    Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: NIPS, pp. 91–99 (2015)Google Scholar
  8. 8.
    Ravi, S., Larochelle, H.: Optimization as a model for few-shot learning. In: ICLR, pp. 730–738 (2017) Google Scholar
  9. 9.
    Hu, Y.-T., Huang, J.-B., Schwing, A.G.: Unsupervised video object segmentation using motion saliency-guided spatio-temporal propagation. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11205, pp. 813–830. Springer, Cham (2018).  https://doi.org/10.1007/978-3-030-01246-5_48CrossRefGoogle Scholar
  10. 10.
    Li, S., Seybold, B., Vorobyov, A., Fathi, A., Huang, Q., Jay Kuo, C.-C.: Instance embedding transfer to unsupervised video object segmentation. In: CVPR, pp. 6526–6535 (2018)Google Scholar
  11. 11.
    Li, S., Seybold, B., Vorobyov, A., Lei, X., Kuo, C.-C.J.: Unsupervised video object segmentation with motion-based bilateral networks. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11207, pp. 215–231. Springer, Cham (2018).  https://doi.org/10.1007/978-3-030-01219-9_13CrossRefGoogle Scholar
  12. 12.
    Croitoru, I., Bogolin, S.-V., Leordeanu, M.: Unsupervised learning from video to detect foreground objects in single images. In: ICCV, pp. 4345–4353 (2017)Google Scholar
  13. 13.
    Tsai, Y.-H., Yang, M.-H., Black, M.J.: Video segmentation via object flow. In: CVPR, pp. 3899–3908 (2016) Google Scholar
  14. 14.
    Xiao, H., Feng, J., Lin, G., Liu, Y., Zhang, M.: MoNet: deep motion exploitation for video object segmentation. In: CVPR, pp. 1140–1148 (2018)Google Scholar
  15. 15.
    Hu, P., Wang, G., Kong, X., Kuen, J., Tan, Y.-P.: Motion-guided cascaded refinement network for video object segmentation. In: CVPR, pp. 1400–1409 (2018)Google Scholar
  16. 16.
    Chen, Y., Pont-Tuset, J., Montes, A., Van Gool, L.: Blazingly fast video object segmentation with pixel-wise metric learning. In: CVPR, pp. 1189–1198 (2018)Google Scholar
  17. 17.
    Cheng, J., Tsai, Y.-H., Wang, S., Yang, M.-H.: SegFlow: joint learning for video object segmentation and optical flow. In: ICCV, pp. 686–695 (2017)Google Scholar
  18. 18.
    Xu, M., Fan, C., Wang, Y., Ryoo, M.S., Crandall, D.J.: Joint person segmentation and identification in synchronized first- and third-person videos. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11205, pp. 656–672. Springer, Cham (2018).  https://doi.org/10.1007/978-3-030-01246-5_39CrossRefGoogle Scholar
  19. 19.
    Li, X., Loy, C.C.: Video object segmentation with joint re-identification and attention-aware mask propagation. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11207, pp. 93–110. Springer, Cham (2018).  https://doi.org/10.1007/978-3-030-01219-9_6CrossRefGoogle Scholar
  20. 20.
    Li, X., Loy, C.C.: Video object segmentation with re-identification. In: The 2017 DAVIS Challenge on Video Object Segmentation - CVPR Workshops (2017)Google Scholar
  21. 21.
    Wug Oh, S., Lee, J.-Y., Sunkavalli, K., Joo Kim, S.: Fast video object segmentation by reference-guided mask propagation. In: CVPR, pp. 7376–7385 (2018)Google Scholar
  22. 22.
    Voigtlaender, P., Leibe, B.: Online adaptation of convolutional neural networks for video object segmentation. In: BMVC, pp. 656–672 (2017)Google Scholar
  23. 23.
    Khoreva, A., Benenson, R., Ilg, E., Brox, T., Schiele, B.: Lucid data dreaming for object tracking. arXiv preprint arXiv:1703.09554 (2017)
  24. 24.
    Li, F., Kim, T., Humayun, A., Tsai, D., Rehg, J.M.: Video segmentation by tracking many figure-ground segments. In: ICCV, pp. 2192–2199 (2013)Google Scholar
  25. 25.
    Perazzi, F., Pont-Tuset, J., McWilliams, B., Van Gool, L., Gross, M., Sorkine-Hornung, A.: A benchmark dataset and evaluation methodology for video object segmentation. In: CVPR, pp. 724–732 (2016)Google Scholar
  26. 26.
    Chen, L.-C., Papandreou, G., Schroff, F., Adam, H.: Rethinking Atrous convolution for semantic image segmentation. arXiv preprint arXiv:1706.05587 (2017)
  27. 27.
    Chen, L.-C., Zhu, Y., Papandreou, G., Schroff, F., Adam, H.: Encoder-decoder with Atrous separable convolution for semantic image segmentation. arXiv preprint arXiv:1802.02611 (2018)
  28. 28.
    Voigtlaender, P., Leibe, B.: Online adaptation of convolutional neural networks for the 2017 DAVIS challenge on video object segmentation. In: The 2017 DAVIS Challenge on Video Object Segmentation - CVPR Workshops (2017)Google Scholar
  29. 29.
    Caelles, S., Maninis, K.-K., Pont-Tuset, J., Leal-Taixé, L., Cremers, D., Van Gool, L.: One-shot video object segmentation. In: CVPR, pp. 5320–5329 (2017)Google Scholar
  30. 30.
    Maninis, K.-K., et al.: Video object segmentation without temporal information. IEEE Trans. Pattern Anal. Mach. Intell. 41(6), 1515–1530 (2018)CrossRefGoogle Scholar
  31. 31.
    Perazzi, F., Khoreva, A., Benenson, R., Schiele, B., Sorkine-Hornung, A.: Learning video object segmentation from static images. In: CVPR, pp. 3491–3500 (2017)Google Scholar
  32. 32.
    Hu, Y.-T., Huang, J.-B., Schwing, A.: MaskRNN: instance level video object segmentation. In: NIPS, pp. 325–334 (2017)Google Scholar
  33. 33.
    Ci, H., Wang, C., Wang, Y.: Video object segmentation by learning location-sensitive embeddings. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11215, pp. 524–539. Springer, Cham (2018).  https://doi.org/10.1007/978-3-030-01252-6_31CrossRefGoogle Scholar
  34. 34.
    Xiao, H., Kang, B., Liu, Y., Zhang, M., Feng, J.: Online meta adaptation for fast video object segmentation. IEEE Trans. Pattern Anal. Mach. Intell.(2019)Google Scholar

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. 1.Key Laboratory of Intelligent Perception and Systems for High-Dimensional Information of Ministry of Education, School of Computer Science and EngineeringNanjing University of Science and TechnologyNanjingChina

Personalised recommendations