Extending Layered Models to 3D Motion

  • Dong Lao
  • Ganesh Sundaramoorthi
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11214)


Abstract

We consider the problem of inferring a layered representation, its depth ordering, and a motion segmentation from video in which objects may undergo 3D non-planar motion relative to the camera. We generalize layered inference to this setting and to the corresponding self-occlusion phenomena. To do so, we introduce a flattened 3D object representation: a compact representation of an object that contains all portions of the object visible anywhere in the video, including parts that are self-occluded (as well as occluded) in one frame but seen in another. We formulate the joint inference of these flattened representations and the motion segmentation, and derive an optimization scheme. We also introduce a new depth-ordering scheme that is independent of layered inference, addresses the case of self-occlusion, and requires little computation given the flattened representations. Experiments on benchmark datasets show the advantage of our method over existing layered methods, which do not model 3D motion or self-occlusion.
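As a rough illustration of the kind of computation a depth-ordering step involves (this is a toy 2D sketch, not the paper's actual scheme; the function names `composite` and `infer_order` are hypothetical), occlusion evidence in pairwise overlap regions can be used to vote on which layer is in front:

```python
import numpy as np

def composite(layers, order):
    """Paint binary layer masks back-to-front; `order` lists layer
    indices front-first. Pixel value k+1 means layer k is visible,
    0 means background."""
    out = np.zeros(layers[0].shape, dtype=int)
    for idx in reversed(order):          # back-to-front painting
        out[layers[idx] > 0] = idx + 1
    return out

def infer_order(layers, observed):
    """Vote on a depth order: in each pairwise overlap region, the
    layer actually visible in `observed` must be in front. Returns
    layer indices front-first."""
    n = len(layers)
    wins = np.zeros(n)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            overlap = (layers[i] > 0) & (layers[j] > 0)
            wins[i] += np.sum(observed[overlap] == i + 1)
    return [int(k) for k in np.argsort(-wins)]
```

The point of the sketch is only that, once complete layer supports are available (as the flattened representations provide), ordering reduces to cheap per-overlap counting rather than a joint inference problem.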


Keywords: Motion · Video segmentation · Layered models

Supplementary material

474197_1_En_27_MOESM1_ESM.pdf — Supplementary material 1 (PDF, 15.5 MB)



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. KAUST, Thuwal, Saudi Arabia
