MoA-Net: Self-supervised Motion Segmentation

  • Pia Bideau
  • Rakesh R. Menon
  • Erik Learned-Miller
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11134)


Most recent approaches to motion segmentation use optical flow to segment an image into the static environment and independently moving objects. Neural network based approaches usually require large amounts of labeled training data to achieve state-of-the-art performance. In this work we propose a new approach to train a motion segmentation network in a self-supervised manner. Inspired by visual ecology, the human visual system, and prior approaches to motion modeling, we break the problem of motion segmentation into two smaller subproblems: (1) modifying the flow field to remove the observer's rotation and (2) segmenting the rotation-compensated flow into the static environment and independently moving objects. Compensating for rotation leads to essential simplifications that allow us to describe an independently moving object with just a few criteria, which can be learned by our new motion segmentation network, the Motion Angle Network (MoA-Net). We compare our network with two other motion segmentation networks and show state-of-the-art performance on Sintel.
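The first subproblem, removing the observer's rotation from the flow field, follows the classical instantaneous-motion model of Longuet-Higgins and Prazdny: the rotational component of optical flow depends only on the camera's rotation rates and the pixel position, not on scene depth, so it can be subtracted exactly when the rotation is known. The sketch below illustrates this step, assuming known rotation rates and focal length; it is an illustrative approximation, not the paper's actual implementation, and the signs depend on the chosen camera coordinate convention.

```python
import numpy as np

def rotational_flow(h, w, f, omega):
    """Flow induced purely by camera rotation omega = (wx, wy, wz), under the
    standard small-rotation instantaneous-motion model. Pixel coordinates are
    centered at the image midpoint; f is the focal length in pixels."""
    wx, wy, wz = omega
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    x = xs - w / 2.0
    y = ys - h / 2.0
    u = (x * y / f) * wx - (f + x**2 / f) * wy + y * wz
    v = (f + y**2 / f) * wx - (x * y / f) * wy - x * wz
    return u, v

def compensate_rotation(flow_u, flow_v, f, omega):
    """Subtract the rotational component, leaving translational flow plus the
    flow of independently moving objects. The direction (angle) of the
    remaining translational flow no longer depends on the camera's rotation,
    which is what makes motion-angle-based segmentation tractable."""
    h, w = flow_u.shape
    ru, rv = rotational_flow(h, w, f, omega)
    tu, tv = flow_u - ru, flow_v - rv
    angle = np.arctan2(tv, tu)  # per-pixel flow direction field
    return tu, tv, angle
```

As a sanity check, the observed flow of a purely rotating camera is cancelled exactly: feeding `rotational_flow` output back through `compensate_rotation` yields a zero residual field.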


Optical flow · Motion segmentation · Video segmentation · Camera motion · Visual ecology



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. College of Information and Computer Sciences, University of Massachusetts Amherst, Amherst, USA
