Detecting Parallel-Moving Objects in the Monocular Case Employing CNN Depth Maps

  • Nolang Fanani
  • Matthias Ochs
  • Rudolf Mester
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11131)


This paper presents a method for detecting independently moving objects (IMOs) from a monocular camera mounted on a moving car. We use an existing state-of-the-art monocular sparse visual odometry/SLAM framework and specifically attack the notorious problem of identifying those IMOs that move parallel to the ego-car motion, that is, in an ‘epipolar-conformant’ way. IMO candidate patches are obtained from an existing CNN-based car instance detector. While crossing IMOs can be identified as such by epipolar consistency checks, IMOs that move parallel to the camera motion are much harder to detect, because their epipolar conformity allows them to be misinterpreted as static objects at a wrong distance. We employ a CNN to provide an appearance-based depth estimate and resolve this ambiguity through depth verification. The obtained motion labels (IMO/static) are then propagated over time using a combination of motion cues and appearance-based information from the IMO candidate patches. We evaluate the performance of our method on the KITTI dataset.
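
The ambiguity described in the abstract can be illustrated with a small numerical sketch. This is not the authors' implementation: the function names, the normalized pinhole camera, the 1 m forward ego-motion, the example point positions, the stand-in value for the CNN depth, and both thresholds are assumptions chosen purely for illustration. The sketch checks the algebraic epipolar residual first, then triangulates the depth under a static-scene hypothesis and compares it with the appearance-based depth.

```python
import numpy as np


def skew(t):
    """Cross-product matrix [t]_x such that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])


def project(X):
    """Normalized homogeneous image coordinates of a 3D point (K = I assumed)."""
    return X / X[2]


def epipolar_residual(x1, x2, R, t):
    """Algebraic epipolar error x2^T E x1 with E = [t]_x R."""
    return float(x2 @ (skew(t) @ R) @ x1)


def triangulate_depth(x1, x2, R, t):
    """Depth in frame 1, assuming the point is static: z1*R@x1 + t = z2*x2."""
    A = np.stack([R @ x1, -x2], axis=1)
    z, *_ = np.linalg.lstsq(A, -t, rcond=None)
    return float(z[0])


def is_imo(x1, x2, R, t, cnn_depth, epi_tol=1e-4, depth_tol=0.3):
    """Hypothetical decision rule: epipolar check, then depth verification."""
    if abs(epipolar_residual(x1, x2, R, t)) > epi_tol:
        return True  # not epipolar-conformant, e.g. a crossing object
    z_tri = triangulate_depth(x1, x2, R, t)
    # Conformant, but the static-scene depth disagrees with the CNN depth.
    return abs(z_tri - cnn_depth) / cnn_depth > depth_tol


# Ego-car drives 1 m forward: points map as X2 = R @ X1 + t (assumed values).
R = np.eye(3)
t = np.array([0.0, 0.0, -1.0])

X1 = np.array([2.0, 1.0, 20.0])              # point at 20 m depth in frame 1
x1 = project(X1)
x2_static = project(R @ X1 + t)              # static object
x2_parallel = project(R @ (X1 + np.array([0.0, 0.0, 0.5])) + t)  # moves ahead
x2_crossing = project(R @ (X1 + np.array([0.5, 0.0, 0.0])) + t)  # moves sideways

cnn_depth = 21.0                             # stand-in for a CNN depth estimate

for name, x2 in [("static", x2_static),
                 ("parallel", x2_parallel),
                 ("crossing", x2_crossing)]:
    print(name,
          "residual=%.2e" % epipolar_residual(x1, x2, R, t),
          "triangulated depth=%.1f" % triangulate_depth(x1, x2, R, t),
          "IMO=%s" % is_imo(x1, x2, R, t, cnn_depth))
```

In this toy setup the parallel-moving point satisfies the epipolar constraint just like the static one, but it triangulates to about 40 m instead of its true 20 m, so only the comparison with the appearance-based depth exposes it. A practical system would more likely measure the epipolar error in pixels (for example with a Sampson distance) and would have to account for the scale ambiguity of monocular odometry, both of which this sketch ignores.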




Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Visual Sensorics and Information Processing Lab, Goethe University Frankfurt, Frankfurt, Germany
  2. Computer Vision Laboratory, ISY, Linköping University, Linköping, Sweden
