3D Vehicle Trajectory Reconstruction in Monocular Video Data Using Environment Structure Constraints

  • Sebastian Bullinger
  • Christoph Bodensteiner
  • Michael Arens
  • Rainer Stiefelhagen
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11214)

Abstract

We present a framework to reconstruct three-dimensional vehicle trajectories from monocular video data. We track two-dimensional vehicle shapes at the pixel level by exploiting instance-aware semantic segmentation techniques and optical flow cues. We apply Structure from Motion techniques to vehicle and background images to determine, for each frame, camera poses relative to vehicle instances and background structures. By combining vehicle and background camera pose information, we restrict the vehicle trajectory to a one-parameter family of possible solutions. We compute a ground representation by fusing background structures and corresponding semantic segmentations. We propose a novel method to determine vehicle trajectories consistent with image observations and reconstructed environment structures, as well as a criterion to identify frames suitable for scale ratio estimation. We show qualitative results using drone imagery as well as driving sequences from the Cityscapes dataset. Due to the lack of suitable benchmark datasets, we present a new dataset to evaluate the quality of reconstructed three-dimensional vehicle trajectories. The video sequences show vehicles in urban areas and are rendered using the path-tracing render engine Cycles. In contrast to previous work, we perform a quantitative evaluation of the presented approach. Our algorithm achieves an average reconstruction-to-ground-truth-trajectory distance of 0.31 m using this dataset. The dataset, including evaluation scripts, will be publicly available on our website (Project page: http://s.fhg.de/trajectory).
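The one-parameter family mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the common pose convention x_cam = R (x_world - c), and all names (`trajectory_point`, `c_b`, `R_b`, `c_v`, `R_v`, `o_v`, `r`) are hypothetical. Given the camera pose of one frame in the background reconstruction and in the vehicle reconstruction, each choice of the scale ratio `r` yields one candidate position of a vehicle point in background coordinates.

```python
def mat_vec(m, v):
    """Multiply a 3x3 matrix (list of rows) with a 3-vector."""
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]


def transpose(m):
    """Transpose a 3x3 matrix given as a list of rows."""
    return [[m[j][i] for j in range(3)] for i in range(3)]


def trajectory_point(c_b, R_b, c_v, R_v, o_v, r):
    """Candidate position of a vehicle point in background coordinates.

    c_b, R_b: camera center and rotation in the background reconstruction.
    c_v, R_v: camera center and rotation in the vehicle reconstruction.
    o_v:      a point of the vehicle in vehicle coordinates.
    r:        scale ratio between both reconstructions (the free parameter).
    Assumed convention (not taken from the source): x_cam = R (x_world - c).
    """
    # Vehicle point expressed in the camera frame, in vehicle units.
    p_cam = mat_vec(R_v, [o_v[k] - c_v[k] for k in range(3)])
    # The same viewing vector rotated into the background frame.
    d_b = mat_vec(transpose(R_b), p_cam)
    # Scaling the vector by r moves the point along the viewing ray.
    return [c_b[k] + r * d_b[k] for k in range(3)]


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    for r in (0.5, 1.0, 2.0):
        print(r, trajectory_point([1.0, 0.0, 0.0], identity,
                                  [0.0, 0.0, 0.0], identity,
                                  [0.0, 0.0, 2.0], r))
```

Varying `r` slides every candidate point along its viewing ray, which is why additional constraints, such as the reconstructed ground representation, are needed to select a single trajectory.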

Keywords

  • Vehicle trajectory reconstruction
  • Instance-aware semantic segmentation
  • Structure-from-motion

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Fraunhofer IOSB, Ettlingen, Germany
  2. Karlsruhe Institute of Technology, Karlsruhe, Germany
