Deep Continuous Fusion for Multi-sensor 3D Object Detection

  • Ming Liang
  • Bin Yang
  • Shenlong Wang
  • Raquel Urtasun
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11220)


In this paper, we propose a novel 3D object detector that exploits both LIDAR and cameras to perform very accurate localization. Towards this goal, we design an end-to-end learnable architecture that uses continuous convolutions to fuse image and LIDAR feature maps at different levels of resolution. Our proposed continuous fusion layer encodes both discrete-state image features and continuous geometric information. This enables us to design a novel, reliable, and efficient end-to-end learnable 3D object detector based on multiple sensors. Our experimental evaluation on both KITTI and a large-scale 3D object detection benchmark shows significant improvements over the state of the art.
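
The abstract does not give implementation details, so the following is only an illustrative sketch of how a fusion layer of this kind might combine image features with geometric information. Everything specific here is an assumption made for illustration: the function name `continuous_fusion`, the k-nearest-neighbour lookup, the precomputed pixel coordinates `points_uv`, the two-layer MLP, and the summation over neighbours. It should not be read as the authors' implementation.

```python
import numpy as np

def continuous_fusion(bev_xy, points_xyz, points_uv, image_feats,
                      W1, b1, W2, b2, k=3):
    """Hypothetical fusion step (assumed design, not the paper's code).

    bev_xy      : (M, 2) centres of target bird's-eye-view cells
    points_xyz  : (N, 3) LIDAR points
    points_uv   : (N, 2) integer pixel (u, v) of each point in the image,
                  assumed to be precomputed from the camera calibration
    image_feats : (H, W, C) image feature map
    W1, b1      : first MLP layer, shapes (C + 3, hidden) and (hidden,)
    W2, b2      : second MLP layer, shapes (hidden, D) and (D,)
    Returns an (M, D) array of fused features, one row per BEV cell.
    """
    # Squared ground-plane distance from every BEV cell to every point.
    diff = bev_xy[:, None, :] - points_xyz[None, :, :2]      # (M, N, 2)
    d2 = (diff ** 2).sum(axis=-1)                            # (M, N)

    # Indices of the k nearest LIDAR points for each cell.
    nn = np.argsort(d2, axis=1)[:, :k]                       # (M, k)

    # Discrete image features at the pixels those points project to.
    u, v = points_uv[nn, 0], points_uv[nn, 1]                # (M, k) each
    feats = image_feats[v, u]                                # (M, k, C)

    # Continuous geometric offset from the cell (placed at z = 0) to each point.
    bev_xyz = np.concatenate([bev_xy, np.zeros((len(bev_xy), 1))], axis=1)
    offsets = points_xyz[nn] - bev_xyz[:, None, :]           # (M, k, 3)

    # Small MLP on [image feature, offset], then sum over the neighbours.
    x = np.concatenate([feats, offsets], axis=-1)            # (M, k, C + 3)
    h = np.maximum(x @ W1 + b1, 0.0)                         # ReLU
    return (h @ W2 + b2).sum(axis=1)                         # (M, D)


# Toy usage with random inputs (shapes only; values are meaningless).
rng = np.random.default_rng(0)
fused = continuous_fusion(
    bev_xy=rng.uniform(0, 10, size=(2, 2)),
    points_xyz=rng.uniform(0, 10, size=(4, 3)),
    points_uv=rng.integers(0, 5, size=(4, 2)),
    image_feats=rng.normal(size=(5, 6, 2)),
    W1=rng.normal(size=(5, 4)), b1=np.zeros(4),
    W2=rng.normal(size=(4, 3)), b2=np.zeros(3),
    k=2,
)
print(fused.shape)  # (2, 3)
```

In this sketch the image features are the discrete input (sampled at pixel locations) and the offsets are the continuous geometric input; feeding both to the same MLP is one simple way to let the output depend on where each neighbour lies relative to the target cell.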


Keywords: 3D object detection, Multi-sensor fusion, Autonomous driving



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Ming Liang (1), corresponding author
  • Bin Yang (1, 2)
  • Shenlong Wang (1, 2)
  • Raquel Urtasun (1, 2)

  1. Uber Advanced Technologies Group, Pittsburgh, USA
  2. University of Toronto, Toronto, Canada
