Complex-YOLO: An Euler-Region-Proposal for Real-Time 3D Object Detection on Point Clouds

  • Martin Simon
  • Stefan Milz
  • Karl Amende
  • Horst-Michael Gross
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11129)


Lidar-based 3D object detection is indispensable for autonomous driving, because it directly links to environmental understanding and therefore forms the basis for prediction and motion planning. Inferring from highly sparse 3D data in real time is, however, a challenge not only for automated vehicles but also for many other application areas, e.g. augmented reality, personal robotics, or industrial automation. We introduce Complex-YOLO, a state-of-the-art real-time 3D object detection network that operates on point clouds only. In this work, we describe a network that extends YOLOv2, a fast 2D standard object detector for RGB images, with a specific complex regression strategy to estimate multi-class 3D boxes in Cartesian space. To this end, we propose a specific Euler-Region-Proposal Network (E-RPN) that estimates the pose of an object by adding an imaginary and a real fraction to the regression network. This yields a closed complex space and avoids the singularities that occur with single-angle estimation. The E-RPN also helps the model generalize well during training. Our experiments on the KITTI benchmark suite show that we outperform current leading methods for 3D object detection, specifically in terms of efficiency. We achieve state-of-the-art results for cars, pedestrians, and cyclists while being more than five times faster than the fastest competitor. Furthermore, our model is capable of estimating all eight KITTI classes, including vans, trucks, and sitting pedestrians, simultaneously with high accuracy.
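The complex-angle idea behind the E-RPN can be sketched in a few lines: instead of regressing a yaw angle directly (which wraps at ±π and creates a singularity), the network predicts the imaginary and real fractions of a unit complex number, and the angle is recovered with a quadrant-aware arctangent. The function names below are illustrative, not the paper's actual implementation:

```python
import math

def encode_yaw(yaw: float) -> tuple[float, float]:
    # Represent the angle as a point on the unit circle:
    # an imaginary (sin) and a real (cos) fraction.
    return math.sin(yaw), math.cos(yaw)

def decode_yaw(t_im: float, t_re: float) -> float:
    # atan2 maps the two regressed components back to a unique angle
    # in (-pi, pi], avoiding the wrap-around singularity of direct
    # single-angle regression.
    return math.atan2(t_im, t_re)
```

Because the two components vary smoothly even as the angle wraps, e.g. `decode_yaw(*encode_yaw(2.5))` returns 2.5, the regression target has no discontinuity for the network to fight during training.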


3D object detection · Point cloud processing · Lidar · Autonomous driving



First, we would like to thank our main employer, Valeo, especially Jörg Schrepfer and Johannes Petzold, for giving us the opportunity to do fundamental research. Additionally, we would like to thank our colleague Maximilian Jaritz for his important contribution to voxel generation. Last but not least, we would like to thank our academic partner, TU Ilmenau, for a fruitful partnership.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Martin Simon (1, 2)
  • Stefan Milz (1)
  • Karl Amende (1, 2)
  • Horst-Michael Gross (2)

  1. Valeo Schalter und Sensoren GmbH, Bietigheim-Bissingen, Germany
  2. Ilmenau University of Technology, Ilmenau, Germany
