
Structure Aware SLAM Using Quadrics and Planes

  • Mehdi Hosseinzadeh
  • Yasir Latif
  • Trung Pham
  • Niko Suenderhauf
  • Ian Reid
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11363)

Abstract

Simultaneous Localization and Mapping (SLAM) is a fundamental problem in mobile robotics. While point-based SLAM methods provide accurate camera localization, the generated maps lack semantic information. Conversely, state-of-the-art object detection methods provide rich information about the entities present in a scene from a single image. This work marries the two and proposes a method for representing generic objects as quadrics, which allows object detections to be integrated seamlessly into a SLAM framework. To improve scene coverage, dominant planar structures are additionally modeled as infinite planes. Experiments show that the proposed points-planes-quadrics representation readily incorporates Manhattan and object-affordance constraints, greatly improving camera localization and producing semantically meaningful maps.
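To make the quadric representation concrete, the minimal sketch below shows, under illustrative assumptions, how an ellipsoidal object landmark can be stored as a dual quadric (a symmetric 4x4 matrix), projected into a camera as a dual conic, and converted into the axis-aligned bounding box tangent to the projected outline; such a predicted box can then be compared against a 2D object detection. The helper names, intrinsics, and example geometry are assumptions made for illustration, not the paper's implementation.

import numpy as np

def dual_quadric(axes, center):
    # Dual quadric of an axis-aligned ellipsoid with semi-axes (a, b, c):
    # Q* = T diag(a^2, b^2, c^2, -1) T^T, where T translates to the centroid.
    T = np.eye(4)
    T[:3, 3] = center
    return T @ np.diag([axes[0]**2, axes[1]**2, axes[2]**2, -1.0]) @ T.T

def project_to_bbox(Q_star, K, R, t):
    # The projected outline of a dual quadric is the dual conic C* = P Q* P^T,
    # with projection matrix P = K [R | t].
    P = K @ np.hstack([R, t.reshape(3, 1)])
    C = P @ Q_star @ P.T
    # Image lines l = (1, 0, -u) and l = (0, 1, -v) tangent to the conic
    # satisfy l^T C* l = 0; solving the two quadratics gives the tight
    # axis-aligned bounding box of the projected ellipse.
    du = np.sqrt(C[0, 2]**2 - C[0, 0] * C[2, 2])
    dv = np.sqrt(C[1, 2]**2 - C[1, 1] * C[2, 2])
    u = (C[0, 2] + np.array([-du, du])) / C[2, 2]
    v = (C[1, 2] + np.array([-dv, dv])) / C[2, 2]
    return u.min(), v.min(), u.max(), v.max()

# Example: a 0.4 x 0.3 x 0.5 m ellipsoid three metres in front of a camera
# with assumed intrinsics (focal length 525, principal point (319.5, 239.5)).
K = np.array([[525.0, 0.0, 319.5], [0.0, 525.0, 239.5], [0.0, 0.0, 1.0]])
Q = dual_quadric(axes=(0.4, 0.3, 0.5), center=(0.0, 0.0, 3.0))
print(project_to_bbox(Q, K, np.eye(3), np.zeros(3)))

In a full SLAM system, the discrepancy between this predicted box and a detector's bounding box would serve as the quadric measurement error, optimized jointly with point and plane factors in a factor-graph back end; that machinery is omitted here.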

Keywords

Visual semantic SLAM · Object SLAM · Planes · Quadrics

Supplementary material

Supplementary material 1 (mp4 39976 KB)

Supplementary material 2 (pdf 4841 KB)

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Mehdi Hosseinzadeh (1, 3)
  • Yasir Latif (1, 3)
  • Trung Pham (4)
  • Niko Suenderhauf (2, 3)
  • Ian Reid (1, 3)
  1. The University of Adelaide, Adelaide, Australia
  2. Queensland University of Technology, Brisbane, Australia
  3. Australian Centre for Robotic Vision, Brisbane, Australia
  4. NVIDIA, Santa Clara, USA
