
Efficient and Robust 3D Object Reconstruction Based on Monocular SLAM and CNN Semantic Segmentation

  • Thomas Weber (email author)
  • Sergey Triputen (email author)
  • Atmaraaj Gopal
  • Steffen Eißler
  • Christian Höfert
  • Kristiaan Schreve
  • Matthias Rätsch
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11531)

Abstract

Various applications implement SLAM technology, especially in the field of robot navigation. We show the advantage of SLAM technology for independent 3D object reconstruction. To obtain a point cloud of each object of interest, free of its environment, we leverage deep learning. We utilize recent CNN deep learning research for accurate semantic segmentation of objects. In this work, we propose two fusion methods combining CNN-based semantic segmentation and SLAM for the 3D reconstruction of objects of interest, in order to obtain more robustness and efficiency. As a major novelty, we introduce a CNN-based masking to focus SLAM only on feature points belonging to each individual object. Noisy, complex or even non-rigid features in the background are filtered out, improving the estimation of the camera pose and the 3D point cloud of each object. Our experiments are constrained to the reconstruction of industrial objects. We present an analysis of the accuracy and performance of each method and compare the two methods, describing their pros and cons.
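The core idea of the CNN-based masking can be illustrated with a minimal sketch: given a binary object mask predicted by a segmentation CNN and the semi-dense feature points maintained by a monocular SLAM system such as LSD-SLAM, background points are discarded before pose estimation and depth refinement. The function and array names below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def mask_feature_points(keypoints, inv_depths, object_mask):
    """Keep only the SLAM feature points that fall inside the
    CNN-predicted object mask; background points are discarded.

    keypoints   : (N, 2) array of (u, v) pixel coordinates
    inv_depths  : (N,) array of inverse-depth hypotheses per point
    object_mask : (H, W) boolean array from the segmentation network
    """
    u = keypoints[:, 0].astype(int)
    v = keypoints[:, 1].astype(int)
    inside = object_mask[v, u]          # True where a point lies on the object
    return keypoints[inside], inv_depths[inside]
```

In a pipeline of this kind, such filtering would run per keyframe, so that camera tracking and depth-map updates only use points on the segmented object of interest.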

Keywords

3D reconstruction · SLAM · LSD-SLAM · Monocular camera · CNN · Semantic segmentation · Bin-picking · Collaborative robot · Depth estimation


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Thomas Weber¹ (email author)
  • Sergey Triputen¹ (email author)
  • Atmaraaj Gopal¹
  • Steffen Eißler¹
  • Christian Höfert¹
  • Kristiaan Schreve²
  • Matthias Rätsch¹

  1. Reutlingen University, Reutlingen, Germany
  2. University of Stellenbosch, Stellenbosch, South Africa
