Autonomous Robots, Volume 41, Issue 6, pp 1423–1445

Large-scale, real-time 3D scene reconstruction on a mobile device

  • Ivan Dryanovski
  • Matthew Klingensmith
  • Siddhartha S. Srinivasa
  • Jizhong Xiao
Part of the following topical collections:
  1. Special Issue on "Robotics: Science and Systems"


Google’s Project Tango has made integrated depth sensing and onboard visual-inertial odometry available to mobile devices such as phones and tablets. In this work, we explore the problem of large-scale, real-time 3D reconstruction on mobile devices of this type. Solving this problem is a necessary prerequisite for many indoor applications, including navigation, augmented reality and building scanning. The main challenges include dealing with noisy, low-frequency depth data and managing limited computational and memory resources. State-of-the-art approaches to large-scale dense reconstruction require large amounts of memory and high-performance GPU computing. Other existing 3D reconstruction approaches on mobile devices either build only a sparse reconstruction, offload their computation to other devices, or require lengthy post-processing to extract the geometric mesh. In contrast, we can reconstruct and render a global mesh on the fly, using only the mobile device’s CPU, in very large (300 m²) scenes, at a resolution of 2–3 cm. To achieve this, we divide the scene into spatial volumes indexed by a hash map. Each volume contains the truncated signed distance function (TSDF) for that region of space, as well as the mesh segment derived from the distance function. This approach allows us to focus computational and memory resources only on the areas of the scene that are currently observed, and to leverage parallelization techniques for multi-core processing. Furthermore, we describe an on-device post-processing method for fusing datasets from multiple, independent trials in order to improve the quality and coverage of the reconstruction. We discuss how the particular characteristics of these devices shape our algorithmic and implementation decisions. Finally, we provide both qualitative and quantitative results on publicly available RGB-D datasets and on datasets collected in real time from two devices.
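The hash-indexed volume layout summarized above can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the authors' implementation: the names `Chunk`, `ChunkMap`, `integrate_ray`, `sdf_at` and `dirty`, and the constants (8×8×8-voxel chunks, 3 cm voxels, 9 cm truncation, a weight cap of 64), are hypothetical. Each depth measurement updates only the voxels inside the truncation band along its ray, using a weighted running average of the truncated signed distance function (TSDF), and a chunk is allocated in a Python `dict` the first time a ray touches it. Touched chunks are recorded in a `dirty` set, which is where a meshing step (for example marching cubes, not shown) could re-extract only the mesh segments that changed. Sampling the ray at fixed half-voxel steps is a simplification; an exact voxel traversal would guarantee that every crossed voxel is visited.

```python
import math
from dataclasses import dataclass, field

# Hypothetical parameters for illustration only (not taken from the paper).
VOXEL_SIZE = 0.03   # metres per voxel edge
CHUNK_SIZE = 8      # voxels per chunk edge
TRUNCATION = 0.09   # truncation distance in metres
MAX_WEIGHT = 64.0   # cap on the running-average weight


@dataclass
class Chunk:
    """A fixed-size block of TSDF voxels stored as flat lists."""
    sdf: list = field(default_factory=lambda: [TRUNCATION] * CHUNK_SIZE ** 3)
    weight: list = field(default_factory=lambda: [0.0] * CHUNK_SIZE ** 3)


class ChunkMap:
    """Sparse TSDF: chunks are allocated on demand and found via a hash map."""

    def __init__(self):
        self.chunks = {}     # (cx, cy, cz) -> Chunk
        self.dirty = set()   # chunk keys whose mesh segment needs re-extraction

    @staticmethod
    def _split(voxel):
        """Map a global voxel index to (chunk key, flat index inside the chunk)."""
        key = tuple(v // CHUNK_SIZE for v in voxel)    # floor division, also for negatives
        lx, ly, lz = (v % CHUNK_SIZE for v in voxel)   # always in [0, CHUNK_SIZE)
        return key, lx + CHUNK_SIZE * (ly + CHUNK_SIZE * lz)

    def integrate_ray(self, origin, point):
        """Fuse one depth measurement: a ray from the sensor origin to a surface point."""
        diff = [p - o for p, o in zip(point, origin)]
        depth = math.sqrt(sum(d * d for d in diff))
        direction = [d / depth for d in diff]
        step = 0.5 * VOXEL_SIZE
        n_steps = int(round(2 * TRUNCATION / step)) + 1
        visited = set()
        # Only the truncation band [depth - T, depth + T] along the ray is touched.
        for i in range(n_steps):
            t = depth - TRUNCATION + i * step
            sample = [o + t * d for o, d in zip(origin, direction)]
            voxel = tuple(math.floor(s / VOXEL_SIZE) for s in sample)
            if voxel in visited:
                continue
            visited.add(voxel)
            # Projective signed distance at the voxel centre:
            # positive in front of the surface, negative behind it.
            centre = [(v + 0.5) * VOXEL_SIZE for v in voxel]
            t_centre = sum((c - o) * d for c, o, d in zip(centre, origin, direction))
            sdf = max(-TRUNCATION, min(TRUNCATION, depth - t_centre))
            key, idx = self._split(voxel)
            chunk = self.chunks.get(key)
            if chunk is None:  # allocate only where data is observed
                chunk = self.chunks[key] = Chunk()
            w = chunk.weight[idx]
            chunk.sdf[idx] = (chunk.sdf[idx] * w + sdf) / (w + 1.0)
            chunk.weight[idx] = min(w + 1.0, MAX_WEIGHT)
            self.dirty.add(key)

    def sdf_at(self, point):
        """Return the fused distance at a point, or None if it was never observed."""
        voxel = tuple(math.floor(p / VOXEL_SIZE) for p in point)
        key, idx = self._split(voxel)
        chunk = self.chunks.get(key)
        if chunk is None or chunk.weight[idx] == 0.0:
            return None
        return chunk.sdf[idx]
```

With these assumed parameters, fusing a single ray from the origin to a point 1 m straight ahead allocates only the two chunks crossed by the ±9 cm band around the surface; empty space between the sensor and the surface never allocates memory, so storage grows with the observed surface rather than with the bounding volume of the scene.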


Keywords: 3D reconstruction, Mobile technology, SLAM, Computer vision, Mapping, Pose estimation



This work was done with the support of Google’s Advanced Technologies and Projects division (ATAP) for Project Tango. The authors thank Johnny Lee, Joel Hesch, Esha Nerurkar, Simon Lynen, Ryan Hickman and other ATAP members for their close collaboration and support on this project.

Supplementary material

Supplementary material 1 (mp4 215717 KB)



Copyright information

© Springer Science+Business Media New York 2017

Authors and Affiliations

  1. Department of Computer Science, The Graduate Center, The City University of New York (CUNY), New York, USA
  2. Carnegie Mellon Robotics Institute, Pittsburgh, USA
  3. Electrical Engineering Department, The City College of New York, New York, USA
