Scale Estimation of Monocular SfM for a Multi-modal Stereo Camera

  • Shinya Sumikura
  • Ken Sakurada
  • Nobuo Kawaguchi
  • Ryosuke Nakamura
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11363)


This paper proposes a novel method for estimating the absolute scale of monocular structure from motion (SfM) for a multi-modal stereo camera. In the fields of computer vision and robotics, scale estimation for monocular SfM has been widely investigated as a way to simplify systems. This paper addresses the scale estimation problem for a stereo camera system in which the two cameras capture images in different spectral bands (e.g., RGB and far-infrared (FIR)), whose feature points are difficult to match directly using descriptors. Furthermore, the number of matching points between FIR images can be comparatively small, owing to their low resolution and the lack of thermal texture in the scene. To cope with these difficulties, the proposed method estimates the scale parameter by batch optimization, based on the epipolar constraint of a small number of feature correspondences between the invisible-light images. The accuracy and numerical stability of the proposed method are verified through experiments on synthetic and real images.
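The key observation in the abstract, that FIR-to-FIR epipolar constraints can fix the scale, can be made concrete. Suppose monocular SfM on the RGB camera yields world-to-camera poses (R_k, t_k) whose translations are off by an unknown factor s, and the RGB-to-FIR extrinsics (R_s, t_s) are known in metric units from calibration. The relative translation between the FIR views at frames i and j is then t_ij = s (R_s t_j - R_ij R_s t_i) + (t_s - R_ij t_s), which is affine in s, so the epipolar constraint of each FIR correspondence reduces to one scalar equation s*a + b = 0. The sketch below is a hypothetical illustration, not the authors' implementation: it solves these equations in closed form by linear least squares on the algebraic epipolar residual, whereas the paper describes a batch optimization whose exact cost is not stated here. The function names, the pose convention x_cam = R X + t, and the synthetic data are all assumptions.

```python
import math
from typing import Sequence, Tuple

# Assumed conventions (hypothetical, for illustration only):
#   poses are world-to-camera, x_cam = R @ X + t
#   extrinsics map RGB to FIR coordinates, x_fir = R_s @ x_rgb + t_s (metric)
Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]          # row-major 3x3 matrix
Pose = Tuple[Mat3, Vec3]                # (R, t)
Match = Tuple[int, int, Vec3, Vec3]     # (frame i, frame j, y_i, y_j), normalized FIR coords


def matvec(m: Mat3, v: Vec3) -> Vec3:
    return tuple(sum(m[r][c] * v[c] for c in range(3)) for r in range(3))


def matmul(a: Mat3, b: Mat3) -> Mat3:
    return tuple(tuple(sum(a[r][k] * b[k][c] for k in range(3)) for c in range(3))
                 for r in range(3))


def transpose(m: Mat3) -> Mat3:
    return tuple(tuple(m[c][r] for c in range(3)) for r in range(3))


def sub(a: Vec3, b: Vec3) -> Vec3:
    return tuple(x - y for x, y in zip(a, b))


def dot(a: Vec3, b: Vec3) -> float:
    return sum(x * y for x, y in zip(a, b))


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def estimate_scale(rgb_poses: Sequence[Pose], R_s: Mat3, t_s: Vec3,
                   matches: Sequence[Match]) -> float:
    """Least-squares s minimizing sum (s*a + b)^2 over FIR epipolar constraints."""
    num = den = 0.0
    for i, j, y_i, y_j in matches:
        (R_i, t_i), (R_j, t_j) = rgb_poses[i], rgb_poses[j]
        # Relative rotation between the FIR views at frames i and j (scale-free).
        R_ij = matmul(matmul(R_s, R_j), transpose(matmul(R_s, R_i)))
        # FIR relative translation t_ij = s*u + v; v comes from the metric extrinsics.
        u = sub(matvec(R_s, t_j), matvec(R_ij, matvec(R_s, t_i)))
        v = sub(t_s, matvec(R_ij, t_s))
        Ry = matvec(R_ij, y_i)
        # Epipolar constraint y_j . (t_ij x R_ij y_i) = 0 is affine in s: s*a + b = 0.
        a = dot(y_j, cross(u, Ry))
        b = dot(y_j, cross(v, Ry))
        num += a * b
        den += a * a
    if den == 0.0:
        raise ValueError("degenerate configuration: scale is unobservable")
    return -num / den


def rot_y(th: float) -> Mat3:
    c, s = math.cos(th), math.sin(th)
    return ((c, 0.0, s), (0.0, 1.0, 0.0), (-s, 0.0, c))


def rot_x(th: float) -> Mat3:
    c, s = math.cos(th), math.sin(th)
    return ((1.0, 0.0, 0.0), (0.0, c, -s), (0.0, s, c))


def synthetic_demo(true_scale: float = 2.5) -> float:
    """Noise-free check: recover a known scale from projected FIR points."""
    R_s, t_s = rot_y(0.02), (0.12, 0.0, 0.01)  # made-up metric RGB->FIR extrinsics
    metric = [(rot_y(0.0), (0.0, 0.0, 0.0)),
              (rot_y(0.15), (-0.4, 0.05, 0.1)),
              (matmul(rot_x(0.1), rot_y(-0.1)), (0.3, -0.1, 0.2))]
    # Monocular SfM only knows translations up to scale: t_sfm = t_metric / s.
    sfm = [(R, tuple(x / true_scale for x in t)) for R, t in metric]
    points = [(x, y, z) for x in (-1.0, 0.0, 1.0) for y in (-0.5, 0.5) for z in (4.0, 6.0)]

    def project_fir(k: int, X: Vec3) -> Vec3:
        R, t = metric[k]
        x_rgb = tuple(p + q for p, q in zip(matvec(R, X), t))
        x_fir = tuple(p + q for p, q in zip(matvec(R_s, x_rgb), t_s))
        return (x_fir[0] / x_fir[2], x_fir[1] / x_fir[2], 1.0)

    matches = [(i, j, project_fir(i, X), project_fir(j, X))
               for i, j in ((0, 1), (1, 2), (0, 2)) for X in points]
    return estimate_scale(sfm, R_s, t_s, matches)


if __name__ == "__main__":
    print(f"estimated scale: {synthetic_demo():.6f}")
```

In this formulation v = t_s - R_ij t_s vanishes when R_ij = I or t_s = 0, so frame pairs related by pure translation (or a rig with no baseline) contribute nothing about s; rotation between frames is what makes the scale observable here. With real FIR matches, pixel noise and outliers would call for a robust or geometric cost rather than this plain algebraic residual.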



This research is supported by the Hori Sciences & Arts Foundation, the New Energy and Industrial Technology Development Organization (NEDO) and JSPS KAKENHI Grant Number 18K18071.

Supplementary material

Supplementary material 1 (pdf 15031 KB)

Supplementary material 2 (mp4 30669 KB)



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Shinya Sumikura (1, corresponding author)
  • Ken Sakurada (2)
  • Nobuo Kawaguchi (1)
  • Ryosuke Nakamura (2)
  1. Nagoya University, Nagoya, Japan
  2. National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
