Analysis of the Effect of Sensors for End-to-End Machine Learning Odometry

  • Carlos Marquez Rodriguez-Peral
  • Dexmont Peña
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11134)


Accurate position and orientation estimates are essential for navigation in autonomous robots. Although this is a well-studied problem, existing solutions rely on statistical filters, which usually require good parameter initialization or calibration and are computationally expensive. This paper addresses these limitations with an end-to-end machine learning approach. It explores combining multiple data sources (monocular RGB images and inertial data) to overcome the weaknesses of each source when used alone. Three odometry approaches based on CNNs and LSTMs are proposed, evaluated on the KITTI dataset, and compared with existing approaches. The results show that the proposed approaches perform similarly to state-of-the-art methods and outperform some of them at a lower computational cost, which allows them to run on resource-constrained devices.
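
For readers new to the topic, the following minimal sketch illustrates what any odometry system ultimately produces: a trajectory obtained by chaining relative motion estimates. It is not the method proposed in the paper. The function names `compose` and `integrate` are hypothetical, the example is restricted to 2D for brevity, and the per-step motions are hard-coded, whereas in a learned pipeline they would come from a model fed with images and inertial data.

```python
import math


def compose(pose, delta):
    """Apply a relative motion (dx, dy, dtheta), given in the robot frame,
    to a global 2D pose (x, y, theta)."""
    x, y, theta = pose
    dx, dy, dtheta = delta
    # Rotate the robot-frame translation into the global frame, then add it.
    x_new = x + dx * math.cos(theta) - dy * math.sin(theta)
    y_new = y + dx * math.sin(theta) + dy * math.cos(theta)
    # Keep the heading wrapped to the interval [-pi, pi).
    theta_new = (theta + dtheta + math.pi) % (2 * math.pi) - math.pi
    return (x_new, y_new, theta_new)


def integrate(deltas, start=(0.0, 0.0, 0.0)):
    """Chain relative motions into a list of global poses (the trajectory)."""
    trajectory = [start]
    for delta in deltas:
        trajectory.append(compose(trajectory[-1], delta))
    return trajectory


# Assumed toy input: drive 1 m forward and turn 90 degrees left, four times.
square = [(1.0, 0.0, math.pi / 2)] * 4
trajectory = integrate(square)
```

Because each pose is built on the previous one, any error in a single relative estimate propagates to all later poses, which is why per-step accuracy matters for navigation.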


Navigation, Visual Inertial Odometry, Machine learning, CNN, LSTM



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Intel Research and Development, Movidius Group, Kildare, Ireland
