RelocNet: Continuous Metric Learning Relocalisation Using Neural Nets

  • Vassileios Balntas
  • Shuda Li
  • Victor Prisacariu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11218)

Abstract

We propose a method of learning suitable convolutional representations for camera pose retrieval based on nearest neighbour matching and continuous metric learning-based feature descriptors. We introduce information from camera frusta overlaps between pairs of images to optimise our feature embedding network. Thus, the final camera pose descriptor differences represent camera pose changes. In addition, we build a pose regressor that is trained with a geometric loss to infer finer relative poses between a query and nearest neighbour images. Experiments show that our method generalises in a meaningful way and outperforms related methods in several settings.
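The two ideas in the abstract, a descriptor distance that tracks a continuous pose dissimilarity and pose retrieval by nearest neighbour matching, can be illustrated with a minimal NumPy sketch. Everything below is an assumption made for illustration: the function names, the use of `1 - overlap` as the target distance, and the squared-error form of the loss are hypothetical and are not taken from the paper, which may define these quantities differently.

```python
import numpy as np


def continuous_metric_loss(desc_a, desc_b, overlap):
    """Hypothetical continuous metric loss (not the paper's exact form).

    desc_a, desc_b: (N, D) descriptor arrays for N image pairs.
    overlap: (N,) frustum overlap per pair, assumed to lie in [0, 1].
    The loss pushes the descriptor distance towards 1 - overlap, so that
    pairs with high overlap end up close in descriptor space.
    """
    dist = np.linalg.norm(desc_a - desc_b, axis=1)
    target = 1.0 - np.asarray(overlap, dtype=float)
    return float(np.mean((dist - target) ** 2))


def retrieve_nearest(query_desc, db_descs, db_poses):
    """Return the index and stored pose of the closest database descriptor.

    query_desc: (D,) descriptor of the query image.
    db_descs: (M, D) descriptors of the reference images.
    db_poses: sequence of M poses, in whatever form the caller stores them.
    """
    dists = np.linalg.norm(db_descs - query_desc, axis=1)
    idx = int(np.argmin(dists))
    return idx, db_poses[idx]


if __name__ == "__main__":
    # Toy database of three 2-D descriptors with placeholder pose labels.
    db_descs = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
    db_poses = ["pose_0", "pose_1", "pose_2"]
    idx, pose = retrieve_nearest(np.array([0.9, 0.1]), db_descs, db_poses)
    print(idx, pose)
```

In a full pipeline of the kind the abstract describes, the retrieved pose would only be a coarse estimate; a separate regressor would then refine the relative pose between the query and its nearest neighbour.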

Acknowledgments

We gratefully acknowledge the Huawei Innovation Research Program (HIRP) FLAGSHIP grant and the European Commission Project Multiple-actOrs Virtual Empathic CARegiver for the Elder (MoveCare) for financially supporting the authors for this work.

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Vassileios Balntas (1)
  • Shuda Li (1)
  • Victor Prisacariu (1)
  1. Active Vision Lab, University of Oxford, Oxford, UK