Deep Fundamental Matrix Estimation Without Correspondences

  • Omid Poursaeed
  • Guandao Yang
  • Aditya Prakash
  • Qiuren Fang
  • Hanqing Jiang
  • Bharath Hariharan
  • Serge Belongie
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11131)

Abstract

Estimating fundamental matrices is a classic problem in computer vision. Traditional methods rely heavily on the correctness of estimated key-point correspondences, which can be noisy and unreliable. As a result, these methods struggle with image pairs that have large occlusions or significantly different camera poses. In this paper, we propose novel neural network architectures that estimate fundamental matrices end to end, without relying on point correspondences. New modules and layers are introduced to preserve the mathematical properties of the fundamental matrix as a homogeneous rank-2 matrix with seven degrees of freedom. We analyze the performance of the proposed models using various metrics on the KITTI dataset, and show that they perform competitively with traditional methods without the need to extract correspondences.
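To make the "homogeneous rank-2 matrix with seven degrees of freedom" property concrete, the sketch below projects an arbitrary 3×3 matrix onto that set. It is an illustration only and not the layers proposed in the paper: it uses the standard SVD-based rank-2 projection, and the function name `project_to_fundamental` is hypothetical. A 3×3 matrix has nine entries; fixing the arbitrary scale removes one degree of freedom and the zero-determinant constraint removes another, leaving seven.

```python
import numpy as np


def project_to_fundamental(M):
    """Project a 3x3 matrix onto rank-2 matrices with unit Frobenius norm.

    Hypothetical helper for illustration; not the paper's reconstruction layer.
    """
    U, s, Vt = np.linalg.svd(M)
    s[2] = 0.0                      # drop the smallest singular value -> rank 2
    F = U @ np.diag(s) @ Vt
    return F / np.linalg.norm(F)    # fix the arbitrary scale (homogeneity)


# Example: start from a random matrix standing in for a raw network output.
rng = np.random.default_rng(0)
F = project_to_fundamental(rng.standard_normal((3, 3)))

print("det(F) :", np.linalg.det(F))    # numerically close to zero
print("||F||  :", np.linalg.norm(F))   # normalized to one
```

Normalizing by the Frobenius norm is one of several reasonable ways to fix the scale; dividing by the last entry is another common choice, though it fails when that entry is zero.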

Keywords

Fundamental matrix, Epipolar geometry, Deep learning, Stereo

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Omid Poursaeed (1, 2)
  • Guandao Yang (1)
  • Aditya Prakash (3)
  • Qiuren Fang (1)
  • Hanqing Jiang (1)
  • Bharath Hariharan (1)
  • Serge Belongie (1, 2)

  1. Cornell University, Ithaca, USA
  2. Cornell Tech, New York, USA
  3. Indian Institute of Technology Roorkee, Roorkee, India
