Wide baseline pose estimation from video with a density-based uncertainty model

  • Nicola Pellicanò
  • Emanuel Aldea
  • Sylvie Le Hégarat-Mascle
Original Paper


Robust wide-baseline pose estimation is an essential step in the deployment of smart camera networks. In this work, we highlight some current limitations of conventional strategies for relative pose estimation in difficult urban scenes. We then propose a solution that relies on an adaptive search for corresponding interest points in synchronized video streams, which allows us to converge robustly toward a high-quality solution. The core idea of our algorithm is to build, across the image space, a nonstationary map of the local pose estimation uncertainty based on the spatial distribution of interest points. This map then guides the selection of new observations from the video stream so as to prioritize coverage of areas of high uncertainty. With an additional step in the initial stage, the proposed algorithm may also be used to refine an existing pose estimate from the video data; this mode enables data-driven self-calibration of stereo rigs for which accuracy is critical, such as onboard medical or vehicular systems. We validate our method on three different datasets that cover typical scenarios in pose estimation. The results show fast and robust convergence of the solution and, compared to single-image alternatives, a significant improvement in both the RMSE on ground-truth matches and the maximum absolute error.
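The idea of a density-based uncertainty map guiding the choice of new observations can be illustrated with a small sketch. This is not the authors' implementation: it assumes, purely for illustration, that uncertainty is approximated as one minus a normalized Gaussian kernel density of the already matched interest points, evaluated on a coarse image grid. The function names (`density_map`, `uncertainty_map`, `rank_candidates`), the fixed `bandwidth`, and the grid `step` are all hypothetical choices.

```python
import numpy as np

def density_map(points, width, height, bandwidth=40.0, step=20):
    """Gaussian kernel density of 2D points, sampled on a coarse image grid."""
    xs = np.arange(step / 2, width, step)    # grid cell centres along x
    ys = np.arange(step / 2, height, step)   # grid cell centres along y
    gx, gy = np.meshgrid(xs, ys)             # both of shape (len(ys), len(xs))
    grid = np.stack([gx.ravel(), gy.ravel()], axis=1)
    pts = np.asarray(points, dtype=float).reshape(-1, 2)
    # Squared distance from every grid cell to every matched point.
    d2 = ((grid[:, None, :] - pts[None, :, :]) ** 2).sum(axis=2)
    dens = np.exp(-0.5 * d2 / bandwidth**2).sum(axis=1)
    return dens.reshape(gy.shape), xs, ys

def uncertainty_map(points, width, height, **kwargs):
    """Assumed proxy: uncertainty is high where matched points are sparse."""
    dens, xs, ys = density_map(points, width, height, **kwargs)
    peak = dens.max()
    unc = 1.0 - dens / peak if peak > 0 else np.ones_like(dens)
    return unc, xs, ys

def rank_candidates(candidates, unc, xs, ys):
    """Order candidate observations by the uncertainty of their grid cell."""
    cand = np.asarray(candidates, dtype=float).reshape(-1, 2)
    ix = np.abs(cand[:, 0:1] - xs[None, :]).argmin(axis=1)
    iy = np.abs(cand[:, 1:2] - ys[None, :]).argmin(axis=1)
    scores = unc[iy, ix]
    order = np.argsort(-scores, kind="stable")  # most uncertain first
    return order, scores

# Matches clustered in the top-left corner of a 640x480 image.
matched = [(50, 50), (60, 80), (40, 100), (70, 60)]
unc, xs, ys = uncertainty_map(matched, 640, 480)
# One candidate inside the covered area, one far away from it.
order, scores = rank_candidates([(55, 70), (600, 400)], unc, xs, ys)
```

In this toy setting the candidate far from the existing matches receives the higher score, so it would be examined first. A per-region (nonstationary) bandwidth, as the abstract suggests, would replace the single fixed `bandwidth` used here.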


Keywords: Pose estimation, Wide baseline, Camera calibration, Guided matching



The authors gratefully acknowledge the support of Regent’s Park Mosque for providing access to the site during data collection, and of K. Kiyani. This work was partly funded by ANR grant ANR-15-CE39-0005 and by QNRF grant NPRP-09-768-1-114.



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  • Nicola Pellicanò (1)
  • Emanuel Aldea (1)
  • Sylvie Le Hégarat-Mascle (1)
  1. SATIE, Université Paris-Sud, Université Paris-Saclay, Gif-sur-Yvette, France
