Automatic Camera Tracking

  • Andrew W. Fitzgibbon
  • Andrew Zisserman
Part of The International Series in Video Computing book series (VICO, volume 5)

Abstract

The goal of automatically recovering camera motion and scene structure from video sequences has been a staple of computer vision research for over a decade. As an area of endeavour, it has seen both steady and explosive progress over time, and now represents one of the success stories of computer vision. This task, automatic camera tracking or “matchmoving”, is the sine qua non of modern special effects, allowing the seamless insertion of computer-generated objects into live-action backgrounds (figure 2.1 shows an example). It has moved from a research problem involving small numbers of uncalibrated images to commercial software which can automatically track cameras through thousands of frames [1]. In addition, camera tracking is an important preprocess for many computer vision algorithms, such as multiple-view shape reconstruction, novel view synthesis and autonomous vehicle navigation.
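As a minimal illustration of what recovered camera parameters make possible (a sketch, not the chapter's own method): once the intrinsics K and the pose (R, t) of a frame are known, a computer-generated point X in world coordinates is composited by projecting it through the 3×4 camera matrix P = K [R | t]. The function and numeric values below are hypothetical.

```python
import numpy as np

def project(K, R, t, X):
    """Project a 3D world point X into pixel coordinates via P = K [R | t]."""
    P = K @ np.hstack([R, t.reshape(3, 1)])   # 3x4 projection matrix
    x = P @ np.append(X, 1.0)                 # homogeneous image point
    return x[:2] / x[2]                       # dehomogenize to pixels

# Hypothetical camera: focal length 500 px, principal point (320, 240),
# placed at the world origin looking down the +Z axis.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0,   0.0,   1.0]])
R = np.eye(3)
t = np.zeros(3)

# A point on the optical axis projects to the principal point.
print(project(K, R, t, np.array([0.0, 0.0, 2.0])))  # -> [320. 240.]
```

In a matchmove pipeline, K, R and t would come from the automatic tracker for each frame, and the rendered object would be drawn over the live-action plate using these projections.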

Keywords

Computer Vision · Motion Estimation · Camera Motion · Bundle Adjustment · Lens Distortion


References

  [1] 2d3 Ltd. http://www.2d3.com, 2002.
  [2] S. Avidan and A. Shashua. Threading fundamental matrices. In Proceedings of the European Conference on Computer Vision, pages 124–140. Springer-Verlag, 1998.
  [3] P. A. Beardsley, P. H. S. Torr, and A. Zisserman. 3D model acquisition from extended image sequences. In Proceedings of the 4th European Conference on Computer Vision, LNCS 1065, Cambridge, pages 683–695, 1996.
  [4] M. J. Black and P. Anandan. A framework for the robust estimation of optical flow. In Proceedings of the 4th International Conference on Computer Vision, Berlin, pages 231–236, 1993.
  [5] D. Bondyfalat and S. Bougnoux. Imposing Euclidean constraints during self-calibration processes. In R. Koch and L. Van Gool, editors, 3D Structure from Multiple Images of Large-Scale Environments, LNCS 1506. Springer-Verlag, 1998.
  [6] M. Brand. Morphable 3D models from video. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages II: 456–463, 2001.
  [7] D. C. Brown. Decentering distortion of lenses. Photogrammetric Engineering, 32(3):444–462, 1966.
  [8] A. Chiuso, P. Favaro, H. Jin, and S. Soatto. MfM: 3-D motion and structure causally integrated over time. In Proceedings of the 6th European Conference on Computer Vision, Dublin, Ireland, 2000.
  [9] J. C. Clarke and A. Zisserman. Detection and tracking of independent motion. Image and Vision Computing, 14:565–572, 1996.
  [10] F. Devernay and O. D. Faugeras. Automatic calibration and removal of distortion from scenes of structured environments. In SPIE, volume 2567, San Diego, CA, July 1995.
  [11] O. Faugeras and Q.-T. Luong. The Geometry of Multiple Images. MIT Press, 2001.
  [12] A. W. Fitzgibbon. Simultaneous linear estimation of multiple view geometry and lens distortion. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2001.
  [13] A. W. Fitzgibbon, G. Cross, and A. Zisserman. Automatic 3D model construction for turn-table sequences. In R. Koch and L. Van Gool, editors, 3D Structure from Multiple Images of Large-Scale Environments, LNCS 1506, pages 155–170. Springer-Verlag, June 1998.
  [14] A. W. Fitzgibbon and A. Zisserman. Automatic camera recovery for closed or open image sequences. In Proceedings of the European Conference on Computer Vision, pages 311–326. Springer-Verlag, June 1998.
  [15] A. W. Fitzgibbon and A. Zisserman. Multibody structure and motion: 3-D reconstruction of independently moving objects. In Proceedings of the European Conference on Computer Vision, pages 891–906. Springer-Verlag, June 2000.
  [16] S. Gibson, J. Cook, T. L. J. Howard, R. J. Hubbold, and D. Oram. Accurate camera calibration for off-line, video-based augmented reality. In IEEE and ACM International Symposium on Mixed and Augmented Reality (ISMAR 2002), September 2002.
  [17] R. I. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, ISBN: 0521623049, 2000.
  [18] D. Jacobs. Linear fitting with missing data: Applications to structure from motion and to characterizing intensity images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 206–212, 1997.
  [19] F. Kahl. Geometry and Critical Configurations of Multiple Views. PhD thesis, Lund Institute of Technology, 2001.
  [20] F. Kahl, R. I. Hartley, and K. Åström. Critical configurations for n-view projective reconstruction. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Kauai, Hawaii, 2001.
  [21] R. Koch, M. Pollefeys, B. Heigl, L. Van Gool, and H. Niemann. Calibration of hand-held camera sequences for plenoptic modeling. In Proceedings of the 7th International Conference on Computer Vision, Kerkyra, Greece, pages 585–591, 1999.
  [22] S. Laveau. Geometry of a system of N cameras. Theory, estimation, and applications. PhD thesis, INRIA, 1996.
  [23] D. Martinec and T. Pajdla. Structure from many perspective images with occlusions. In Proceedings of the European Conference on Computer Vision, pages 355–369, 2002.
  [24] P. McLauchlan. A batch/recursive algorithm for 3D scene reconstruction. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Hilton Head Island, South Carolina, pages II: 738–743, 2000.
  [25] P. McLauchlan, X. Shen, P. Palmer, A. Manessis, and A. Hilton. Surface-based structure-from-motion using feature groupings. In Proceedings of the Asian Conference on Computer Vision, 2000.
  [26] T. Moons, L. Van Gool, M. Van Diest, and A. Oosterlinck. Affine structure from perspective image pairs obtained by a translating camera. In J. L. Mundy, A. Zisserman, and D. Forsyth, editors, Applications of Invariance in Computer Vision, pages 297–316. Springer-Verlag, 1994.
  [27] D. Nister. Reconstruction from uncalibrated sequences with a hierarchy of trifocal tensors. In Proceedings of the European Conference on Computer Vision, 2000.
  [28] D. Nister. Automatic Dense Reconstruction from Uncalibrated Video Sequences. PhD thesis, Dept. of Numerical Analysis and Computing Science, KTH Stockholm, 2001.
  [29] M. Pollefeys, F. Verbiest, and L. J. Van Gool. Surviving dominant planes in uncalibrated structure and motion recovery. In Proceedings of the European Conference on Computer Vision, pages 837–851, 2002.
  [30] H. S. Sawhney, S. Hsu, and R. Kumar. Robust video mosaicing through topology inference and local to global alignment. In Proceedings of the European Conference on Computer Vision, pages 103–119. Springer-Verlag, 1998.
  [31] R. A. Smith, A. W. Fitzgibbon, and A. Zisserman. Improving augmented reality using image and scene constraints. In Proceedings of the 10th British Machine Vision Conference, Nottingham, pages 295–304. BMVA Press, 1999.
  [32] G. P. Stein. Lens distortion calibration using point correspondences. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Puerto Rico, 1997.
  [33] P. Sturm. On focal length calibration from two views. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Kauai, Hawaii, 2001.
  [34] P. Sturm and B. Triggs. A factorization based algorithm for multi-image projective structure and motion. In Proceedings of the 4th European Conference on Computer Vision, Cambridge, UK, pages 709–720, 1996.
  [35] C. Tomasi and T. Kanade. Shape and motion from image streams under orthography: A factorization approach. International Journal of Computer Vision, 9(2):137–154, November 1992.
  [36] B. Tordoff and D. W. Murray. Violating rotating camera geometry: The effect of radial distortion on self-calibration. In Proceedings of the International Conference on Pattern Recognition, 2000.
  [37] P. H. S. Torr, A. W. Fitzgibbon, and A. Zisserman. The problem of degeneracy in structure and motion recovery from uncalibrated image sequences. International Journal of Computer Vision, 32(1):27–44, August 1999.
  [38] P. H. S. Torr and D. W. Murray. Stochastic motion clustering. In Proceedings of the 3rd European Conference on Computer Vision, Stockholm, Sweden, pages 328–337. Springer-Verlag, 1994.
  [39] L. Torresani, D. Yang, G. Alexander, and C. Bregler. Tracking and modeling non-rigid objects with rank constraints. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2001.
  [40] B. Triggs, P. McLauchlan, R. Hartley, and A. Fitzgibbon. Bundle adjustment: A modern synthesis. In B. Triggs, A. Zisserman, and R. Szeliski, editors, Vision Algorithms: Theory and Practice, LNCS. Springer-Verlag, 2000.
  [41] J. Weber and J. Malik. Rigid body segmentation and shape description from dense optical flow under weak perspective. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(2):139–143, 1997.
  [42] L. Wolf and A. Shashua. On projection matrices ℙ^k → ℙ^2, k = 3, …, 6, and their applications in computer vision. In Proceedings of the International Conference on Computer Vision, 2001.
  [43] Z. Zhang. On the epipolar geometry between two images with lens distortion. In Proceedings of the International Conference on Pattern Recognition, pages 407–411, 1996.

Copyright information

© Springer Science+Business Media New York 2003

Authors and Affiliations

  • Andrew W. Fitzgibbon¹
  • Andrew Zisserman¹
  1. Robotics Research Group, Department of Engineering Science, University of Oxford, Oxford, UK