On Affine Invariant Clustering and Automatic Cast Listing in Movies

  • Andrew Fitzgibbon
  • Andrew Zisserman
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2352)


We develop a distance metric for clustering and classification algorithms which is invariant to affine transformations and includes priors on the transformation parameters. Such clustering requirements are generic to a number of problems in computer vision.

We extend existing techniques for affine-invariant clustering, and show that the new distance metric outperforms existing approximations to affine-invariant distance computation, particularly under large transformations. In addition, we incorporate prior probabilities on the transformation parameters. This further regularizes the solution, mitigating a rare but serious tendency of the existing solutions to diverge. For the special case of corresponding point sets, we demonstrate that the affine-invariant measure we introduce may be obtained in closed form.
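To make the corresponding-point-set case concrete, the following is a minimal sketch of a closed-form affine alignment cost, not the measure derived in the paper. It assumes 2D points in known correspondence, a one-sided least-squares residual, and an isotropic quadratic penalty pulling the transformation towards the identity as a stand-in for a prior. The names `solve_linear`, `affine_residual` and `prior_weight` are hypothetical.

```python
def solve_linear(a, b):
    """Solve a @ x = b (a is n x n, b is n x k) by Gauss-Jordan
    elimination with partial pivoting. Pure Python, no dependencies."""
    n = len(a)
    m = [list(a[i]) + list(b[i]) for i in range(n)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: points may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [[m[i][n + j] / m[i][i] for j in range(len(b[0]))]
            for i in range(n)]


def affine_residual(src, dst, prior_weight=0.0):
    """Return (cost, params) for the affine map that best takes src to dst.

    params is a 3x2 matrix P with [x, y, 1] @ P approximating the matching
    dst point. cost is the squared residual plus
    prior_weight * ||P - P_identity||^2 (an assumed, illustrative prior).
    """
    identity = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
    xs = [[x, y, 1.0] for x, y in src]  # homogeneous source points
    # Normal equations: (X^T X + w I) P = X^T Y + w P_identity
    lhs = [[sum(r[i] * r[j] for r in xs) + (prior_weight if i == j else 0.0)
            for j in range(3)] for i in range(3)]
    rhs = [[sum(r[i] * d[j] for r, d in zip(xs, dst))
            + prior_weight * identity[i][j]
            for j in range(2)] for i in range(3)]
    p = solve_linear(lhs, rhs)
    data = sum((sum(r[k] * p[k][j] for k in range(3)) - d[j]) ** 2
               for r, d in zip(xs, dst) for j in range(2))
    prior = prior_weight * sum((p[i][j] - identity[i][j]) ** 2
                               for i in range(3) for j in range(2))
    return data + prior, p


# Usage: a unit square and an affinely transformed copy of it.
square = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
warped = [(2.0 * x + 0.5 * y + 3.0, -0.3 * x + 1.5 * y - 1.0)
          for x, y in square]
cost_free, params = affine_residual(square, warped)
cost_prior, _ = affine_residual(square, warped, prior_weight=1.0)
```

With `prior_weight=0.0` the cost ignores any affine change between the two sets, so an exactly transformed copy scores essentially zero. A positive weight makes large transformations cost more, which is the regularizing role the abstract attributes to priors on the transformation parameters. Unlike a true metric, this one-sided residual is not symmetric in its two arguments.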

As an application of these ideas, we demonstrate that the faces of the principal cast of a feature film can be generated automatically using clustering with appropriate invariance. This is a very demanding test, as it involves detecting and clustering over tens of thousands of images, with variations including changes in viewpoint, lighting, scale and expression.


Keywords: Computer Vision, Cluster Algorithm, Distance Function, Trust Region, Distance Matrix



Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Andrew Fitzgibbon (1)
  • Andrew Zisserman (1)
  1. Visual Geometry Group, Department of Engineering Science, The University of Oxford, UK
