On Affine Invariant Clustering and Automatic Cast Listing in Movies
We develop a distance metric for clustering and classification algorithms which is invariant to affine transformations and includes priors on the transformation parameters. Such clustering requirements are generic to a number of problems in computer vision.
We extend existing techniques for affine-invariant clustering, and show that the new distance metric outperforms existing approximations to affine-invariant distance computation, particularly under large transformations. In addition, we incorporate prior probabilities on the transformation parameters. This further regularizes the solution, mitigating a rare but serious tendency of the existing solutions to diverge. For the special case of corresponding point sets we demonstrate that the affine-invariant measure we introduce may be obtained in closed form.
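To make the closed-form case concrete, the following is a minimal sketch and not the paper's actual formulation. It assumes a one-sided least-squares fit of an affine map from one corresponding point set to another, with a Gaussian-style prior that pulls the transformation toward the identity. The function name `affine_distance` and the prior weight `lam` are hypothetical names introduced here for illustration only.

```python
import numpy as np

def affine_distance(X, Y, lam=0.0):
    """Distance between corresponding point sets X and Y (each n x d),
    minimised in closed form over affine maps x -> A x + t.

    Assumption (not from the paper): the prior is a quadratic penalty
    lam * ||P - P0||^2 pulling the parameters toward the identity map.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n, d = X.shape
    # Homogeneous coordinates, so that Xh @ P applies A and t together.
    Xh = np.hstack([X, np.ones((n, 1))])
    # Parameters of the identity transformation: A = I, t = 0.
    P0 = np.vstack([np.eye(d), np.zeros((1, d))])
    # Regularised normal equations: (Xh^T Xh + lam I) P = Xh^T Y + lam P0.
    lhs = Xh.T @ Xh + lam * np.eye(d + 1)
    rhs = Xh.T @ Y + lam * P0
    P = np.linalg.solve(lhs, rhs)
    residual = np.sum((Xh @ P - Y) ** 2)
    penalty = lam * np.sum((P - P0) ** 2)
    return residual + penalty, P

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 2))
A = np.array([[1.2, 0.3], [-0.1, 0.9]])
t = np.array([0.5, -1.0])
Y = X @ A.T + t  # Y is an exact affine image of X

d_free, P = affine_distance(X, Y)           # no prior: distance is ~0
d_prior, _ = affine_distance(X, Y, lam=1.0)  # prior penalises the large map
```

With `lam=0` the distance is fully invariant to the affine map, so it needs enough non-degenerate points for the normal equations to be solvable. With `lam>0` the system is always well conditioned, and transformations far from the identity incur a cost, which is one simple way a prior can regularize the solution.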
As an application of these ideas we demonstrate that the faces of the principal cast of a feature film can be generated automatically using clustering with appropriate invariance. This is a very demanding test as it involves detecting and clustering over tens of thousands of images, with variations including changes in viewpoint, lighting, scale and expression.
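The abstract does not state which clustering algorithm is used, so the sketch below is only an illustration of how clustering can be driven by a precomputed matrix of pairwise distances, such as distances from an invariant measure. It assumes a simple alternating k-medoids scheme. The function `k_medoids` and its parameters are hypothetical, and the toy one-dimensional data stand in for real face descriptors.

```python
import numpy as np

def k_medoids(D, k, n_iter=100, seed=0):
    """Alternating k-medoids on a precomputed n x n distance matrix D.

    Assumption: this is a generic illustration, not the paper's method.
    Returns the medoid indices and a cluster label for every item.
    """
    D = np.asarray(D, dtype=float)
    rng = np.random.default_rng(seed)
    medoids = rng.choice(len(D), size=k, replace=False)
    for _ in range(n_iter):
        # Assign every item to its nearest medoid.
        labels = np.argmin(D[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if members.size == 0:
                continue  # keep the old medoid for an empty cluster
            # The new medoid minimises total distance to its cluster members.
            within = D[np.ix_(members, members)].sum(axis=0)
            new_medoids[j] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = np.argmin(D[:, medoids], axis=1)
    return medoids, labels

# Toy data: two well-separated groups on a line.
points = np.array([0.0, 1.0, 2.0, 100.0, 101.0, 102.0])
D = np.abs(points[:, None] - points[None, :])
medoids, labels = k_medoids(D, k=2)
```

Because the algorithm only reads `D`, any pairwise distance can be substituted without changing the clustering code. A medoid is always an actual data item, so in a face-clustering setting each cluster would be represented by a real detected face rather than an average.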
Keywords: Computer Vision, Cluster Algorithm, Distance Function, Trust Region, Distance Matrix