Learning the Topology of Object Views
A visual representation of an object must meet at least three basic requirements. First, it must allow identification of the object in the presence of slight but unpredictable changes in its visual appearance. Second, it must account for larger changes in appearance due to variations in the object’s fundamental degrees of freedom, such as changes in pose. Third, any object representation must be derivable from visual input alone, i.e., it must be learnable.
Here we construct such a representation by deriving transformations between the different views of a given object, so that the views can be parameterized in terms of the object’s physical degrees of freedom. Our method automatically derives the appearance representations of an object, together with their linear deformation models, from example images. These are subsequently used to provide linear charts of the entire appearance manifold of a three-dimensional object. In contrast to approaches aiming at mere dimensionality reduction, the local linear charts of the object’s appearance manifold are estimated on a strictly local basis, avoiding any reference to a metric embedding space for all views. In this way, a real understanding of the object’s appearance in terms of its physical degrees of freedom is learned from single views alone.
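To make the idea of a local linear chart concrete, the following is a minimal sketch and not the method of the paper. It assumes that each neighboring view has already been related to a reference view by a displacement vector of feature points (for example, read off a correspondence map), and it swaps in a plain SVD fit for the linear deformation model. The function names, the synthetic data, and the choice of two degrees of freedom are all hypothetical.

```python
import numpy as np

def fit_local_chart(displacements, n_dof):
    """Fit a linear deformation model to displacements of nearby views.

    displacements: array of shape (n_views, n_features); each row is the
    (hypothetical) displacement of feature points between a reference view
    and one neighboring view.
    Returns (mean, basis), where basis has shape (n_dof, n_features).
    """
    d = np.asarray(displacements, dtype=float)
    mean = d.mean(axis=0)
    # Leading right-singular vectors of the centered data span the chart.
    _, _, vt = np.linalg.svd(d - mean, full_matrices=False)
    return mean, vt[:n_dof]

def chart_coordinates(displacement, mean, basis):
    """Project one displacement vector onto the local chart."""
    return basis @ (np.asarray(displacement, dtype=float) - mean)

def reconstruct(coords, mean, basis):
    """Map chart coordinates back to a displacement vector."""
    return mean + coords @ basis

# Synthetic neighborhood: 30 views whose displacements vary linearly
# with two hidden parameters (an assumption made for illustration only).
rng = np.random.default_rng(0)
true_dirs = rng.normal(size=(2, 40))
params = rng.uniform(-1.0, 1.0, size=(30, 2))
displacements = params @ true_dirs

mean, basis = fit_local_chart(displacements, n_dof=2)

# A new nearby view is described by two chart coordinates only.
new_disp = np.array([0.3, -0.5]) @ true_dirs
coords = chart_coordinates(new_disp, mean, basis)
recovered = reconstruct(coords, mean, basis)
```

In this noise-free toy setting the two chart coordinates reproduce the new displacement up to numerical precision. The fit uses only views near the reference view, which mirrors the strictly local estimation described above, although the coordinates it yields are determined only up to an invertible linear map of the underlying parameters.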
Keywords: object recognition, pose estimation, view sphere, correspondence maps, learning