Abstract
This paper introduces a novel probabilistic model for representing objects whose appearance changes with pose. These changes come both from small deformations of the object's sub-parts and from the relative spatial transformations between those sub-parts. We call the model a probabilistic montage. The model is based on the idea that an image can be represented as a montage of many small patches, each transformed and cropped from one of a collection of latent images. The approach resembles the one a police artist might use to build an image of a criminal suspect's face as a montage of face parts cut out of a "library" of face parts. Unlike the police artist, however, we learn the library of small latent images from a set of examples of objects that are changing in shape. In our approach, the image is first divided into a grid of sub-images. Each sub-image in the grid acts as a window that crops a piece out of one of a collection of slightly larger latent images available for that location in the image. We illustrate several probability models that can encode the relationships among the latent images and cropping transformations of the different patches. In this paper we present the complete algorithm for a tree-structured model. We show that the approach finds representations of the appearance of full-body images of people in motion. We also show how our approach can learn representations of objects in an "unsupervised" manner, and we present results using the model for recognition and tracking in a "supervised" manner.
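The montage idea can be made concrete with a small sketch. It is not the authors' algorithm: it assumes a hypothetical data layout in which `latents[r][c]` holds the K slightly larger latent images available to grid cell (r, c). It also assumes square p x p patches, squared-error matching of each patch to its crop, and a made-up smoothness weight `lam` that penalizes differences in crop offsets between neighbouring cells. The latent images are supplied by hand rather than learned, and inference is shown only for a single row of cells. A row is a chain, the simplest tree, so exact MAP choices follow from min-sum (Viterbi) dynamic programming. All function names here are illustrative.

```python
# Minimal sketch of a "probabilistic montage" with a chain (simplest tree) prior.
# Hypothetical layout: latents[r][c] is the list of K latent images (lists of rows)
# available to grid cell (r, c); each is larger than the p x p patch it supplies.


def crop(img, dy, dx, p):
    """Cut the p x p window whose top-left corner is (dy, dx) out of img."""
    return [row[dx:dx + p] for row in img[dy:dy + p]]


def assemble_montage(latents, choices, p):
    """Tile the chosen crops into one image; choices[r][c] = (k, dy, dx)."""
    rows, cols = len(latents), len(latents[0])
    out = [[0.0] * (cols * p) for _ in range(rows * p)]
    for r in range(rows):
        for c in range(cols):
            k, dy, dx = choices[r][c]
            patch = crop(latents[r][c][k], dy, dx, p)
            for i in range(p):
                for j in range(p):
                    out[r * p + i][c * p + j] = patch[i][j]
    return out


def cell_states(cell_latents, p):
    """Every (latent index, crop offset) choice available to one grid cell."""
    return [
        (k, dy, dx)
        for k, img in enumerate(cell_latents)
        for dy in range(len(img) - p + 1)
        for dx in range(len(img[0]) - p + 1)
    ]


def map_chain(observed, latents_row, p, lam):
    """Min-sum (Viterbi) MAP choice of (k, dy, dx) for one row of cells.

    Unary cost: squared error between the observed patch and the candidate crop.
    Pairwise cost (hypothetical): lam times the L1 distance between the crop
    offsets of horizontally adjacent cells.
    """
    states = [cell_states(cell, p) for cell in latents_row]

    def unary(c, s):
        k, dy, dx = s
        cand = crop(latents_row[c][k], dy, dx, p)
        obs = [row[c * p:(c + 1) * p] for row in observed]
        return sum((o - v) ** 2 for orow, crow in zip(obs, cand) for o, v in zip(orow, crow))

    def pair(s, t):
        return lam * (abs(s[1] - t[1]) + abs(s[2] - t[2]))

    cost = [unary(0, s) for s in states[0]]
    back = []  # back[c - 1][i]: best state index at cell c - 1 given state i at cell c
    for c in range(1, len(latents_row)):
        prev = states[c - 1]
        new_cost, ptr = [], []
        for s in states[c]:
            best = min(range(len(cost)), key=lambda t: cost[t] + pair(prev[t], s))
            new_cost.append(cost[best] + pair(prev[best], s) + unary(c, s))
            ptr.append(best)
        cost = new_cost
        back.append(ptr)
    idx = min(range(len(cost)), key=cost.__getitem__)
    total, path = cost[idx], [idx]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return [states[c][i] for c, i in enumerate(path)], total


def toy_latent(c, k, size=4):
    """Toy latent image with distinct pixel values so every crop is unique."""
    return [[100 * c + 20 * k + size * i + j for j in range(size)] for i in range(size)]


# One row of two cells, K = 2 latent 4x4 images per cell, 2x2 patches (margin 1).
latents = [[[toy_latent(c, k) for k in range(2)] for c in range(2)]]
truth = [[(1, 1, 2), (0, 1, 2)]]
observed = assemble_montage(latents, truth, p=2)
choices, cost = map_chain(observed, latents[0], p=2, lam=1.0)
print(choices, cost)  # [(1, 1, 2), (0, 1, 2)] 0.0
```

Each cell's choice interacts only with its left neighbour, so the recursion is exact. Its cost is linear in the number of cells and quadratic in the number of (k, dy, dx) states per cell. On a general tree the same min-sum messages would be passed from the leaves toward a root.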
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Pal, C., Frey, B.J., Jojic, N. (2002). Learning Montages of Transformed Latent Images as Representations of Objects That Change in Appearance. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds) Computer Vision — ECCV 2002. ECCV 2002. Lecture Notes in Computer Science, vol 2353. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-47979-1_48
DOI: https://doi.org/10.1007/3-540-47979-1_48
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43748-2
Online ISBN: 978-3-540-47979-6
eBook Packages: Springer Book Archive