Abstract
This paper introduces an information projection framework that provides a numerical criterion for evaluating the information contribution of competing image representations. Such representations include structure vs. appearance and 3D vs. 2D. The framework allows a heterogeneous model of mixed representations and sequentially selects representation elements according to their information gains. Optimal representations for a given set of images can be learned automatically in this manner. Experiments on these two competing representation pairs show that the optimal representation is data dependent and forms a spectrum across multiple images. This shows the necessity of having numerical solutions to these problems.
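The sequential selection step mentioned in the abstract can be illustrated with a minimal sketch. It is not the chapter's implementation. It assumes that each candidate element is summarized by a histogram of its responses on the training images, that the same histogram is available under a reference (background) model, and that the information gain of an element is the KL divergence between the two. It also assumes elements are independent, so gains do not change after each selection. All names (`kl_divergence`, `select_elements`, `min_gain`) and the toy histograms are hypothetical.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions over the same bins."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def select_elements(candidates, reference, min_gain=0.01):
    """Greedily pick elements in decreasing order of information gain.

    candidates: dict mapping element name -> response histogram on the data
    reference:  dict mapping element name -> histogram under the reference model
    min_gain:   assumed stopping threshold; selection ends below this gain
    """
    # Assumption: gains are computed once because elements are independent.
    remaining = {
        name: kl_divergence(hist, reference[name])
        for name, hist in candidates.items()
    }
    selected = []
    while remaining:
        best = max(remaining, key=remaining.get)
        if remaining[best] < min_gain:
            break
        selected.append((best, remaining.pop(best)))
    return selected


# Toy, made-up histograms: one structure-like and two appearance-like elements.
candidates = {
    "sketch_edge": [0.1, 0.9],
    "texture_patch": [0.4, 0.6],
    "flat_region": [0.5, 0.5],
}
reference = {name: [0.5, 0.5] for name in candidates}

selected = select_elements(candidates, reference)
for name, gain in selected:
    print(f"{name}: gain = {gain:.3f} nats")
```

In this toy setting the element whose response distribution departs most from the reference model is chosen first, and an element that matches the reference contributes no gain and is never selected. Which type of element wins depends entirely on the histograms supplied, which mirrors the abstract's point that the optimal representation is data dependent.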
Acknowledgements
This work is supported by DARPA grant FA 8650-11-1-7149, NSF IIS1018751 and MURI grant ONR N00014-10-1-0933.
Copyright information
© 2013 Springer-Verlag London
About this chapter
Cite this chapter
Hu, W., Si, Z., Zhu, SC. (2013). Structure vs. Appearance and 3D vs. 2D? A Numeric Answer. In: Dickinson, S., Pizlo, Z. (eds) Shape Perception in Human and Computer Vision. Advances in Computer Vision and Pattern Recognition. Springer, London. https://doi.org/10.1007/978-1-4471-5195-1_17
DOI: https://doi.org/10.1007/978-1-4471-5195-1_17
Publisher Name: Springer, London
Print ISBN: 978-1-4471-5194-4
Online ISBN: 978-1-4471-5195-1
eBook Packages: Computer Science, Computer Science (R0)