This contribution presents a compositionality architecture for visual object categorization, i.e., learning and recognizing multiple visual object classes in unsegmented, cluttered real-world scenes. We propose a sparse image representation based on localized feature histograms of salient regions. Category-specific information is then aggregated by using relations from perceptual organization to form compositions of these descriptors. The underlying concept of aggregating image regions to condense semantic information calls for a statistical representation founded on graphical models. On the basis of this structure, objects and their constituent parts are localized.

To complement the learned dependencies between compositions and categories, a global shape model of all compositions that form an object is trained. During inference, belief propagation reconciles bottom-up, feature-driven categorization with top-down category models. The system achieves competitive recognition performance on the standard CalTech database.
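
As an illustration of how belief propagation can reconcile bottom-up evidence with a top-down category model, the following is a minimal sketch and not the authors' implementation. It assumes a toy star-shaped factor graph: one category variable with a prior, connected to a few composition variables that each carry a bottom-up evidence vector. The function name `star_sum_product`, the tables `prior`, `cond`, and `evidence`, and all numbers are hypothetical. Because the graph is a tree, the sum-product messages give exact marginals.

```python
from math import prod


def normalize(values):
    """Scale a list of non-negative numbers so that it sums to one."""
    total = sum(values)
    return [v / total for v in values]


def star_sum_product(prior, cond, evidence):
    """Sum-product on a star graph (hypothetical toy model).

    prior[c]       : top-down prior over categories
    cond[c][x]     : p(composition label x | category c)
    evidence[i][x] : bottom-up score of label x for composition i
    Returns the category belief and one belief per composition.
    """
    n_cat, n_lab = len(prior), len(cond[0])

    # Bottom-up messages: each composition tells the category node
    # how well each category explains its local evidence.
    up = [
        [sum(ev[x] * cond[c][x] for x in range(n_lab)) for c in range(n_cat)]
        for ev in evidence
    ]
    cat_belief = normalize(
        [prior[c] * prod(m[c] for m in up) for c in range(n_cat)]
    )

    # Top-down messages: the category node sends each composition the
    # prior combined with the messages of all *other* compositions.
    comp_beliefs = []
    for i, ev in enumerate(evidence):
        cavity = [
            prior[c] * prod(m[c] for j, m in enumerate(up) if j != i)
            for c in range(n_cat)
        ]
        down = [
            sum(cond[c][x] * cavity[c] for c in range(n_cat))
            for x in range(n_lab)
        ]
        comp_beliefs.append(normalize([ev[x] * down[x] for x in range(n_lab)]))
    return cat_belief, comp_beliefs


# Illustrative numbers only: two categories, two composition labels.
prior = [0.5, 0.5]
cond = [[0.9, 0.1], [0.2, 0.8]]
evidence = [[0.7, 0.3], [0.6, 0.4], [0.5, 0.5]]  # third one is ambiguous
cat_belief, comp_beliefs = star_sum_product(prior, cond, evidence)
print(cat_belief)
print(comp_beliefs[2])
```

In this toy setting, the third composition has uninformative local evidence, yet its belief shifts toward the label favored by the category that the other two compositions support. This is the kind of bottom-up and top-down interplay the abstract describes, shown here on a far simpler model than the one used in the paper.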


Keywords: Bayesian Network, Interest Point, Object Categorization, Perceptual Organization, Factor Graph





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Björn Ommer (1)
  • Joachim M. Buhmann (1)
  1. Institute of Computational Science, ETH Zurich, Zurich, Switzerland
