Learning Hierarchical Representations of Object Categories for Robot Vision

Leonardis, Aleš; Fidler, Sanja

doi:10.1007/978-3-642-14743-2_9

Part of the book series: Springer Tracts in Advanced Robotics (STAR, volume 66)

Abstract

This paper presents our recently developed approach to constructing a hierarchical representation of visual input that aims to enable recognition and detection of a large number of object categories. Inspired by the principles of efficient indexing, robust matching, and compositionality, our approach learns a hierarchy of spatially flexible compositions, i.e., parts, in an unsupervised, statistics-driven manner. Starting with simple, frequent features, we learn the statistically most significant compositions (parts composed of parts), which in turn define the next layer. Parts are learned sequentially, layer after layer, optimally adjusting to the visual data. Lower layers are learned in a category-independent way to obtain complex yet sharable visual building blocks, which is a crucial step towards a scalable representation. Higher layers of the hierarchy, on the other hand, are constructed using specific categories, yielding a category representation with a small number of highly generalizable parts that gain their structural flexibility through composition within the hierarchy. Because the hierarchy is built in this way, new categories can be added to the system efficiently and continuously by adding only a small number of parts in the higher layers. The approach is demonstrated on a large collection of images and a variety of object categories.
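
The layer-by-layer idea described in the abstract can be illustrated with a small toy sketch. This is not the authors' algorithm: it assumes that part detections are given as `(part_id, x, y)` tuples on an integer grid, and it substitutes plain co-occurrence counting for the statistical significance criterion mentioned above. The function name `learn_next_layer` and the parameters `radius` and `min_count` are hypothetical, introduced only for illustration.

```python
from collections import Counter
from itertools import combinations


def learn_next_layer(images, radius=2, min_count=2):
    """Toy composition learning (illustrative assumption, not the paper's method).

    images: list of images, each a list of (part_id, x, y) detections
            from the current layer.
    Returns candidate compositions (part_a, part_b, dx, dy), most frequent
    first, that occur at least `min_count` times across all images.
    """
    counts = Counter()
    for detections in images:
        # Sorting gives each unordered pair one canonical orientation,
        # so a pair is never counted in both directions.
        for (a, xa, ya), (b, xb, yb) in combinations(sorted(detections), 2):
            dx, dy = xb - xa, yb - ya
            # Only spatially close parts are allowed to form a composition.
            if abs(dx) <= radius and abs(dy) <= radius:
                counts[(a, b, dx, dy)] += 1
    # Frequent compositions become the parts of the next layer.
    return [comp for comp, n in counts.most_common() if n >= min_count]


# Hypothetical layer-1 detections: part 0 and part 1 could be two edge types.
example_images = [
    [(0, 0, 0), (1, 1, 0)],
    [(0, 5, 5), (1, 6, 5)],
    [(0, 2, 2), (1, 2, 3)],
]
next_layer_parts = learn_next_layer(example_images)
```

In this sketch, feeding detections of the newly formed parts back into the same function would produce a further layer, which loosely mirrors the sequential, layer-after-layer learning described in the abstract.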

Author information

Authors and Affiliations

Faculty of Computer and Information Science, University of Ljubljana, Slovenia
Aleš Leonardis & Sanja Fidler

Editor information

Editors and Affiliations

Department of Mechanical Engineering, Osaka University, 2-1 Yamadaoka, 565-0871, Suita, Osaka, Japan
Makoto Kaneko
Department of Mechano-Informatics, Faculty of Engineering, Graduate School of Information Science and Technology, University of Tokyo, 7-3-1 Hongo, 113-8656, Bunkyo-ku, Tokyo, Japan
Yoshihiko Nakamura

About this paper

Cite this paper

Leonardis, A., Fidler, S. (2010). Learning Hierarchical Representations of Object Categories for Robot Vision. In: Kaneko, M., Nakamura, Y. (eds) Robotics Research. Springer Tracts in Advanced Robotics, vol 66. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14743-2_9

DOI: https://doi.org/10.1007/978-3-642-14743-2_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14742-5
Online ISBN: 978-3-642-14743-2
eBook Packages: Engineering, Engineering (R0)
