Compositional Object Recognition, Segmentation, and Tracking in Video

Ommer, Björn; Buhmann, Joachim M.

doi:10.1007/978-3-540-74198-5_25

Björn Ommer¹ &
Joachim M. Buhmann¹

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 4679)

Included in the following conference series:

International Workshop on Energy Minimization Methods in Computer Vision and Pattern Recognition

Abstract

The complexity of visual representations is substantially limited by the compositional nature of our visual world, which therefore renders learning structured object models feasible. During recognition, however, such structured models might be disadvantageous, especially under the high computational demands of video. This contribution presents a compositional approach to video analysis that demonstrates the value of compositionality for both the learning of structured object models and recognition in near real-time. We unite category-level, multi-class object recognition, segmentation, and tracking in the same probabilistic graphical model. A model selection strategy is pursued to facilitate recognition and tracking of multiple objects that appear simultaneously in a video. Object models are learned from videos with heavy clutter and camera motion, where only an overall category label is provided for each training video and no hand-segmentation or localization of objects is required. For evaluation purposes, a video categorization database is assembled, and experiments convincingly demonstrate the suitability of the approach.

Author information

Authors and Affiliations

Institute of Computational Science, ETH Zurich, 8092 Zurich, Switzerland
Björn Ommer & Joachim M. Buhmann

Editor information

Alan L. Yuille, Song-Chun Zhu, Daniel Cremers, Yongtian Wang

Cite this paper

Ommer, B., Buhmann, J.M. (2007). Compositional Object Recognition, Segmentation, and Tracking in Video. In: Yuille, A.L., Zhu, S.-C., Cremers, D., Wang, Y. (eds.) Energy Minimization Methods in Computer Vision and Pattern Recognition. EMMCVPR 2007. Lecture Notes in Computer Science, vol. 4679. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74198-5_25

DOI: https://doi.org/10.1007/978-3-540-74198-5_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74195-4
Online ISBN: 978-3-540-74198-5
eBook Packages: Computer Science; Computer Science (R0)
