Compositional Object Recognition, Segmentation, and Tracking in Video

  • Conference paper
Energy Minimization Methods in Computer Vision and Pattern Recognition (EMMCVPR 2007)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 4679)

Abstract

The complexity of visual representations is substantially limited by the compositional nature of our visual world, which in turn renders learning structured object models feasible. During recognition, however, such structured models may be disadvantageous, especially given the high computational demands of video. This contribution presents a compositional approach to video analysis that demonstrates the value of compositionality both for learning structured object models and for recognition in near real-time. We unite category-level, multi-class object recognition, segmentation, and tracking in the same probabilistic graphical model. A model selection strategy is pursued to facilitate recognition and tracking of multiple objects that appear simultaneously in a video. Object models are learned from videos with heavy clutter and camera motion, where only an overall category label is provided for each training video; no hand segmentation or localization of objects is required. For evaluation purposes, a video categorization database is assembled, and experiments convincingly demonstrate the suitability of the approach.

Authors

B. Ommer, J.M. Buhmann

Editor information

Alan L. Yuille, Song-Chun Zhu, Daniel Cremers, Yongtian Wang

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ommer, B., Buhmann, J.M. (2007). Compositional Object Recognition, Segmentation, and Tracking in Video. In: Yuille, A.L., Zhu, SC., Cremers, D., Wang, Y. (eds) Energy Minimization Methods in Computer Vision and Pattern Recognition. EMMCVPR 2007. Lecture Notes in Computer Science, vol 4679. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74198-5_25

  • DOI: https://doi.org/10.1007/978-3-540-74198-5_25

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-74195-4

  • Online ISBN: 978-3-540-74198-5

  • eBook Packages: Computer Science, Computer Science (R0)
