Key Object Driven Multi-category Object Recognition, Localization and Tracking Using Spatio-temporal Context

  • Yuan Li
  • Ram Nevatia
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5305)


In this paper we address the problem of recognizing, localizing and tracking multiple objects of different categories in meeting room videos. Difficulties such as lack of detail and multi-object co-occurrence make it hard to directly apply traditional object recognition methods. Under such circumstances, we show that incorporating object-level spatio-temporal relationships can lead to significant improvements in inference of object category and state. Contextual relationships are modeled by a dynamic Markov random field, in which recognition, localization and tracking are done simultaneously. Further, we define human as the key object of the scene, which can be detected relatively robustly and therefore is used to guide the inference of other objects. Experiments are done on the CHIL meeting video corpus. Performance is evaluated in terms of object detection and false alarm rates, object recognition confusion matrix and pixel-level accuracy of object segmentation.


Object Recognition False Alarm Rate Interest Point Object Category Object Segmentation 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


  1. 1.
    Fergus, R., Perona, P., Zisserman, A.: Object class recognition by unsupervised scale-invariant learning. In: CVPR (2003)Google Scholar
  2. 2.
    Cao, L., Fei-Fei, L.: Spatially coherent latent topic model for concurrent object segmentation and classification. In: ICCV (2007)Google Scholar
  3. 3.
    Viola, P., Jones, M.: Rapid object detection using a boosted cascade of simple features. In: CVPR (2001)Google Scholar
  4. 4.
    Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: CVPR (2005)Google Scholar
  5. 5.
    Wu, B., Nevatia, R.: Cluster boosted tree classifier for multi-view, multi-pose object detection. In: ICCV (2007)Google Scholar
  6. 6.
    Sudderth, E.B., Torralba, A., Freeman, W.T., Willsky, A.S.: Learning hierarchical models of scenes, objects, and parts. In: ICCV (2005)Google Scholar
  7. 7.
    Hoiem, D., Efros, A.A., Hebert, M.: Putting objects in perspective. In: CVPR (2006)Google Scholar
  8. 8.
    Torralba, A., Murphy, K., Freeman, W., Rubin, M.: Context-based vision system for place and object recognition. In: ICCV (2003)Google Scholar
  9. 9.
    Li, L.-J., Fei-Fei, L.: What, where and who? classifying events by scene and object recognition. In: ICCV (2007)Google Scholar
  10. 10.
    Shotton, J., Winn, J., Rother, C., Criminisi, A.: Textonboost: Joint appearance, shape and context modeling for multi-class object recognition and segmentation. In: ECCV (2006)Google Scholar
  11. 11.
    Rabinovich, A., Vedaldi, A., Galleguillos, C., Wiewiora, E., Belongie, S.: Objects in context. In: ICCV (2007)Google Scholar
  12. 12.
    Carbonetto, P., de Freitas, N., Barnard, K.: A statistical model for general contextual object recognition. In: Pajdla, T., Matas, J(G.) (eds.) ECCV 2004. LNCS, vol. 3021, pp. 350–362. Springer, Heidelberg (2004)Google Scholar
  13. 13.
    Moore, D.J., Essa, I.A., Heyes, M.H.: Exploiting human actions and object context for recognition tasks. In: ICCV (1999)Google Scholar
  14. 14.
    Peursum, P., West, G., Venkatesh, S.: Combining image regions and human activity for indirect object recognition in indoor wide-angle views. In: ICCV (2005)Google Scholar
  15. 15.
    Gupta, A., Davis, L.S.: Objects in action: an approach for combining action understanding and object perception. In: CVPR (2007)Google Scholar
  16. 16.
    Yu, T., Wu, Y.: Collaborative tracking of multiple targets. In: CVPR (2004)Google Scholar
  17. 17.
    Wu, B., Nevatia, R.: Tracking of multiple humans in meetings. In: V4HCI (2006)Google Scholar
  18. 18.
    Friedman, J., Hastie, T., Tibshirani, R.: Additive logistic regression: a statistical view of boosting. Annals of Statistics 28(2), 337–407 (2000)zbMATHCrossRefMathSciNetGoogle Scholar
  19. 19.
    Comaniciu, D., Meer, P.: Mean shift: A robust approach toward feature space analysis. IEEE Transaction on Pattern Analysis and Machine Intelligence 24(5), 603–619 (2002)CrossRefGoogle Scholar
  20. 20.
    Sutton, C., McCallum, A.: Piecewise training for undirected models. In: Conference on Uncertainty in Artificial Intelligence (2005)Google Scholar
  21. 21.
    Pearl, J.: Probabilistic Reasoning in Intelligent Systems. Morgan Kaufman, San Mateo (1988)Google Scholar
  22. 22.
    Sudderth, E.B., Ihler, A.T., Freeman, W.T., Willsky, A.S.: Nonparametric belief propagation. In: CVPR (2003)Google Scholar
  23. 23.
    CHIL: The chil project,

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Yuan Li
    • 1
  • Ram Nevatia
    • 1
  1. 1.Institute for Robotics and Intelligent SystemsUniversity of Southern CaliforniaLos AngelesUSA

Personalised recommendations