Recognizing Activities with Multiple Cues

  • Rahul Biswas
  • Sebastian Thrun
  • Kikuo Fujimura
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4814)


In this paper, we introduce a first-order probabilistic model that combines multiple cues to classify human activities from video data accurately and robustly. Our system works in a realistic office setting with background clutter, natural illumination, different people, and partial occlusion. The model is compact: it requires only fifteen sentences of first-order logic, grouped as a Dynamic Markov Logic Network (DMLN), and it leverages existing state-of-the-art work in pose detection and object recognition.
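To make the idea of combining cues through weighted logical sentences concrete, here is a minimal sketch of a Markov-logic-style classifier for a single time step. It is not the authors' model: the activity names, cue values, rules, and weights are all hypothetical, and the paper's fifteen sentences are not reproduced here. The sketch only illustrates the general log-linear form, in which each candidate activity is scored by the summed weights of the formulas it satisfies and the scores are normalized into a posterior.

```python
import math

# Hypothetical activities, cues, and weights chosen only for illustration;
# none of them come from the paper.
ACTIVITIES = ["typing", "drinking", "reading"]

# Each rule is (weight, formula). A formula is a function of the candidate
# activity and the observed evidence that returns True when it is satisfied.
RULES = [
    (1.5, lambda a, e: a == "typing" and e["object"] == "keyboard"),
    (1.5, lambda a, e: a == "drinking" and e["object"] == "cup"),
    (1.5, lambda a, e: a == "reading" and e["object"] == "book"),
    (1.0, lambda a, e: a == "drinking" and e["pose"] == "hand_near_face"),
    (1.0, lambda a, e: a == "typing" and e["pose"] == "hands_forward"),
    # Temporal persistence: the activity tends to stay the same between frames.
    (0.8, lambda a, e: a == e["previous_activity"]),
]


def activity_posterior(evidence):
    """Return P(activity | evidence) under a log-linear, Markov-logic-style model."""
    # Score each activity by the total weight of the formulas it satisfies.
    scores = {
        a: sum(w for w, formula in RULES if formula(a, evidence))
        for a in ACTIVITIES
    }
    # Normalize exp(score) over all candidate activities.
    z = sum(math.exp(s) for s in scores.values())
    return {a: math.exp(s) / z for a, s in scores.items()}


evidence = {
    "object": "cup",
    "pose": "hand_near_face",
    "previous_activity": "typing",
}
posterior = activity_posterior(evidence)
best = max(posterior, key=posterior.get)
print(best, posterior)
```

In this toy setting the object cue and the pose cue both support "drinking", and together they outweigh the persistence rule that favors the previous activity. A real system would learn the weights from data and run inference over a whole sequence rather than one frame.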


Object Recognition, Activity Recognition, Information Gain, Confusion Matrix, Markov Random Field
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Rahul Biswas (1)
  • Sebastian Thrun (1)
  • Kikuo Fujimura (2)

  1. Stanford University
  2. Honda Research Institute
