
Human Action Recognition Using Distribution of Oriented Rectangular Patches

  • Nazlı İkizler
  • Pınar Duygulu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4814)

Abstract

We describe a “bag-of-rectangles” method for representing and recognizing human actions in videos. In this method, each human pose in an action sequence is represented by oriented rectangular patches extracted over the whole body. Spatial oriented histograms are then formed to represent the distribution of these rectangular patches. To carry the information captured by the bag-of-rectangles descriptor from the spatial domain to the temporal domain for action recognition, four methods are proposed: (i) frame-by-frame voting, which recognizes actions by matching the descriptors of each frame; (ii) global histogramming, which extends the Motion Energy Image idea of Bobick and Davis to rectangular patches; (iii) a classifier-based approach using SVMs; and (iv) an adaptation of Dynamic Time Warping to the temporal representation of the descriptor. Detailed experiments are carried out on the action dataset of Blank et al. High success rates (100%) show that, with this very simple and compact representation, we can achieve robust recognition of human actions, compared to complex representations.
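The descriptor and the frame-by-frame voting scheme (method i) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes rectangular patches have already been extracted as (center, orientation) tuples, and uses an illustrative 3×3 spatial grid with 12 orientation bins and plain Euclidean nearest-neighbor matching.

```python
import numpy as np

def rectangle_histogram(rects, bbox, grid=(3, 3), n_bins=12):
    """Bag-of-rectangles descriptor for a single frame.

    rects: list of (cx, cy, angle) tuples -- the center and orientation
           (degrees in [0, 180)) of each rectangular patch on the body.
    bbox:  (x0, y0, x1, y1) bounding box of the person in the frame.
    Returns a flattened spatial grid of orientation histograms.
    """
    x0, y0, x1, y1 = bbox
    hist = np.zeros(grid + (n_bins,))
    for cx, cy, angle in rects:
        # locate the patch within the spatial grid over the body
        gx = min(int((cx - x0) / (x1 - x0) * grid[0]), grid[0] - 1)
        gy = min(int((cy - y0) / (y1 - y0) * grid[1]), grid[1] - 1)
        # quantize the patch orientation into a histogram bin
        b = min(int(angle / 180.0 * n_bins), n_bins - 1)
        hist[gy, gx, b] += 1
    v = hist.ravel()
    n = v.sum()
    return v / n if n > 0 else v  # normalize so frames are comparable

def vote_action(frame_descriptors, labelled_bank):
    """Frame-by-frame voting: each frame votes for the action label of
    its nearest descriptor in a labelled training bank of (label, vec)."""
    votes = {}
    for d in frame_descriptors:
        label = min(labelled_bank,
                    key=lambda lv: np.linalg.norm(d - lv[1]))[0]
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)  # majority label wins
```

With a 3×3 grid and 12 bins each frame becomes a 108-dimensional vector; the sequence-level methods (global histogramming, SVMs, Dynamic Time Warping) all operate on these per-frame vectors or on their accumulation over the sequence.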

Keywords

Human Action Recognition Gesture Recognition Dynamic Time Warping 


References

  1. Blank, M., Gorelick, L., Shechtman, E., Irani, M., Basri, R.: Actions as space-time shapes. In: ICCV, pp. 1395–1402 (2005)
  2. Bobick, A., Davis, J.: The recognition of human movement using temporal templates. IEEE T. Pattern Analysis and Machine Intelligence 23(3), 257–267 (2001)
  3. Brand, M., Oliver, N., Pentland, A.: Coupled hidden Markov models for complex action recognition. In: IEEE Conf. on Computer Vision and Pattern Recognition, pp. 994–999 (1997)
  4. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: IEEE Conf. on Computer Vision and Pattern Recognition, vol. I, pp. 886–893 (2005)
  5. Efros, A.A., Berg, A.C., Mori, G., Malik, J.: Recognizing action at a distance. In: ICCV, pp. 726–733 (2003)
  6. Fei-Fei, L., Perona, P.: A Bayesian hierarchical model for learning natural scene categories. In: IEEE Conf. on Computer Vision and Pattern Recognition (2005)
  7. Forsyth, D., Fleck, M.: Body plans. In: IEEE Conf. on Computer Vision and Pattern Recognition, pp. 678–683 (1997)
  8. Forsyth, D., Arikan, O., Ikemoto, L., O’Brien, J., Ramanan, D.: Computational studies of human motion I: Tracking and animation. Foundations and Trends in Computer Graphics and Vision 1(2/3) (2006)
  9. Freeman, W., Roth, M.: Orientation histograms for hand gesture recognition. In: International Workshop on Automatic Face and Gesture Recognition (1995)
  10. Hong, P., Turk, M., Huang, T.: Gesture modeling and recognition using finite state machines. In: Int. Conf. Automatic Face and Gesture Recognition, pp. 410–415 (2000)
  11. Hongeng, S., Nevatia, R., Bremond, F.: Video-based event recognition: activity representation and probabilistic recognition methods. Computer Vision and Image Understanding 96(2), 129–162 (2004)
  12. Hu, W., Tan, T., Wang, L., Maybank, S.: A survey on visual surveillance of object motion and behaviors. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews 34(3) (2004)
  13. Ikizler, N., Forsyth, D.: Searching video for complex activities with finite state models. In: IEEE Conf. on Computer Vision and Pattern Recognition (2007)
  14. Leung, T., Malik, J.: Representing and recognizing the visual appearance of materials using three-dimensional textons. Int. J. Computer Vision 43(1), 29–44 (2001)
  15. Ling, H., Okada, K.: Diffusion distance for histogram comparison. In: IEEE Conf. on Computer Vision and Pattern Recognition, vol. 1, pp. 246–253 (2006)
  16. Monay, F., Gatica-Perez, D.: Modeling semantic aspects for cross-media image retrieval. IEEE T. Pattern Analysis and Machine Intelligence (accepted for publication)
  17. Niebles, J.C., Fei-Fei, L.: A hierarchical model of shape and appearance for human action classification. In: IEEE Conf. on Computer Vision and Pattern Recognition (2007)
  18. Oliver, N., Garg, A., Horvitz, E.: Layered representations for learning and inferring office activity from multiple sensory channels. Computer Vision and Image Understanding 96(2), 163–180 (2004)
  19. Pinhanez, C., Bobick, A.: PNF propagation and the detection of actions described by temporal intervals. In: DARPA IU Workshop, pp. 227–234 (1997)
  20. Pinhanez, C., Bobick, A.: Human action detection using PNF propagation of temporal constraints. In: IEEE Conf. on Computer Vision and Pattern Recognition, pp. 898–904 (1998)
  21. Polana, R., Nelson, R.: Detecting activities. In: IEEE Conf. on Computer Vision and Pattern Recognition, pp. 2–7 (1993)
  22. Ramanan, D., Forsyth, D., Zisserman, A.: Strike a pose: Tracking people by finding stylized poses. In: IEEE Conf. on Computer Vision and Pattern Recognition, vol. I, pp. 271–278 (2005)
  23. Rubner, Y., Tomasi, C., Guibas, L.J.: The earth mover’s distance as a metric for image retrieval. Int. J. Computer Vision 40(2), 99–121 (2000)
  24. Siskind, J.M.: Reconstructing force-dynamic models from video sequences. Artificial Intelligence 151, 91–154 (2003)
  25. Sivic, J., Russell, B., Efros, A., Zisserman, A., Freeman, W.: Discovering object categories in image collections. In: Int. Conf. on Computer Vision (2005)
  26. Sminchisescu, C., Kanaujia, A., Li, Z., Metaxas, D.: Conditional models for contextual human motion recognition. In: Int. Conf. on Computer Vision, pp. 1808–1815 (2005)
  27. Wilson, A., Bobick, A.: Parametric hidden Markov models for gesture recognition. IEEE T. Pattern Analysis and Machine Intelligence 21(9), 884–900 (1999)
  28. Jiang, Y.-G., Ngo, C.-W., Yang, J.: Towards optimal bag-of-features for object categorization and semantic video retrieval. In: Int. Conf. Image and Video Retrieval (2007)

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Nazlı İkizler¹
  • Pınar Duygulu¹
  1. Dept. of Computer Engineering, Bilkent University, Ankara, Turkey
