Human Action Recognition from Inter-temporal Dictionaries of Key-Sequences

  • Analí Alfaro
  • Domingo Mery
  • Alvaro Soto
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8333)

Abstract

This paper addresses human action recognition in video by proposing a method based on three main processing steps. First, we tackle problems related to intra-class variations and differences in video lengths. We achieve this by reducing an input video to a set of key-sequences that represent atomic meaningful acts of each action class. Second, we use sparse coding techniques to learn a representation for each key-sequence. We then join these representations while preserving information about temporal relationships. We believe that this is a key step of our approach because it not only provides a suitable shared representation to characterize atomic acts, but also encodes global temporal consistency among these acts. Accordingly, we call this representation the inter-temporal acts descriptor. Third, we use this representation and sparse coding techniques to classify new videos. Finally, we show that our approach outperforms several state-of-the-art methods when tested on common benchmarks.
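The classification step described above relies on sparse coding: a video descriptor is encoded against per-class dictionaries and assigned to the class whose dictionary reconstructs it best. As a minimal, hedged illustration of this general idea (not the paper's exact pipeline), the sketch below uses a simple Orthogonal Matching Pursuit encoder with small illustrative dictionaries; all names, dimensions, and the dictionary contents are assumptions for demonstration only.

```python
import numpy as np

def omp(D, x, k):
    """Orthogonal Matching Pursuit: greedily select up to k atoms of
    dictionary D (columns assumed unit-norm) to approximate signal x."""
    residual = x.astype(float)
    support = []
    coef = np.zeros(D.shape[1])
    for _ in range(k):
        # pick the atom most correlated with the current residual
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j not in support:
            support.append(j)
        # least-squares fit of x on the selected atoms
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        coef[:] = 0.0
        coef[support] = sol
        residual = x - D @ coef
        if np.linalg.norm(residual) < 1e-10:
            break
    return coef

def classify(class_dicts, x, k=2):
    """Assign descriptor x to the class whose dictionary yields the
    smallest sparse reconstruction error."""
    errors = {c: np.linalg.norm(x - D @ omp(D, x, k))
              for c, D in class_dicts.items()}
    return min(errors, key=errors.get)

# Toy example: two orthonormal dictionaries spanning disjoint subspaces.
D_wave = np.eye(8)[:, :4]   # hypothetical "wave" class dictionary
D_run = np.eye(8)[:, 4:]    # hypothetical "run" class dictionary
x = D_wave @ np.array([1.0, 0.0, 0.5, 0.0])  # descriptor in the "wave" span
print(classify({"wave": D_wave, "run": D_run}, x))  # → wave
```

In practice the dictionaries would be learned (e.g., with K-SVD) from training descriptors of each action class rather than fixed by hand; the sketch only shows the encode-and-compare-residuals decision rule.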

Keywords

Human action recognition · Key-sequences · Sparse coding · Inter-temporal acts descriptor


Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Analí Alfaro
  • Domingo Mery
  • Alvaro Soto
  1. Department of Computer Science, Pontificia Universidad Católica de Chile, Chile
