Modeling the Temporal Extent of Actions

  • Scott Satkin
  • Martial Hebert
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6311)


In this paper, we present a framework for estimating what portions of videos are most discriminative for the task of action recognition. We explore the impact of the temporal cropping of training videos on the overall accuracy of an action recognition system, and we formalize what makes a set of croppings optimal. In addition, we present an algorithm to determine the best set of croppings for a dataset, and experimentally show that our approach increases the accuracy of various state-of-the-art action recognition techniques.
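To make the notion of a temporal cropping concrete, here is a minimal sketch. It is not the algorithm of the paper, which the abstract does not spell out. It assumes a hypothetical per-frame score (`frame_scores`) that says how discriminative each frame is, enumerates candidate croppings `[start, end)` of one video, and keeps the one with the highest mean score. The names `candidate_crops`, `best_crop`, `min_len`, and `step` are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

Crop = Tuple[int, int]  # (start, end) frame indices, end exclusive


def candidate_crops(num_frames: int, min_len: int, step: int) -> List[Crop]:
    """Enumerate temporal croppings [start, end) with at least min_len frames."""
    crops = []
    for start in range(0, num_frames - min_len + 1, step):
        for end in range(start + min_len, num_frames + 1, step):
            crops.append((start, end))
    return crops


def best_crop(frame_scores: Sequence[float], min_len: int = 2, step: int = 1) -> Crop:
    """Return the cropping with the highest mean per-frame score.

    frame_scores is a hypothetical stand-in for any measure of how
    discriminative each frame is; the selection rule (mean score) is an
    assumption made for illustration only.
    """
    crops = candidate_crops(len(frame_scores), min_len, step)
    if not crops:
        raise ValueError("video is shorter than min_len")
    return max(crops, key=lambda c: sum(frame_scores[c[0]:c[1]]) / (c[1] - c[0]))
```

For a five-frame video whose middle frames score highest, `best_crop` selects just those frames. A dataset-level method, as the abstract describes, would instead choose croppings jointly across all training videos.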


Keywords (machine-generated, not supplied by the authors): Action Recognition; Temporal Extent; Training Video; Multiple Instance Learning; Action Recognition System



Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Scott Satkin (1)
  • Martial Hebert (1)
  1. The Robotics Institute, Carnegie Mellon University
