Assessing the Quality of Actions

  • Hamed Pirsiavash
  • Carl Vondrick
  • Antonio Torralba
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8694)


While recent advances in computer vision have provided reliable methods to recognize actions in both images and videos, the problem of assessing how well people perform actions has been largely unexplored in computer vision. Since methods for assessing action quality have many real-world applications in healthcare, sports, and video retrieval, we believe the computer vision community should begin to tackle this challenging problem. To spur progress, we introduce a learning-based framework that takes steps towards assessing how well people perform actions in videos. Our approach works by training a regression model from spatiotemporal pose features to scores obtained from expert judges. Moreover, our approach can provide interpretable feedback on how people can improve their action. We evaluate our method on a new Olympic sports dataset, and our experiments suggest our framework is able to rank the athletes more accurately than a non-expert human. While promising, our method is still a long way to rivaling the performance of expert judges, indicating that there is significant opportunity in computer vision research to improve on this difficult yet important task.


Discrete Cosine Transform Discrete Fourier Transform Support Vector Regression Action Recognition Action Quality 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


  1. 1.
    Gordon, A.S.: Automated video assessment of human performance. In: AI-ED (1995)Google Scholar
  2. 2.
    Jug, M., Perš, J., Dežman, B., Kovačič, S.: Trajectory based assessment of coordinated human activity. In: Crowley, J.L., Piater, J.H., Vincze, M., Paletta, L. (eds.) ICVS 2003. LNCS, vol. 2626, pp. 534–543. Springer, Heidelberg (2003)CrossRefGoogle Scholar
  3. 3.
    Perše, M., Kristan, M., Perš, J., Kovacic, S.: Automatic Evaluation of Organized Basketball Activity using Bayesian Networks. Citeseer (2007)Google Scholar
  4. 4.
    Pirsiavash, H., Ramanan, D.: Detecting activities of daily living in first-person camera views. In: CVPR (2012)Google Scholar
  5. 5.
    Ke, Y., Tang, X., Jing, F.: The design of high-level features for photo quality assessment. In: CVPR (2006)Google Scholar
  6. 6.
    Gygli, M., Grabner, H., Riemenschneider, H., Nater, F., Van Gool, L.: The interestingness of images (2013)Google Scholar
  7. 7.
    Datta, R., Joshi, D., Li, J., Wang, J.Z.: Studying aesthetics in photographic images using a computational approach. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3953, pp. 288–301. Springer, Heidelberg (2006)CrossRefGoogle Scholar
  8. 8.
    Dhar, S., Ordonez, V., Berg, T.L.: High level describable attributes for predicting aesthetics and interestingness. In: CVPR (2011)Google Scholar
  9. 9.
    Gupta, A., Kembhavi, A., Davis, L.S.: Observing human-object interactions: Using spatial and functional compatibility for recognition. PAMI (2009)Google Scholar
  10. 10.
    Yao, B., Fei-Fei, L.: Action recognition with exemplar based 2.5D graph matching. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part IV. LNCS, vol. 7575, pp. 173–186. Springer, Heidelberg (2012)CrossRefGoogle Scholar
  11. 11.
    Yang, W., Wang, Y., Mori, G.: Recognizing human actions from still images with latent poses. In: CVPR (2010)Google Scholar
  12. 12.
    Maji, S., Bourdev, L., Malik, J.: Action recognition from a distributed representation of pose and appearance. In: CVPR (2011)Google Scholar
  13. 13.
    Delaitre, V., Sivic, J., Laptev, I., et al.: Learning person-object interactions for action recognition in still images. In: NIPS (2011)Google Scholar
  14. 14.
    Laptev, I., Perez, P.: Retrieving actions in movies. In: ICCV (2007)Google Scholar
  15. 15.
    Sadanand, S., Corso, J.J.: Action bank: A high-level representation of activity in video. In: CVPR (2012)Google Scholar
  16. 16.
    Rodriguez, M., Ahmed, J., Shah, M.: Action mach a spatio-temporal maximum average correlation height filter for action recognition. In: CVPR, pp. 1–8 (2008)Google Scholar
  17. 17.
    Efros, A., Berg, A., Mori, G., Malik, J.: Recognizing action at a distance. In: CVPR (2003)Google Scholar
  18. 18.
    Shechtman, E., Irani, M.: Space-time behavior based correlation. In: IEEE PAMI (2007)Google Scholar
  19. 19.
    Poppe, R.: A survey on vision-based human action recognition. Image and Vision Computing 28(6), 976–990 (2010)CrossRefGoogle Scholar
  20. 20.
    Aggarwal, J.K., Ryoo, M.S.: Human activity analysis: A review. ACM Comput. Surv. 16Google Scholar
  21. 21.
    Wang, H., Ullah, M.M., Klaser, A., Laptev, I., Schmid, C.: Evaluation of local spatio-temporal features for action recognition. In: BMVC (2009)Google Scholar
  22. 22.
    Niebles, J.C., Chen, C.-W., Fei-Fei, L.: Modeling temporal structure of decomposable motion segments for activity classification. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010, Part II. LNCS, vol. 6312, pp. 392–405. Springer, Heidelberg (2010)CrossRefGoogle Scholar
  23. 23.
    Laptev, I.: On space-time interest points. In: ICCV (2005)Google Scholar
  24. 24.
    Le, Q.V., Zou, W.Y., Yeung, S.Y., Ng, A.Y.: Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis. In: CVPR (2011)Google Scholar
  25. 25.
    Wang, J., Liu, Z., Wu, Y., Yuan, J.: Mining actionlet ensemble for action recognition with depth cameras. In: CVPR (2012)Google Scholar
  26. 26.
    Ekin, A., Tekalp, A.M., Mehrotra, R.: Automatic soccer video analysis and summarization. Transactions on Image Processing (2003)Google Scholar
  27. 27.
    Khosla, A., Hamid, R., Lin, C.J., Sundaresan, N.: Large-scale video summarization using web-image priors. In: CVPR (2013)Google Scholar
  28. 28.
    Gong, Y., Liu, X.: Video summarization using singular value decomposition. In: CVPR (2000)Google Scholar
  29. 29.
    Rav-Acha, A., Pritch, Y., Peleg, S.: Making a long video short: Dynamic video synopsis. In: CVPR (2006)Google Scholar
  30. 30.
    Ngo, C.W., Ma, Y.F., Zhang, H.J.: Video summarization and scene detection by graph modeling. Circuits and Systems for Video Technology (2005)Google Scholar
  31. 31.
    Jiang, R.M., Sadka, A.H., Crookes, D.: Hierarchical video summarization in reference subspace. IEEE Transactions on Consumer Electronics (2009)Google Scholar
  32. 32.
    Marszalek, M., Laptev, I., Schmid, C.: Actions in context. In: CVPR (2009)Google Scholar
  33. 33.
    Yang, Y., Ramanan, D.: Articulated pose estimation with flexible mixtures-of-parts. In: CVPR (2011)Google Scholar
  34. 34.
    Park, D., Ramanan, D.: N-best maximal decoders for part models. In: ICCV (2011)Google Scholar
  35. 35.
    Drucker, H., Burges, C.J., Kaufman, L., Smola, A., Vapnik, V.: Support vector regression machines. In: NIPS (1997)Google Scholar
  36. 36.
    Chang, C.C., Lin, C.J.: Libsvm: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST) (2011)Google Scholar
  37. 37.
    Silberman, N., Hoiem, D., Kohli, P., Fergus, R.: Indoor segmentation and support inference from RGBD images. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part V. LNCS, vol. 7576, pp. 746–760. Springer, Heidelberg (2012)CrossRefGoogle Scholar
  38. 38.
    Zheng, B., Zhao, Y., Yu, J.C., Ikeuchi, K., Zhu, S.C.: Detecting potential falling objects by inferring human action and natural disturbance. In: IEEE Int. Conf. on Robotics and Automation (ICRA) (to appear, 2014)Google Scholar

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Hamed Pirsiavash
    • 1
  • Carl Vondrick
    • 1
  • Antonio Torralba
    • 1
  1. 1.Massachusetts Institute of TechnologyUSA

Personalised recommendations