
HMDB51: A Large Video Database for Human Motion Recognition

  • Conference paper
  • In: High Performance Computing in Science and Engineering ‘12

Abstract

With nearly one billion online videos viewed every day, an emerging new frontier in computer vision research is recognition and search in video. While much effort has been devoted to the collection and annotation of large-scale static image datasets containing thousands of image categories, human action datasets lag far behind. Current action recognition databases contain on the order of ten different action categories collected under fairly controlled conditions. State-of-the-art performance on these datasets is now near ceiling, and thus there is a need for the design and creation of new benchmarks. To address this issue we collected the largest action video database to date with 51 action categories, which in total contain around 7,000 manually annotated clips extracted from a variety of sources ranging from digitized movies to YouTube. The goal of this effort is to provide a tool to evaluate the performance of computer vision systems for action recognition and to explore the robustness of these methods under various conditions such as camera motion, viewpoint, video quality and occlusion.
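The released database comes with standard train/test splits: each action class has plain-text split files (named along the lines of `<class>_test_split<1-3>.txt`) in which every line pairs a clip file name with a tag, where 1 marks a training clip, 2 a test clip, and 0 a clip unused in that split. A minimal sketch of parsing that format, assuming this line layout:

```python
# Hedged sketch: parsing one HMDB51 train/test split file.
# Assumed line format (per the released split files): "<video_name> <tag>"
# where tag 1 = train, tag 2 = test, tag 0 = not used in this split.

def parse_hmdb51_split(lines):
    """Separate the lines of one split file into train and test clip names."""
    train, test = [], []
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        name, tag = parts
        if tag == "1":
            train.append(name)
        elif tag == "2":
            test.append(name)
        # tag "0" means the clip is excluded from this split
    return train, test


# Illustrative file names only; real entries follow the dataset's naming scheme.
example_lines = [
    "brush_hair_clip_a.avi 1",
    "brush_hair_clip_b.avi 2",
    "brush_hair_clip_c.avi 0",
]
train, test = parse_hmdb51_split(example_lines)
```

Repeating this over all class files for a given split index yields the full train and test sets used for evaluation.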



Acknowledgements

This report describes research done in part at the Center for Biological & Computational Learning, affiliated with MIBR, BCS, CSAIL at MIT. This research was sponsored by grants from DARPA (IPTO and DSO), NSF (NSF-0640097, NSF-0827427), and AFSOR-THRL (FA8650-05-C-7262). Additional support was provided by: Adobe, a King Abdullah University of Science and Technology grant to B. DeVore, NEC, Sony, and the Eugene McDermott Foundation. This work was also supported by Brown University, the Center for Computation and Visualization, the Robert J. and Nancy D. Carney Fund for Scientific Innovation, DARPA (DARPA-BAA-09-31), and ONR (ONR-BAA-11-001). H.K. was supported by a grant from the Ministry of Science, Research and the Arts of Baden-Württemberg, Germany.

Author information


Corresponding author

Correspondence to Hilde Kuehne.



Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kuehne, H., Jhuang, H., Stiefelhagen, R., Serre, T. (2013). HMDB51: A Large Video Database for Human Motion Recognition. In: Nagel, W., Kröner, D., Resch, M. (eds) High Performance Computing in Science and Engineering ‘12. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33374-3_41

