Abstract
With nearly one billion online videos viewed every day, recognition and search in video is an emerging frontier of computer vision research. While much effort has been devoted to collecting and annotating large-scale static image datasets containing thousands of image categories, human action datasets lag far behind: current action recognition databases contain on the order of ten different action categories collected under fairly controlled conditions. State-of-the-art performance on these datasets is now near ceiling, so new benchmarks need to be designed and created. To address this issue we collected the largest action video database to date, with 51 action categories and around 7,000 manually annotated clips in total, extracted from a variety of sources ranging from digitized movies to YouTube. The goal of this effort is to provide a tool for evaluating the performance of computer vision systems for action recognition and for exploring the robustness of these methods under various conditions such as camera motion, viewpoint, video quality, and occlusion.
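Benchmarks of this kind are typically scored with mean per-class accuracy, so that every action category contributes equally regardless of how many clips it has. As a minimal sketch (the category names below are hypothetical, not actual HMDB51 labels), the metric can be computed like this:

```python
from collections import defaultdict

def mean_class_accuracy(predictions, labels):
    """Average the per-category accuracies so each of the
    action categories counts equally, independent of clip count."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for pred, true in zip(predictions, labels):
        total[true] += 1
        if pred == true:
            correct[true] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)

# Toy example with three hypothetical action categories.
preds = ["run", "run", "wave", "clap", "clap", "wave"]
truth = ["run", "wave", "wave", "clap", "run", "clap"]
print(mean_class_accuracy(preds, truth))  # → 0.5
```

In practice a benchmark of this type would report this number averaged over its predefined train/test splits rather than over a single run.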
Acknowledgements
This report describes research done in part at the Center for Biological & Computational Learning, affiliated with MIBR, BCS, CSAIL at MIT. This research was sponsored by grants from DARPA (IPTO and DSO), NSF (NSF-0640097, NSF-0827427), and AFSOR-THRL (FA8650-05-C-7262). Additional support was provided by Adobe, a King Abdullah University of Science and Technology grant to B. DeVore, NEC, Sony, and the Eugene McDermott Foundation. This work was also supported by Brown University, the Center for Computation and Visualization, the Robert J. and Nancy D. Carney Fund for Scientific Innovation, DARPA (DARPA-BAA-09-31), and ONR (ONR-BAA-11-001). H.K. was supported by a grant from the Ministry of Science, Research and the Arts of Baden-Württemberg, Germany.
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
Cite this paper
Kuehne, H., Jhuang, H., Stiefelhagen, R., Serre, T. (2013). HMDB51: A Large Video Database for Human Motion Recognition. In: Nagel, W., Kröner, D., Resch, M. (eds) High Performance Computing in Science and Engineering ‘12. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33374-3_41
DOI: https://doi.org/10.1007/978-3-642-33374-3_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33373-6
Online ISBN: 978-3-642-33374-3
eBook Packages: Mathematics and Statistics (R0)