Comparing Evaluation Protocols on the KTH Dataset

  • Conference paper
Human Behavior Understanding (HBU 2010)

Part of the book series: Lecture Notes in Computer Science ((LNIP,volume 6219))

Abstract

Human action recognition has become a hot research topic, and many algorithms have been proposed. Most researchers evaluate their methods on the KTH dataset, but there is no unified standard for how algorithms should be evaluated on it. Different researchers have employed different test setups, so comparisons across papers are not accurate, fair, or complete. To quantify how much the experimental setup matters, we take our own spatio-temporal MoSIFT feature as an example and assess its performance on the KTH dataset under different test scenarios and different partitionings of the data. In all experiments, a support vector machine (SVM) with a chi-square kernel is used. First, we evaluate how performance changes with the vocabulary size of the codebook and select a suitable size. Then, we train models on different training partitions and test them on the corresponding held-out test sets. Experiments show that MoSIFT can reach a best performance of 96.33% on the KTH dataset. When different n-fold cross-validation schemes are used, results can differ by up to 10.67%, and when different dataset segmentations are used (such as KTH1 and KTH2), results can differ by up to 5.8% absolute. In addition, performance changes dramatically when different scenarios are used for training and testing: training on KTH1 S1+S2+S3+S4 and testing on KTH1 S1 versus S3 yields 97.33% and 89.33%, respectively. This paper shows how different test configurations can skew results, even on a standard dataset. We recommend a simple leave-one-out protocol as the most easily replicable, clear-cut partitioning.
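The evaluation pipeline the abstract describes (bag-of-words histograms over a codebook, a chi-square-kernel SVM, and a leave-one-subject-out partitioning in the spirit of the recommended leave-one-out protocol) can be sketched as follows. This is a minimal illustration on synthetic histograms, not the authors' implementation: the subject/clip counts, vocabulary size, and SVM `C` value are arbitrary stand-ins, and real use would replace the random data with MoSIFT codeword histograms extracted from the KTH videos.

```python
import numpy as np
from sklearn.metrics.pairwise import chi2_kernel
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneGroupOut

rng = np.random.default_rng(0)

# Synthetic bag-of-words histograms standing in for MoSIFT codeword counts:
# 25 "subjects" x 2 clips each, 2 action classes, vocabulary of 50 codewords.
n_subjects, clips_per_subject, vocab = 25, 2, 50
X, y, groups = [], [], []
for s in range(n_subjects):
    for c in range(clips_per_subject):
        label = c % 2
        base = np.zeros(vocab)
        base[label * (vocab // 2):(label + 1) * (vocab // 2)] = 1.0
        hist = rng.poisson(lam=base * 5 + 0.5).astype(float)
        hist /= hist.sum()  # L1-normalise, as is usual with chi-square kernels
        X.append(hist); y.append(label); groups.append(s)
X, y, groups = np.array(X), np.array(y), np.array(groups)

# Leave-one-subject-out: each fold holds out every clip of one performer,
# so test subjects never appear in training -- a clear-cut partitioning.
accs = []
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups):
    K_train = chi2_kernel(X[train_idx], X[train_idx])
    K_test = chi2_kernel(X[test_idx], X[train_idx])
    clf = SVC(kernel="precomputed", C=10.0).fit(K_train, y[train_idx])
    accs.append((clf.predict(K_test) == y[test_idx]).mean())

print(f"leave-one-subject-out mean accuracy: {np.mean(accs):.3f}")
```

Because the kernel is precomputed, the chi-square Gram matrix between test and training histograms must be built against the training set only; averaging the per-fold accuracies gives the single number that papers report, which is exactly where the cross-validation scheme silently influences the result.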




© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gao, Z., Chen, M.-Y., Hauptmann, A.G., Cai, A. (2010). Comparing Evaluation Protocols on the KTH Dataset. In: Salah, A.A., Gevers, T., Sebe, N., Vinciarelli, A. (eds) Human Behavior Understanding. HBU 2010. Lecture Notes in Computer Science, vol 6219. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14715-9_10

  • DOI: https://doi.org/10.1007/978-3-642-14715-9_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-14714-2

  • Online ISBN: 978-3-642-14715-9
