Comparing Evaluation Protocols on the KTH Dataset

  • Conference paper
Human Behavior Understanding (HBU 2010)

Part of the book series: Lecture Notes in Computer Science ((LNIP,volume 6219))

Abstract

Human action recognition has become a hot research topic, and many algorithms have been proposed. Most researchers evaluate their methods on the KTH dataset, but there is no unified standard for how algorithms should be evaluated on it. Different researchers have employed different test setups, so comparisons across papers are not accurate, fair, or complete. To quantify how much the experimental setup matters, we take our own spatio-temporal MoSIFT feature as an example and assess its performance on the KTH dataset under different test scenarios and different partitionings of the data. In all experiments, a support vector machine (SVM) with a chi-square kernel is used. First, we evaluate how performance changes with the vocabulary size of the codebook and select a suitable size. Then, we train models on different training partitions and test them on the corresponding held-out test sets. Experiments show that MoSIFT can reach a best performance of 96.33% on the KTH dataset. When different n-fold cross-validation schemes are used, results can differ by up to 10.67%, and when different dataset segmentations are used (such as KTH1 and KTH2), results can differ by up to 5.8% absolute. In addition, performance changes dramatically when different scenarios are used for training and testing: training on KTH1 S1+S2+S3+S4 and testing on KTH1 S1 versus S3 yields 97.33% and 89.33%, respectively. This paper shows how different test configurations can skew results, even on a standard dataset. We recommend a simple leave-one-out protocol as the most easily replicable, clear-cut partitioning.
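The evaluation pipeline the abstract describes (bag-of-words histograms over a codebook, a chi-square-kernel SVM, and a leave-one-subject-out partitioning in the spirit of the recommended leave-one-out protocol) can be sketched as follows. This is a minimal illustration on synthetic histograms, not the authors' implementation: the subject/clip counts, vocabulary size, and SVM `C` value are arbitrary stand-ins, and real use would replace the random data with MoSIFT codeword histograms extracted from the KTH videos.

```python
import numpy as np
from sklearn.metrics.pairwise import chi2_kernel
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneGroupOut

rng = np.random.default_rng(0)

# Synthetic bag-of-words histograms standing in for MoSIFT codeword counts:
# 25 "subjects" x 2 clips each, 2 action classes, vocabulary of 50 codewords.
n_subjects, clips_per_subject, vocab = 25, 2, 50
X, y, groups = [], [], []
for s in range(n_subjects):
    for c in range(clips_per_subject):
        label = c % 2
        base = np.zeros(vocab)
        base[label * (vocab // 2):(label + 1) * (vocab // 2)] = 1.0
        hist = rng.poisson(lam=base * 5 + 0.5).astype(float)
        hist /= hist.sum()  # L1-normalise, as is usual with chi-square kernels
        X.append(hist); y.append(label); groups.append(s)
X, y, groups = np.array(X), np.array(y), np.array(groups)

# Leave-one-subject-out: each fold holds out every clip of one performer,
# so test subjects never appear in training -- a clear-cut partitioning.
accs = []
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups):
    K_train = chi2_kernel(X[train_idx], X[train_idx])
    K_test = chi2_kernel(X[test_idx], X[train_idx])
    clf = SVC(kernel="precomputed", C=10.0).fit(K_train, y[train_idx])
    accs.append((clf.predict(K_test) == y[test_idx]).mean())

print(f"leave-one-subject-out mean accuracy: {np.mean(accs):.3f}")
```

Because the kernel is precomputed, the chi-square Gram matrix between test and training histograms must be built against the training set only; averaging the per-fold accuracies gives the single number that papers report, which is exactly where the cross-validation scheme silently influences the result.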




© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gao, Z., Chen, M.-Y., Hauptmann, A.G., Cai, A. (2010). Comparing Evaluation Protocols on the KTH Dataset. In: Salah, A.A., Gevers, T., Sebe, N., Vinciarelli, A. (eds) Human Behavior Understanding. HBU 2010. Lecture Notes in Computer Science, vol 6219. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14715-9_10

  • DOI: https://doi.org/10.1007/978-3-642-14715-9_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-14714-2

  • Online ISBN: 978-3-642-14715-9
