Spatio-temporal SURF for Human Action Recognition

  • Conference paper
Advances in Multimedia Information Processing – PCM 2013 (PCM 2013)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8294)

Abstract

In this paper, we propose a new spatio-temporal descriptor called ST-SURF, based on a novel combination of Speeded-Up Robust Features (SURF) and optical flow. The Hessian detector is employed to find all interest points. To reduce computation time, we propose a new methodology for segmenting video into Frame Packets (FPs), based on tracking interest point trajectories. Only the descriptors of moving interest points are used to generate a robust and discriminative codebook through k-means clustering. We use a standard bag-of-visual-words approach with a Support Vector Machine (SVM) for action recognition. Experiments are carried out on the KTH and UCF Sports datasets, and ST-SURF shows promising results. On the KTH dataset, the proposed method achieves an accuracy of 88.2%, which is comparable to the state of the art. On the more realistic UCF Sports dataset, our method reaches 80.7%, surpassing the best reported results for space-time descriptors combined with the Hessian detector.
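
The codebook stage described in the abstract (keep only moving interest points, cluster their descriptors with k-means, then encode a clip as a bag of visual words) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the `{"desc": ..., "flow": ...}` point layout, and the `min_flow` threshold are all assumptions made for illustration, descriptors and per-point optical flow are assumed to be precomputed, the ST-SURF descriptor itself is not reproduced, and the SVM classifier is left out.

```python
import math
import random


def keep_moving(points, min_flow=1.0):
    """Return descriptors of points whose flow magnitude is at least min_flow.

    Each point is a dict {"desc": [...], "flow": (dx, dy)} (hypothetical layout).
    """
    return [p["desc"] for p in points
            if math.hypot(p["flow"][0], p["flow"][1]) >= min_flow]


def sq_dist(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(desc, centers):
    """Index of the codebook center closest to desc."""
    return min(range(len(centers)), key=lambda i: sq_dist(desc, centers[i]))


def kmeans(descs, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns k centers (the visual-word codebook)."""
    rng = random.Random(seed)
    centers = [list(d) for d in rng.sample(descs, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for d in descs:
            groups[nearest(d, centers)].append(d)
        for i, g in enumerate(groups):
            if g:  # an empty cluster keeps its previous center
                centers[i] = [sum(col) / len(g) for col in zip(*g)]
    return centers


def bow_histogram(descs, centers):
    """Normalized histogram of visual-word assignments for one clip."""
    hist = [0.0] * len(centers)
    for d in descs:
        hist[nearest(d, centers)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


if __name__ == "__main__":
    # Toy 2-D "descriptors"; real SURF descriptors have 64 or 128 dimensions.
    points = [
        {"desc": [0.0, 0.0], "flow": (0.0, 0.0)},    # static, discarded
        {"desc": [0.0, 1.0], "flow": (2.0, 0.0)},
        {"desc": [10.0, 10.0], "flow": (0.0, 3.0)},
        {"desc": [10.0, 11.0], "flow": (1.5, 1.5)},
    ]
    moving = keep_moving(points)
    codebook = kmeans(moving, k=2)
    print(bow_histogram(moving, codebook))
```

In a full pipeline, one such histogram per video clip would serve as the feature vector passed to the SVM; filtering static points first keeps background texture out of the codebook.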

Copyright information

© 2013 Springer International Publishing Switzerland

About this paper

Cite this paper

Megrhi, S., Souidène, W., Beghdadi, A. (2013). Spatio-temporal SURF for Human Action Recognition. In: Huet, B., Ngo, CW., Tang, J., Zhou, ZH., Hauptmann, A.G., Yan, S. (eds) Advances in Multimedia Information Processing – PCM 2013. PCM 2013. Lecture Notes in Computer Science, vol 8294. Springer, Cham. https://doi.org/10.1007/978-3-319-03731-8_47

  • DOI: https://doi.org/10.1007/978-3-319-03731-8_47

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-03730-1

  • Online ISBN: 978-3-319-03731-8

  • eBook Packages: Computer Science, Computer Science (R0)
