Spatio-temporal SURF for Human Action Recognition

Megrhi, Sameh; Souidène, Wided; Beghdadi, Azeddine

doi:10.1007/978-3-319-03731-8_47


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8294)

Included in the following conference series:

Pacific-Rim Conference on Multimedia


Abstract

In this paper, we propose a new spatio-temporal descriptor called ST-SURF, based on a novel combination of Speeded-Up Robust Features (SURF) and optical flow. The Hessian detector is employed to find all interest points. To reduce computation time, we propose a new methodology for segmenting videos into Frame Packets (FPs) based on interest-point trajectory tracking. Only the descriptors of moving interest points are used to generate a robust and discriminative codebook via k-means clustering. We use a standard bag-of-visual-words approach with a Support Vector Machine (SVM) classifier for action recognition. Experiments are carried out on the KTH and UCF Sports datasets, and the results show that ST-SURF is promising. On the KTH dataset, the proposed method achieves an accuracy of 88.2%, on par with the state of the art. On the more realistic UCF Sports dataset, it achieves 80.7%, surpassing the best previously reported results obtained with space-time descriptors combined with the Hessian detector.
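The recognition stage summarized in the abstract (keep only moving interest points, build a k-means codebook, then encode each video as a bag-of-visual-words histogram for an SVM) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: per-point optical-flow vectors are assumed to be precomputed, the speed threshold `min_speed` and all function names are hypothetical, the toy descriptors are 2-D rather than SURF's 64-D, Frame Packet segmentation is not modeled, and the SVM step itself is left out.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def select_moving_descriptors(
    descriptors: Sequence[Vector],
    flows: Sequence[Tuple[float, float]],
    min_speed: float = 1.0,  # hypothetical threshold, in pixels per frame
) -> List[Vector]:
    """Keep only descriptors whose interest point moves at least `min_speed`."""
    return [list(d) for d, (dx, dy) in zip(descriptors, flows)
            if math.hypot(dx, dy) >= min_speed]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centers: Sequence[Vector], v: Sequence[float]) -> int:
    """Index of the codeword closest to descriptor v."""
    return min(range(len(centers)), key=lambda i: sq_dist(centers[i], v))


def kmeans(points: Sequence[Vector], k: int, iters: int = 20, seed: int = 0) -> List[Vector]:
    """Plain Lloyd's k-means; returns k codewords (the visual vocabulary)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(list(points), k)]
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(centers, p)].append(p)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                centers[i] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def bovw_histogram(descriptors: Sequence[Vector], codebook: Sequence[Vector]) -> List[float]:
    """Hard-assign each descriptor to its nearest codeword; L1-normalize the counts."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        hist[nearest(codebook, d)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


if __name__ == "__main__":
    # Toy 2-D descriptors with one (dx, dy) flow vector per interest point.
    descs = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 4.9], [9.0, 9.0]]
    flows = [(2.0, 0.0), (0.0, 1.5), (1.0, 1.0), (3.0, 0.0), (0.1, 0.0)]
    moving = select_moving_descriptors(descs, flows)  # drops the near-static last point
    codebook = kmeans(moving, k=2)
    print(bovw_histogram(moving, codebook))
```

Each normalized histogram would then serve as one training or test vector for the SVM classifier. Plain Lloyd's k-means with a seeded random initialization is used here only to keep the sketch dependency-free; a production pipeline would more likely rely on an optimized clustering library.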



Author information

Authors and Affiliations

L2TI, Institut Galilée, Université Paris 13, Sorbonne Paris Cité, France
Sameh Megrhi, Wided Souidène & Azeddine Beghdadi


Editor information

Editors and Affiliations

EURECOM, Multimedia Department, Sophia Antipolis, France
Benoit Huet
Department of Computer Science, City University of Hong Kong, Tat Chee Ave, Kowloon, Hong Kong
Chong-Wah Ngo
Nanjing University of Science and Technology, 210093, Nanjing, China
Jinhui Tang
Department of Computer Science and Technology, Nanjing University, Xianlin Avenue No. 163, 210023, Nanjing, China
Zhi-Hua Zhou
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
Alexander G. Hauptmann
Department of Electrical and Computer Engineering, National University of Singapore, 4 Engineering Drive 3, 117583, Singapore, Singapore
Shuicheng Yan


About this paper

Cite this paper

Megrhi, S., Souidène, W., Beghdadi, A. (2013). Spatio-temporal SURF for Human Action Recognition. In: Huet, B., Ngo, CW., Tang, J., Zhou, ZH., Hauptmann, A.G., Yan, S. (eds) Advances in Multimedia Information Processing – PCM 2013. PCM 2013. Lecture Notes in Computer Science, vol 8294. Springer, Cham. https://doi.org/10.1007/978-3-319-03731-8_47


DOI: https://doi.org/10.1007/978-3-319-03731-8_47
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-03730-1
Online ISBN: 978-3-319-03731-8
eBook Packages: Computer Science, Computer Science (R0)
