Robust Spatio-Temporal Features for Human Action Recognition

Mattivi, Riccardo; Shao, Ling

doi:10.1007/978-3-642-19551-8_12


Part of the book series: Studies in Computational Intelligence (SCI, volume 346)


Abstract

In this chapter we describe and evaluate two recent feature detectors and descriptors used for action recognition: 3D SIFT and 3D SURF. We first introduce the underlying 2D algorithms, SIFT and SURF. For each method, we explain the theory on which it is based and compare the two approaches. We then describe how SIFT and SURF are extended into the temporal domain, yielding 3D SIFT and 3D SURF, and highlight the similarities and differences between the two methods. Finally, we compare the 3D methods by evaluating their performance on human action recognition. Our results show similar accuracy for both methods, but greater efficiency for the 3D SURF approach than for 3D SIFT.
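The efficiency gap reported above can be illustrated with a small sketch. SURF-style detectors are commonly described as evaluating box filters over integral images, and the spatio-temporal analogue is an integral video, where the sum over any box of voxels costs eight lookups regardless of box size. The code below is a minimal illustration under that assumption, written with NumPy; the function names `integral_video` and `box_sum` are hypothetical and this is not the chapter's implementation.

```python
import numpy as np


def integral_video(volume):
    """Build a zero-padded 3D summed-volume table.

    After padding, iv[t, y, x] equals volume[:t, :y, :x].sum().
    """
    iv = volume.astype(np.float64).cumsum(axis=0).cumsum(axis=1).cumsum(axis=2)
    # One leading zero plane per axis so that index 0 means "empty prefix".
    return np.pad(iv, ((1, 0), (1, 0), (1, 0)))


def box_sum(iv, t0, t1, y0, y1, x0, x1):
    """Sum of volume[t0:t1, y0:y1, x0:x1] using 3D inclusion-exclusion."""
    return (
        iv[t1, y1, x1]
        - iv[t0, y1, x1] - iv[t1, y0, x1] - iv[t1, y1, x0]
        + iv[t0, y0, x1] + iv[t0, y1, x0] + iv[t1, y0, x0]
        - iv[t0, y0, x0]
    )


if __name__ == "__main__":
    # Toy video: 4 frames of 5 x 6 pixels (illustrative data only).
    video = np.arange(4 * 5 * 6).reshape(4, 5, 6)
    iv = integral_video(video)
    # Both values agree; the second is the direct (slow) reference.
    print(box_sum(iv, 1, 4, 2, 5, 0, 6), video[1:4, 2:5, 0:6].sum())
```

Because the cost of `box_sum` does not depend on the box size, filter responses at coarse spatial and temporal scales are as cheap as at fine ones, which is the usual argument for the speed of box-filter approaches over repeated Gaussian smoothing.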



Author information

Authors and Affiliations

The University of Trento, Italy
Riccardo Mattivi
The University of Sheffield, UK
Ling Shao


Editor information

Editors and Affiliations

School of Computer Engineering, Nanyang Technological University, 639798, Singapore
Weisi Lin & Dacheng Tao
Intelligent Systems Laboratory, Systems Research Institute, Polish Academy of Sciences, Poland
Janusz Kacprzyk
Department of Computing, Hong Kong Polytechnic University, Hung Hom, Hong Kong
Zhu Li
School of Electronic Engineering and Computer Science, Queen Mary, University of London, London, U.K.
Ebroul Izquierdo
TCL-Thomson Electronics, Santa Clara, California
Haohong Wang

About this chapter

Cite this chapter

Mattivi, R., Shao, L. (2011). Robust Spatio-Temporal Features for Human Action Recognition. In: Lin, W., Tao, D., Kacprzyk, J., Li, Z., Izquierdo, E., Wang, H. (eds) Multimedia Analysis, Processing and Communications. Studies in Computational Intelligence, vol 346. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19551-8_12

DOI: https://doi.org/10.1007/978-3-642-19551-8_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19550-1
Online ISBN: 978-3-642-19551-8
eBook Packages: Engineering (R0)
