Robust Spatio-Temporal Features for Human Action Recognition

  • Chapter
Multimedia Analysis, Processing and Communications

Part of the book series: Studies in Computational Intelligence (SCI, volume 346)

Abstract

In this chapter we describe and evaluate two recent feature detectors and descriptors used in action recognition: 3D SIFT and 3D SURF. We first introduce the underlying 2D algorithms, SIFT and SURF. For each method, we explain the theory on which it is based and compare the two approaches. We then describe the extension of SIFT and SURF into the temporal domain, known as 3D SIFT and 3D SURF, emphasizing the similarities and differences between the two methods. To compare the 3D methods, we evaluate the performance of 3D SURF and 3D SIFT on human action recognition. Our results show similar accuracy for the two methods, but greater efficiency for 3D SURF than for 3D SIFT.

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Mattivi, R., Shao, L. (2011). Robust Spatio-Temporal Features for Human Action Recognition. In: Lin, W., Tao, D., Kacprzyk, J., Li, Z., Izquierdo, E., Wang, H. (eds) Multimedia Analysis, Processing and Communications. Studies in Computational Intelligence, vol 346. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19551-8_12

  • DOI: https://doi.org/10.1007/978-3-642-19551-8_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-19550-1

  • Online ISBN: 978-3-642-19551-8

  • eBook Packages: Engineering, Engineering (R0)
