Natural Action Recognition Using Invariant 3D Motion Encoding

  • Simon Hadfield
  • Karel Lebeda
  • Richard Bowden
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8690)


We investigate the recognition of actions “in the wild” using 3D motion information. The lack of control over (and knowledge of) the camera configuration exacerbates this already challenging task by introducing systematic projective inconsistencies between 3D motion fields, hugely increasing intra-class variance. By introducing a robust, sequence-based stereo calibration technique, we reduce these inconsistencies from fully projective to a simple similarity transform. We then introduce motion encoding techniques which provide the necessary scale invariance, along with additional invariances to changes in camera viewpoint.

On the recent Hollywood 3D natural action recognition dataset, we show improvements of 40% over previous state-of-the-art techniques based on implicit motion encoding. We also demonstrate that our robust sequence calibration simplifies the task of recognising actions, leading to recognition rates 2.5 times those for the same technique without calibration. In addition, the sequence calibrations are made available.
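To illustrate why a motion encoding can be made insensitive to the scale ambiguity that remains after a similarity transform, the following is a minimal sketch and not the encoding proposed in the paper. It assumes a 3D motion field given as a plain list of `(dx, dy, dz)` vectors and bins only their directions; the function name `direction_histogram` and the bin counts are hypothetical choices made for this example.

```python
import math


def direction_histogram(flow, n_azimuth=8, n_elevation=4):
    """Normalised histogram of 3D motion directions (illustrative only).

    `flow` is a list of (dx, dy, dz) motion vectors. Only the direction of
    each vector is binned, so multiplying every vector by the same positive
    factor leaves the histogram unchanged.
    """
    hist = [0.0] * (n_azimuth * n_elevation)
    for dx, dy, dz in flow:
        norm = math.sqrt(dx * dx + dy * dy + dz * dz)
        if norm == 0.0:
            continue  # a zero vector has no direction to bin
        azimuth = math.atan2(dy, dx)  # angle in [-pi, pi]
        # Clamp guards against rounding pushing the ratio outside [-1, 1].
        elevation = math.asin(max(-1.0, min(1.0, dz / norm)))  # [-pi/2, pi/2]
        a = min(int((azimuth + math.pi) / (2 * math.pi) * n_azimuth),
                n_azimuth - 1)
        e = min(int((elevation + math.pi / 2) / math.pi * n_elevation),
                n_elevation - 1)
        hist[e * n_azimuth + a] += 1.0
    total = sum(hist)
    # Normalise so the descriptor does not depend on the number of vectors.
    return [h / total for h in hist] if total > 0 else hist
```

Calling the function on a motion field and on the same field with every vector multiplied by a common positive factor should give the same descriptor, which is the scale-invariance property the abstract refers to. Invariance to camera viewpoint needs further steps that this sketch does not attempt.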


Keywords: Action recognition, in the wild, 3D motion, scene flow, invariant encoding, stereo sequence calibration





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Simon Hadfield (1)
  • Karel Lebeda (1)
  • Richard Bowden (1)
  1. Centre for Vision, Speech and Signal Processing, University of Surrey, UK
