
Human actions recognition: an approach based on stable motion boundary fields

Published in: Multimedia Tools and Applications

Abstract

Automatic video action recognition has been a long-standing problem in computer vision. Obtaining a scalable solution for action recognition requires an efficient visual representation of motion. In this paper, we propose a new visual representation for actions based on body motion boundaries. In the first step, a set of optical flow frames highlighting the principal motions in the poses is extracted. Then, motion boundaries are computed from these optical flow frames. Maximally Stable Extremal Regions (MSER) are then applied to the motion boundary maps to obtain Motion Stable Shape (MSS) features. Local descriptors are computed from each detected MSS to capture motion patterns. To predict the classes of the different human actions, we represent the descriptors with a bag-of-words (BoW) model and classify them with a non-linear support vector machine. We performed experiments on several datasets: Weizmann, KTH, UCF Sport, UCF50 and Hollywood, to demonstrate the effectiveness of the proposed model. The achieved results improve on the state of the art for the KTH and Weizmann datasets and are comparable to the state of the art for the UCF Sport and UCF50 datasets.
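The motion-boundary step described above can be sketched numerically: motion boundaries are the spatial derivatives of a dense optical flow field, which respond where motion changes abruptly (e.g. at limb contours) and vanish where motion is constant. The following is a minimal NumPy illustration of that idea, not the authors' implementation; it assumes a precomputed flow field `(u, v)` and omits the optical flow estimation, MSER detection, and BoW/SVM stages of the full pipeline.

```python
import numpy as np

def motion_boundaries(u, v):
    """Gradient magnitude of a dense optical flow field (u, v).

    Boundaries peak where the flow changes abruptly (moving-object
    contours) and are zero inside regions of uniform motion.
    """
    uy, ux = np.gradient(u)  # spatial derivatives of the horizontal flow
    vy, vx = np.gradient(v)  # spatial derivatives of the vertical flow
    return np.sqrt(ux**2 + uy**2 + vx**2 + vy**2)

# Toy flow field: a square patch moving right against a static background.
u = np.zeros((8, 8))
u[2:6, 2:6] = 1.0        # horizontal motion inside the patch
v = np.zeros((8, 8))     # no vertical motion
mb = motion_boundaries(u, v)
# mb is zero inside the uniformly moving patch and on the static
# background, and nonzero only along the patch contour.
```

Because the derivative of a uniform translation is zero, this map suppresses constant (e.g. camera-induced) motion and keeps only the moving-body silhouette, which is what makes it a suitable input for stable-region detection.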



Author information

Correspondence to Imen Lassoued.


Cite this article

Lassoued, I., Zagrouba, E. Human actions recognition: an approach based on stable motion boundary fields. Multimed Tools Appl 77, 20715–20729 (2018). https://doi.org/10.1007/s11042-017-5477-0
