Abstract
In this paper, we present a novel descriptor for human action recognition, called Motion of Oriented Magnitudes Patterns (MOMP), which considers the relationships between the local gradient distributions of neighboring patches coming from successive frames in video. The proposed descriptor also characterizes the information changing across different orientations, is therefore very discriminative and robust. The major advantages of MOMP are its very fast computation time and simple implementation. Subsequently, our features are combined with an effective coding scheme VLAD (Vector of locally aggregated descriptors) in the feature representation step, and a SVM (Support Vector Machine) classifier in order to better represent and classify the actions. By experimenting on several common benchmarks, we obtain the state-of-the-art results on the KTH dataset as well as the performance comparable to the literature on the UCF Sport dataset.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Ji, S., Xu, W., Yang, M., Yu, K.: 3D convolutional neural networks for human action recognition. Pattern Anal. Mach. Intell. 35, 221–231 (2013)
Simonyan, K., Zisserman, A.: Two-stream convolutional networks for action recognition in videos. In: Advances in Neural Information Processing Systems, pp. 568–576 (2014)
Le, Q.V., Zou, W.Y., Yeung, S.Y., Ng, A.Y.: Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis. In: IEEE Conference on CVPR 2011, pp. 3361–3368. IEEE (2011)
Tran, D., Bourdev, L., Fergus, R., Torresani, L., Paluri, M.: Learning spatiotemporal features with 3D convolutional networks. In: 2015 IEEE International Conference on Computer Vision (ICCV), pp. 4489–4497. IEEE (2015)
Laptev, I.: On space-time interest points. Int. J. Comput. Vis. 64, 107–123 (2005)
Laptev, I., Marszałek, M., Schmid, C., Rozenfeld, B.: Learning realistic human actions from movies. In: IEEE Conference on CVPR 2008, pp. 1–8. IEEE (2008)
Wei, Q., Zhang, X., Kong, Y., Hu, W., Ling, H.: Group action recognition using space-time interest points. In: Bebis, G., et al. (eds.) ISVC 2009. LNCS, vol. 5876, pp. 757–766. Springer, Heidelberg (2009). doi:10.1007/978-3-642-10520-3_72
Wang, H., Kläser, A., Schmid, C., Liu, C.L.: Dense trajectories and motion boundary descriptors for action recognition. Int. J. Comput. Vis. 103, 60–79 (2013)
Wang, H., Ullah, M.M., Klaser, A., Laptev, I., Schmid, C.: Evaluation of local spatio-temporal features for action recognition. In: BMVC 2009-British Machine Vision Conference, pp. 124:1–124:11. BMVA Press (2009)
Wang, H., Schmid, C.: Action recognition with improved trajectories. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3551–3558 (2013)
Klaser, A., Marszałek, M., Schmid, C.: A spatio-temporal descriptor based on 3D-gradients. In: BMVC 2008–19th British Machine Vision Conference, pp. 275:1–275:10. British Machine Vision Association (2008)
Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60, 91–110 (2004)
Dalal, N., Triggs, B., Schmid, C.: Human detection using oriented histograms of flow and appearance. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3952, pp. 428–441. Springer, Heidelberg (2006). doi:10.1007/11744047_33
Yeffet, L., Wolf, L.: Local trinary patterns for human action recognition. In: IEEE 12th International Conference on Computer Vision, pp. 492–497. IEEE (2009)
Kliper-Gross, O., Gurovich, Y., Hassner, T., Wolf, L.: Motion interchange patterns for action recognition in unconstrained videos. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7577, pp. 256–269. Springer, Heidelberg (2012). doi:10.1007/978-3-642-33783-3_19
Jégou, H., Douze, M., Schmid, C., Pérez, P.: Aggregating local descriptors into a compact image representation. In: IEEE Conference on CVPR 2010, pp. 3304–3311. IEEE (2010)
Vu, N.-S., Caplier, A.: Face recognition with patterns of oriented edge magnitudes. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010. LNCS, vol. 6311, pp. 313–326. Springer, Heidelberg (2010). doi:10.1007/978-3-642-15549-9_23
Vu, N.S.: Exploring patterns of gradient orientations and magnitudes for face recognition. Inf. Forensics Secur. 8, 295–304 (2013)
Jain, M., Jégou, H., Bouthemy, P.: Better exploiting motion for better action recognition. In: CVPR 2013, pp. 2555–2562 (2013)
Kantorov, V., Laptev, I.: Efficient feature extraction, encoding and classification for action recognition. In: Proceedings of the IEEE Conference on CVPR, pp. 2593–2600 (2014)
Perronnin, F., Dance, C.: Fisher kernels on visual vocabularies for image categorization. In: 2007 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8. IEEE (2007)
Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines. ACM TIST 2, 27 (2011)
Schuldt, C., Laptev, I., Caputo, B.: Recognizing human actions: a local SVM approach. In: Proceedings of the 17th International Conference on Pattern Recognition, ICPR 2004, vol. 3, pp. 32–36. IEEE (2004)
Rodriguez, M.D., Ahmed, J., Shah, M.: Action MACH a spatio-temporal maximum average correlation height filter for action recognition. In: 2008 IEEE Conference on CVPR, pp. 1–8. IEEE (2008)
Sadanand, S., Corso, J.J.: Action bank: a high-level representation of activity in video. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1234–1241. IEEE (2012)
Kovashka, A., Grauman, K.: Learning a hierarchy of discriminative space-time neighborhood features for har. In: 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2046–2053. IEEE (2010)
Taylor, G.W., Fergus, R., LeCun, Y., Bregler, C.: Convolutional learning of spatio-temporal features. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010. LNCS, vol. 6316, pp. 140–153. Springer, Heidelberg (2010). doi:10.1007/978-3-642-15567-3_11
Liu, L., Shao, L., Li, X., Lu, K.: Learning spatio-temporal representations for action recognition: a genetic programming approach. IEEE Trans. Cybern. 46, 158–170 (2016)
Kläser, A.: Learning human actions in video. Ph.D. thesis, Université de Grenoble (2010)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Phan, HH., Vu, NS., Nguyen, VL., Quoy, M. (2016). Motion of Oriented Magnitudes Patterns for Human Action Recognition. In: Bebis, G., et al. Advances in Visual Computing. ISVC 2016. Lecture Notes in Computer Science(), vol 10073. Springer, Cham. https://doi.org/10.1007/978-3-319-50832-0_17
Download citation
DOI: https://doi.org/10.1007/978-3-319-50832-0_17
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-50831-3
Online ISBN: 978-3-319-50832-0
eBook Packages: Computer ScienceComputer Science (R0)