Abstract
In this paper, global-level, view-invariant descriptors for human action recognition from 3D reconstruction data are proposed. 3D reconstruction techniques are employed to address two of the most challenging issues in general-case human action recognition, namely view variance and the presence of (self-)occlusions. First, a set of calibrated Kinect sensors is used to produce a 3D reconstruction of the performing subjects. Subsequently, a 3D flow field is estimated for every captured frame. For action recognition, a novel global 3D flow descriptor is introduced, which efficiently encodes the global motion characteristics in a compact form while also incorporating information about their spatial distribution. Additionally, a new global temporal-shape descriptor is proposed, which extends the notion of 3D shape description to action recognition by incorporating temporal information. The latter descriptor efficiently addresses the inherent problems of temporal alignment and compact representation, while also being robust to noise. Experimental results on public datasets demonstrate the efficiency of the proposed approach.
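The temporal-shape descriptor builds on the idea of 3D shape descriptions. As a rough illustration of that family of ideas, and explicitly not the authors' descriptor, the sketch below computes the D2 shape distribution of Osada, Funkhouser, Chazelle and Dobkin ("Shape distributions", ACM Transactions on Graphics, 2002), i.e. a histogram of distances between randomly sampled point pairs, for each reconstructed frame, and then pools the per-frame histograms over time. The function names, the mean-distance normalisation, the bin range and the pooling statistics (mean, standard deviation and mean absolute frame-to-frame change) are all hypothetical choices made for illustration. Because pairwise distances do not change under rigid rotation, a descriptor of this kind is largely insensitive to the viewing direction, which is one way view invariance can be obtained from full 3D data.

```python
import numpy as np


def d2_histogram(points, n_pairs=2000, n_bins=32, max_ratio=3.0, rng=None):
    """D2 shape distribution of one point cloud (after Osada et al., 2002).

    Samples random point pairs, measures their Euclidean distances,
    normalises by the mean distance (scale invariance) and returns a
    normalised histogram. max_ratio is an illustrative bin-range assumption.
    """
    rng = np.random.default_rng(rng)
    points = np.asarray(points, dtype=float)
    i = rng.integers(0, len(points), size=n_pairs)
    j = rng.integers(0, len(points), size=n_pairs)
    d = np.linalg.norm(points[i] - points[j], axis=1)
    mean_d = d.mean()
    if mean_d > 0:
        d = d / mean_d
    # Clip so unusually long pairs land in the last bin instead of being dropped
    d = np.clip(d, 0.0, max_ratio)
    hist, _ = np.histogram(d, bins=n_bins, range=(0.0, max_ratio))
    return hist / hist.sum()


def temporal_shape_descriptor(frames, n_bins=32, seed=0):
    """Hypothetical temporal pooling of per-frame D2 histograms.

    frames: sequence of (N_t, 3) arrays, one reconstructed point cloud per frame.
    Returns a vector of length 3 * n_bins regardless of the number of frames,
    so sequences of different lengths can be compared without alignment.
    """
    hists = np.stack([
        d2_histogram(f, n_bins=n_bins, rng=seed + t) for t, f in enumerate(frames)
    ])
    if len(hists) > 1:
        # Average magnitude of histogram change between consecutive frames
        change = np.abs(np.diff(hists, axis=0)).mean(axis=0)
    else:
        change = np.zeros(n_bins)
    return np.concatenate([hists.mean(axis=0), hists.std(axis=0), change])


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    frames = [rng.normal(size=(500, 3)) for _ in range(10)]
    print(temporal_shape_descriptor(frames).shape)  # (96,)
```

Pooling with order-free statistics is the simplest way to obtain a fixed-length vector without temporal alignment, but it discards the ordering of poses; a descriptor intended for real recognition would likely need to retain more temporal structure than this sketch does.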
Acknowledgment
The work presented in this paper was supported by the European Commission under contract H2020-700367 DANTE.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Papadopoulos, G.T., Daras, P. (2017). Global Flow and Temporal-Shape Descriptors for Human Action Recognition from 3D Reconstruction Data. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2017. Lecture Notes in Computer Science, vol 10358. Springer, Cham. https://doi.org/10.1007/978-3-319-62416-7_4
DOI: https://doi.org/10.1007/978-3-319-62416-7_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-62415-0
Online ISBN: 978-3-319-62416-7
eBook Packages: Computer Science, Computer Science (R0)