Abstract
The goal of human action recognition is to predict the label of the action performed by an individual or a group of people from a video observation.
© 2016 Springer International Publishing Switzerland
Cite this chapter
Kong, Y., Fu, Y. (2016). Introduction. In: Fu, Y. (ed.) Human Activity Recognition and Prediction. Springer, Cham. https://doi.org/10.1007/978-3-319-27004-3_1
Print ISBN: 978-3-319-27002-9
Online ISBN: 978-3-319-27004-3