Abstract
Activity recognition is a field of computer vision that has shown great progress in the past decade. Starting from simple single-person activities, research in activity recognition is moving toward more complex scenes involving multiple objects and natural environments. The main challenges in the task include localizing and recognizing events in a video and dealing with the large variation in viewpoint, speed of movement, and scale. This chapter gives the reader an overview of the work that has taken place in activity recognition, especially in the domain of complex activities involving multiple interacting objects. We begin with a description of the challenges in activity recognition and give a broad overview of the different approaches. We go into the details of some of the feature descriptors and classification strategies commonly regarded as the state of the art in this field. We then move to more complex recognition systems, discussing the challenges in complex activity recognition and some of the work that has addressed them. Finally, we provide some examples of recent work in complex activity recognition. Recognizing complex behaviors involving multiple interacting objects remains a very challenging problem, and future work needs to study its many aspects, including features, recognition strategies, models, robustness, and context.
Acknowledgements
This work has been partially supported by the DARPA VIRAT program and NSF award IIS-0905671.
Copyright information
© 2011 Springer-Verlag London Limited
About this chapter
Cite this chapter
Nayak, N.M., Sethi, R.J., Song, B., Roy-Chowdhury, A.K. (2011). Modeling and Recognition of Complex Human Activities. In: Moeslund, T., Hilton, A., Krüger, V., Sigal, L. (eds) Visual Analysis of Humans. Springer, London. https://doi.org/10.1007/978-0-85729-997-0_15
DOI: https://doi.org/10.1007/978-0-85729-997-0_15
Publisher Name: Springer, London
Print ISBN: 978-0-85729-996-3
Online ISBN: 978-0-85729-997-0
eBook Packages: Computer Science, Computer Science (R0)