Modeling and Recognition of Complex Human Activities

Nayak, Nandita M.; Sethi, Ricky J.; Song, Bi; Roy-Chowdhury, Amit K.

doi:10.1007/978-0-85729-997-0_15

Abstract

Activity recognition is a field of computer vision that has shown great progress in the past decade. Starting from simple single-person activities, research in activity recognition is moving toward more complex scenes involving multiple objects and natural environments. The main challenges include localizing and recognizing events in a video and dealing with the large variation in viewpoint, speed of movement, and scale. This chapter gives the reader an overview of the work in activity recognition, especially in the domain of complex activities involving multiple interacting objects. We begin with a description of the challenges in activity recognition and give a broad overview of the different approaches. We then detail some of the feature descriptors and classification strategies commonly recognized as the state of the art in this field. We then move to more complex recognition systems, discussing the challenges in complex activity recognition and some of the work that addresses them. Finally, we provide some examples of recent work in complex activity recognition. Recognizing complex behaviors involving multiple interacting objects is a very challenging problem, and future work needs to study its various aspects, including features, recognition strategies, models, robustness issues, and context.

Acknowledgements

This work has been partially supported by the DARPA VIRAT program and NSF award IIS-0905671.

Author information

Authors and Affiliations

University of California, Riverside, 900 University Ave. Riverside, CA, 92521, USA
Nandita M. Nayak, Bi Song & Amit K. Roy-Chowdhury
University of California, Los Angeles, 4532 Boelter Hall, CA, 90095-1596, USA
Ricky J. Sethi
Information Sciences Institute, University of Southern California, 4676 Admiralty Way, Marina del Rey, CA, 90292, USA
Ricky J. Sethi

Corresponding author

Correspondence to Nandita M. Nayak.

Editor information

Editors and Affiliations

Department of Media Technology, Aalborg University, Niels Jernes Vej 14, Aalborg, 9220, Denmark
Thomas B. Moeslund
Centre for Vision, Speech & Signal Proc., University of Surrey, Guildford, GU2 7XH, Surrey, United Kingdom
Adrian Hilton
Copenhagen Institute of Technology, Aalborg University, Lautrupvang 2B, Ballerup, 2750, Denmark
Volker Krüger
Disney Research, Forbes Avenue 615, Pittsburgh, 15213, Pennsylvania, USA
Leonid Sigal

Cite this chapter

Nayak, N.M., Sethi, R.J., Song, B., Roy-Chowdhury, A.K. (2011). Modeling and Recognition of Complex Human Activities. In: Moeslund, T., Hilton, A., Krüger, V., Sigal, L. (eds) Visual Analysis of Humans. Springer, London. https://doi.org/10.1007/978-0-85729-997-0_15

DOI: https://doi.org/10.1007/978-0-85729-997-0_15
Publisher Name: Springer, London
Print ISBN: 978-0-85729-996-3
Online ISBN: 978-0-85729-997-0
eBook Packages: Computer Science (R0)
