Modeling and Recognition of Complex Human Activities

Abstract

Activity recognition is a field of computer vision that has made great progress in the past decade. Starting from simple single-person activities, research is moving toward more complex scenes involving multiple objects and natural environments. The main challenges include localizing and recognizing events in a video and dealing with large variation in viewpoint, speed of movement, and scale. This chapter gives the reader an overview of work in activity recognition, especially in the domain of complex activities involving multiple interacting objects. We begin with a description of the challenges in activity recognition and give a broad overview of the different approaches. We go into the details of some of the feature descriptors and classification strategies commonly regarded as the state of the art in this field. We then move to more complex recognition systems, discussing the challenges in complex activity recognition and some of the work that addresses them. Finally, we provide some examples of recent work in complex activity recognition. Recognizing complex behaviors involving multiple interacting objects remains a very challenging problem, and future work needs to study many of its aspects, including features, recognition strategies, models, robustness, and context.

Acknowledgements

This work has been partially supported by the DARPA VIRAT program and NSF award IIS-0905671.

Author information

Corresponding author

Correspondence to Nandita M. Nayak.

Copyright information

© 2011 Springer-Verlag London Limited

About this chapter

Cite this chapter

Nayak, N.M., Sethi, R.J., Song, B., Roy-Chowdhury, A.K. (2011). Modeling and Recognition of Complex Human Activities. In: Moeslund, T., Hilton, A., Krüger, V., Sigal, L. (eds) Visual Analysis of Humans. Springer, London. https://doi.org/10.1007/978-0-85729-997-0_15

  • DOI: https://doi.org/10.1007/978-0-85729-997-0_15

  • Publisher Name: Springer, London

  • Print ISBN: 978-0-85729-996-3

  • Online ISBN: 978-0-85729-997-0

  • eBook Packages: Computer Science, Computer Science (R0)
