Comprehensive Representation and Efficient Extraction of Spatial Information for Human Activity Recognition from Video Data

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 460)

Abstract

Of late, human activity recognition (HAR) in video has generated much interest. A fundamental step is to develop a computational representation of interactions. The human body is often abstracted using minimum bounding rectangles (MBRs) and approximated as a set of MBRs corresponding to different body parts. Such approximations treat each MBR as an independent entity, which ignores the fact that these MBRs are parts of a single body. A representation schema for interactions between entities, each of which is considered as a set of related rectangles (referred to as an extended object), holds promise. We propose an efficient representation schema for extended objects, together with a simple recursive algorithm to extract spatial information. We evaluate our approach and demonstrate that, for HAR, the spatial information thus extracted leads to better models than those obtained with CORE9 [1], a compact and comprehensive representation schema for video understanding.
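For readers unfamiliar with CORE9 [1], the following is a minimal Python sketch of the idea as we understand it: the x- and y-boundaries of two MBRs split the region they span into a 3 × 3 grid of nine cores, and recording which cores lie in which MBR is enough to read off relations such as topology. The names `Rect`, `mbr`, `core9_state` and `coarse_relation`, and the body-part boxes, are hypothetical. The relation set is a simplified RCC-style one (TPP and NTPP collapsed into a single PP label). This is not the extended-object schema or the recursive extraction algorithm proposed in this paper.

```python
from dataclasses import dataclass
from itertools import product
from typing import Sequence


@dataclass(frozen=True)
class Rect:
    """Closed axis-parallel rectangle [x1, x2] x [y1, y2] (assumes x1 <= x2, y1 <= y2)."""
    x1: float
    y1: float
    x2: float
    y2: float

    def area(self) -> float:
        return (self.x2 - self.x1) * (self.y2 - self.y1)

    def contains(self, other: "Rect") -> bool:
        return (self.x1 <= other.x1 and other.x2 <= self.x2
                and self.y1 <= other.y1 and other.y2 <= self.y2)


def mbr(parts: Sequence[Rect]) -> Rect:
    """Minimum bounding rectangle of a non-empty sequence of rectangles."""
    return Rect(min(r.x1 for r in parts), min(r.y1 for r in parts),
                max(r.x2 for r in parts), max(r.y2 for r in parts))


def core9_state(a: Rect, b: Rect) -> dict:
    """Map each core (col, row) of the 3 x 3 CORE9 grid to (core, in_a, in_b).

    The sorted x- and y-boundaries of a and b give three column and three row
    intervals. Every boundary of a and b is a grid line, so each core is either
    contained in a rectangle or meets it at most along its boundary.
    """
    xs = sorted([a.x1, a.x2, b.x1, b.x2])
    ys = sorted([a.y1, a.y2, b.y1, b.y2])
    state = {}
    for i, j in product(range(3), range(3)):
        core = Rect(xs[i], ys[j], xs[i + 1], ys[j + 1])  # may be degenerate
        state[(i, j)] = (core, a.contains(core), b.contains(core))
    return state


def coarse_relation(a: Rect, b: Rect) -> str:
    """Coarse RCC-style relation of a to b, read off the CORE9 state alone.

    Simplification: TPP and NTPP are collapsed into one 'PP' (proper part) label.
    """
    cores = list(core9_state(a, b).values())
    shared = [core for core, in_a, in_b in cores if in_a and in_b]
    if not shared:
        return "DC"  # no point in common
    if all(core.area() == 0 for core in shared):
        return "EC"  # only boundaries touch
    a_in_b = all(in_b for _, in_a, in_b in cores if in_a)
    b_in_a = all(in_a for _, in_a, in_b in cores if in_b)
    if a_in_b and b_in_a:
        return "EQ"
    if a_in_b:
        return "PP(a,b)"
    if b_in_a:
        return "PP(b,a)"
    return "PO"


# Hypothetical body-part boxes for two people (arbitrary pixel units).
person_a = [Rect(0, 0, 2, 4),    # torso
            Rect(2, 3, 3, 4)]    # arm
person_b = [Rect(2.5, 0, 4.5, 2)]

print(coarse_relation(mbr(person_a), mbr(person_b)))                 # PO
print({coarse_relation(p, q) for p in person_a for q in person_b})   # {'DC'}
```

In this toy example the two whole-body MBRs partially overlap (PO), yet no pair of actual body parts shares a point (DC). A single MBR per person hides the part-level picture, while an unrelated bag of part MBRs loses the notion of the whole; keeping both is the kind of gap an extended-object schema aims to close.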

Notes

  1. http://www.visint.org.

  2. We use I-frames obtained using the tool ffmpeg as keyframes, http://www.ffmpeg.org.

References

  1. Cohn, A.G., Renz, J., Sridhar, M.: Thinking inside the box: A comprehensive spatial representation for video analysis. In: Proc. 13th Int. Conf. on Principles of Knowledge Representation and Reasoning (KR2012). pp. 588–592. AAAI Press (2012)

  2. Aggarwal, J., Ryoo, M.: Human activity analysis: A review. ACM Computing Surveys 43(3), 16:1–16:43 (Apr 2011)

  3. Dubba, K.S.R., Bhatt, M., Dylla, F., Hogg, D.C., Cohn, A.G.: Interleaved inductive-abductive reasoning for learning complex event models. In: ILP. Lecture Notes in Computer Science, vol. 7207, pp. 113–129. Springer (2012)

  4. Kusumam, K.: Relational Learning using body parts for Human Activity Recognition in Videos. Master’s thesis, University of Leeds (2012)

  5. Schneider, M., Behr, T.: Topological relationships between complex spatial objects. ACM Trans. Database Syst. 31(1), 39–81 (2006)

  6. Skiadopoulos, S., Koubarakis, M.: On the consistency of cardinal directions constraints. Artificial Intelligence 163, 91–135 (2005)

  7. Chen, L., Nugent, C., Mulvenna, M., Finlay, D., Hong, X.: Semantic smart homes: Towards knowledge rich assisted living environments. In: Intelligent Patient Management, vol. 189, pp. 279–296. Springer Berlin Heidelberg (2009)

  8. Cohn, A.G., Hazarika, S.M.: Qualitative spatial representation and reasoning: An overview. Fundam. Inform. 46(1-2), 1–29 (2001)

  9. Randell, D.A., Cui, Z., Cohn, A.G.: A spatial logic based on regions and connection. In: Proc. of 3rd Int. Conf. on Principles of Knowledge Representation and Reasoning (KR’92). pp. 165–176. Morgan Kaufmann (1992)

  10. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)

  11. al Harbi, N., Gotoh, Y.: Describing spatio-temporal relations between object volumes in video streams. In: Workshops at the Twenty-Ninth AAAI Conference on Artificial Intelligence (2015)

  12. Sokeh, H.S., Gould, S., J, J.: Efficient extraction and representation of spatial information from video data. In: Proc. of the 23rd Int. Joint Conf. on Artificial Intelligence (IJCAI’13). pp. 1076–1082. AAAI Press/IJCAI (2013)

  13. Fei-Fei, L., Perona, P.: A Bayesian hierarchical model for learning natural scene categories. In: IEEE Comp. Soc. Conf. on Computer Vision and Pattern Recognition (CVPR). vol. 2, pp. 524–531 (2005)

  14. Phan, X.H., Nguyen, C.T.: GibbsLDA++: A C/C++ implementation of latent Dirichlet allocation (LDA) (2007)

Author information

Corresponding author

Correspondence to Shobhanjana Kalita.

Copyright information

© 2017 Springer Science+Business Media Singapore

About this paper

Cite this paper

Kalita, S., Karmakar, A., Hazarika, S.M. (2017). Comprehensive Representation and Efficient Extraction of Spatial Information for Human Activity Recognition from Video Data. In: Raman, B., Kumar, S., Roy, P., Sen, D. (eds) Proceedings of International Conference on Computer Vision and Image Processing. Advances in Intelligent Systems and Computing, vol 460. Springer, Singapore. https://doi.org/10.1007/978-981-10-2107-7_8

  • DOI: https://doi.org/10.1007/978-981-10-2107-7_8

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-2106-0

  • Online ISBN: 978-981-10-2107-7

  • eBook Packages: Engineering, Engineering (R0)
