Recognizing Realistic Action Using Contextual Feature Group

  • Conference paper
  • In: The Era of Interactive Media

Abstract

Although spatial–temporal local features and the bag of visual words (BoW) model have achieved great success and wide adoption in action classification, several problems remain. First, the extracted local features are not stable enough, which may be caused by background action or camera shake. Second, using local features alone ignores the spatial–temporal relationships among those features, which may decrease classification accuracy. Finally, the distance typically used in the clustering algorithm of the BoW model does not take semantic context into consideration. To address these problems, we propose a systematic framework for recognizing realistic actions that considers the spatial–temporal relationships between the pruned local features and uses a new discriminate group distance to incorporate semantic context information. A Support Vector Machine (SVM) with multiple kernels is employed to make use of both the local feature and the feature group information. The proposed method is evaluated on the KTH dataset and on YouTube, a relatively realistic dataset. Experimental results validate our approach, and the recognition performance is promising.
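To make the pipeline in the abstract more concrete, the following is a minimal sketch of two of its ingredients: quantizing local descriptors into a BoW histogram, and combining a local-feature kernel with a feature-group kernel. It is not the authors' implementation. The abstract does not specify the feature pruning step, the group distance, or the kernels, so plain Euclidean quantization, an exponential chi-square kernel, and a fixed mixing `weight` are stand-in assumptions, and all function names and toy values are hypothetical.

```python
import math


def nearest_word(descriptor, vocabulary):
    """Index of the closest visual word (squared Euclidean distance, an assumption)."""
    return min(
        range(len(vocabulary)),
        key=lambda k: sum((d - w) ** 2 for d, w in zip(descriptor, vocabulary[k])),
    )


def bow_histogram(descriptors, vocabulary):
    """L1-normalized histogram of visual-word assignments for one video."""
    hist = [0.0] * len(vocabulary)
    for desc in descriptors:
        hist[nearest_word(desc, vocabulary)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist


def chi2_kernel(h1, h2, gamma=1.0):
    """Exponential chi-square kernel between two histograms (assumed kernel choice)."""
    dist = sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)
    return math.exp(-gamma * dist)


def combined_kernel(local_a, local_b, group_a, group_b, weight=0.5):
    """Convex combination of a local-feature kernel and a feature-group kernel."""
    return (weight * chi2_kernel(local_a, local_b)
            + (1.0 - weight) * chi2_kernel(group_a, group_b))


# Toy data: a 3-word vocabulary of 2-D descriptors and two "videos".
vocabulary = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
video_a = [[0.1, 0.0], [0.9, 1.1], [0.0, 0.2]]
video_b = [[1.0, 0.9], [0.1, 0.9]]

hist_a = bow_histogram(video_a, vocabulary)
hist_b = bow_histogram(video_b, vocabulary)

# For illustration only, the same histograms stand in for feature-group histograms.
k_ab = combined_kernel(hist_a, hist_b, hist_a, hist_b)
```

Evaluating `combined_kernel` over all pairs of training videos yields a Gram matrix that could be handed to any SVM implementation accepting precomputed kernels. A weighted sum of valid kernels is itself a valid kernel, which is why a convex combination is a common way to fuse two feature channels.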

Acknowledgements

This work was supported in part by the National Basic Research Program of China (973 Program): 2009CB320906, in part by the National Natural Science Foundation of China: 61025011, 61035001 and 61003165, and in part by the Beijing Natural Science Foundation: 4111003.

Author information

Corresponding author

Correspondence to Qingming Huang.

Copyright information

© 2013 Springer Science+Business Media, LLC

About this paper

Cite this paper

Ye, Y., Qin, L., Cheng, Z., Huang, Q. (2013). Recognizing Realistic Action Using Contextual Feature Group. In: The Era of Interactive Media. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-3501-3_38

  • DOI: https://doi.org/10.1007/978-1-4614-3501-3_38

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4614-3500-6

  • Online ISBN: 978-1-4614-3501-3

  • eBook Packages: Computer Science (R0)
