Abstract
Although spatial–temporal local features and the bag of visual words (BoW) model have achieved great success and wide adoption in action classification, some problems remain. First, the extracted local features are not stable enough, which may be caused by background action or camera shake. Second, using local features alone ignores the spatial–temporal relationships among these features, which may decrease classification accuracy. Finally, the distance commonly used in the clustering algorithm of the BoW model does not take semantic context into consideration. To address these problems, we propose a systematic framework for recognizing realistic actions that considers the spatial–temporal relationships between the pruned local features and uses a new discriminate group distance to incorporate semantic context information. A Support Vector Machine (SVM) with multiple kernels is employed to make use of both the local feature and the feature group information. The proposed method is evaluated on the KTH dataset and on the more realistic YouTube dataset. Experimental results validate our approach, and the recognition performance is promising.
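The two generic building blocks named in the abstract, BoW quantization and a multiple-kernel combination of local-feature and feature-group information, can be illustrated with a minimal sketch. This is not the authors' implementation: the feature pruning and the discriminate group distance are not reproduced, and the exponential chi-square kernel, the fixed `weight`, and all function names below are assumptions chosen only for illustration.

```python
import math

def nearest_codeword(descriptor, codebook):
    """Index of the codeword closest to the descriptor (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((d - c) ** 2 for d, c in zip(descriptor, codebook[k])),
    )

def bow_histogram(descriptors, codebook):
    """Quantize descriptors against the codebook; return an L1-normalized histogram."""
    counts = [0.0] * len(codebook)
    for desc in descriptors:
        counts[nearest_codeword(desc, codebook)] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total > 0 else counts

def chi2_kernel(h1, h2, gamma=1.0):
    """One common form of the exponential chi-square kernel (an assumed choice)."""
    dist = sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)
    return math.exp(-gamma * dist)

def combined_kernel(local_a, local_b, group_a, group_b, weight=0.5):
    """Weighted sum of a local-feature kernel and a feature-group kernel.

    The weight is a hypothetical fixed value; a real system would tune or learn it.
    """
    return (weight * chi2_kernel(local_a, local_b)
            + (1.0 - weight) * chi2_kernel(group_a, group_b))

# Toy data: a 2-word codebook and three 2-D descriptors from one video.
codebook = [[0.0, 0.0], [1.0, 1.0]]
video_a = [[0.1, 0.0], [0.9, 1.1], [1.0, 0.8]]
hist_a = bow_histogram(video_a, codebook)
```

The combined kernel value for each pair of videos would then fill a Gram matrix passed to an SVM that accepts precomputed kernels. A non-negative weighted sum of valid kernels is itself a valid kernel, which is why this simple combination is a common starting point.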
Acknowledgements
This work was supported in part by the National Basic Research Program of China (973 Program): 2009CB320906, in part by the National Natural Science Foundation of China: 61025011, 61035001 and 61003165, and in part by the Beijing Natural Science Foundation: 4111003.
Copyright information
© 2013 Springer Science+Business Media, LLC
About this paper
Cite this paper
Ye, Y., Qin, L., Cheng, Z., Huang, Q. (2013). Recognizing Realistic Action Using Contextual Feature Group. In: The Era of Interactive Media. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-3501-3_38
DOI: https://doi.org/10.1007/978-1-4614-3501-3_38
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-3500-6
Online ISBN: 978-1-4614-3501-3
eBook Packages: Computer Science, Computer Science (R0)