Journal of Central South University

Volume 25, Issue 2, pp 304–314

Human interaction recognition based on sparse representation of feature covariance matrices

  • Jun Wang (王军)
  • Si-chao Zhou (周思超)
  • Li-min Xia (夏利民)


A new method for human interaction recognition based on sparse representation of feature covariance matrices was presented. Firstly, the dense trajectories (DT) extracted from the video were clustered into groups to eliminate irrelevant trajectories, which greatly reduced the influence of noise on feature extraction. Then, the trajectory tunnels were characterized by feature covariance matrices. This yielded discriminative descriptors and also addressed the insufficient description of second-order feature statistics in earlier descriptors. After that, an over-complete dictionary was learned from the descriptors, and all the descriptors were encoded using sparse coding (SC). Classification was performed using multiple instance learning (MIL), which is better suited to complex environments. The proposed method was tested and evaluated on the WEB Interaction dataset and the UT Interaction dataset. The experimental results demonstrated the effectiveness of the proposed method.
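The covariance-descriptor step can be illustrated with a minimal sketch. It assumes each trajectory group is already available as an array of per-point feature vectors; the function name `covariance_descriptor`, the regularization constant `eps`, and the optional matrix-logarithm mapping are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def covariance_descriptor(features, eps=1e-6, use_log=True):
    """Summarize one trajectory group by the covariance of its features.

    features: (n_points, d) array, one feature vector per trajectory point
              (hypothetical input layout).
    Returns the d*(d+1)/2 upper-triangular entries of the (log-)covariance.
    """
    features = np.asarray(features, dtype=float)
    n, d = features.shape
    if n < 2:
        raise ValueError("need at least two feature vectors")
    centered = features - features.mean(axis=0)
    cov = centered.T @ centered / (n - 1)      # second-order statistics
    cov = cov + eps * np.eye(d)                # keep the matrix positive definite
    if use_log:
        # Assumption: map the SPD matrix to a vector space with the matrix log,
        # computed from the eigen-decomposition of the symmetric matrix.
        w, v = np.linalg.eigh(cov)
        cov = (v * np.log(w)) @ v.T
    return cov[np.triu_indices(d)]             # symmetric, so keep the upper triangle

# Usage with synthetic data: 50 points, 4 features per point.
rng = np.random.default_rng(0)
group_features = rng.normal(size=(50, 4))
descriptor = covariance_descriptor(group_features)
```

The descriptor length depends only on the feature dimension `d`, not on how many points the group contains, which is one reason covariance descriptors stay compact.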

Key words

interaction recognition; dense trajectory; sparse coding; MIL



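Human action recognition is an important research direction in computer vision and pattern recognition, with broad application prospects in surveillance systems, human-computer interaction, artificial intelligence, and other areas. This paper proposes an interaction recognition method based on sparse representation of covariance matrices. First, the dense trajectories extracted from the video are clustered into different trajectory groups to eliminate irrelevant trajectories and reduce the influence of noise on feature extraction. Then, the trajectory tunnels are described by covariance matrices, which yields strongly discriminative trajectory-tunnel descriptors; these descriptors have lower dimensionality and effectively address the insufficient description of second-order feature statistics in earlier descriptors. The descriptors are then encoded using sparse representation. Finally, multiple instance learning is used for action classification. Experiments on the UT-Interaction and WEB-Interaction datasets demonstrate the effectiveness of the proposed method.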
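The sparse-coding step can be sketched as follows. This is only an illustration under stated assumptions: the dictionary here is a random placeholder rather than one learned from descriptors, and the solver is plain ISTA (iterative soft-thresholding) for the L1-regularized least-squares problem, which is not necessarily the solver the authors used. The names `sparse_code_ista`, `lam`, and `n_iter` are hypothetical.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of the L1 norm."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def sparse_code_ista(x, D, lam=0.1, n_iter=200):
    """Approximately minimize 0.5*||x - D a||^2 + lam*||a||_1 over a."""
    lipschitz = np.linalg.norm(D, 2) ** 2      # largest eigenvalue of D^T D
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)               # gradient of the quadratic term
        a = soft_threshold(a - grad / lipschitz, lam / lipschitz)
    return a

# Usage: an over-complete dictionary with 20 atoms for 8-dimensional descriptors.
rng = np.random.default_rng(0)
D = rng.normal(size=(8, 20))
D /= np.linalg.norm(D, axis=0)                 # unit-norm atoms
x = 2.0 * D[:, 3]                              # synthetic descriptor built from one atom
lam = 0.1
code = sparse_code_ista(x, D, lam=lam)
```

The resulting code vector has one coefficient per dictionary atom; in a pipeline like the one described, such codes would then be passed to the classifier.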


interaction recognition; dense trajectory; sparse coding; multiple instance learning





Copyright information

© Central South University Press and Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  • Jun Wang (王军) (1)
  • Si-chao Zhou (周思超) (1)
  • Li-min Xia (夏利民) (1)
  1. School of Information Science and Engineering, Central South University, Changsha, China
