Multimedia Tools and Applications, Volume 78, Issue 10, pp 13313–13329

Effective human action recognition by combining manifold regularization and pairwise constraints

  • Xueqi Ma
  • Dapeng Tao
  • Weifeng Liu


The ever-growing popularity of mobile networks and electronics has prompted intensive research on the management of multimedia data (e.g., text, image, video, and audio). This has motivated research on semi-supervised learning, which combines a small number of labeled samples with a large number of unlabeled samples by exploiting the local structure of the data distribution. Manifold regularization and pairwise constraints are two representative semi-supervised learning methods. In this paper, we introduce a novel local structure preserving approach that considers both manifold regularization and pairwise constraints. Specifically, we construct a new graph Laplacian that, unlike the traditional Laplacian, incorporates pairwise constraints. The proposed graph Laplacian better preserves the local geometry of the data distribution and thereby supports effective recognition. On this basis, we build graph regularized classifiers for action recognition, with support vector machines and kernel least squares as special cases. Experimental results on a multimodal human action database (CAS-YNU-MHAD) show that the proposed algorithms outperform the general algorithms.
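To make the idea in the abstract concrete, the following is a minimal sketch and not the authors' construction, which the abstract does not specify. It assumes one simple way of injecting pairwise constraints into a k-nearest-neighbor graph (must-link pairs get weight 1, cannot-link pairs get weight 0) and then plugs the resulting Laplacian into the standard Laplacian regularized least squares solution from the manifold regularization framework. The function names, the RBF kernel, and all parameter values are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(X, Z, sigma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of X and the rows of Z."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Z**2, axis=1)[None, :] - 2.0 * X @ Z.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def constrained_laplacian(X, must_link, cannot_link, k=5, sigma=1.0):
    """Unnormalized graph Laplacian L = D - W of a kNN graph whose weights
    are overridden by pairwise constraints.

    ASSUMPTION (not taken from the paper): must-link pairs are given the
    maximum weight 1, cannot-link pairs are given weight 0.
    """
    n = X.shape[0]
    S = rbf_kernel(X, X, sigma)
    np.fill_diagonal(S, 0.0)                      # no self-loops
    idx = np.argsort(-S, axis=1)[:, :k]           # k most similar points per row
    rows = np.repeat(np.arange(n), k)
    W = np.zeros((n, n))
    W[rows, idx.ravel()] = S[rows, idx.ravel()]
    W = np.maximum(W, W.T)                        # symmetrize the kNN graph
    for i, j in must_link:                        # same-class pairs: strong edge
        W[i, j] = W[j, i] = 1.0
    for i, j in cannot_link:                      # different-class pairs: no edge
        W[i, j] = W[j, i] = 0.0
    return np.diag(W.sum(axis=1)) - W

def fit_laprls(X, y_labeled, L, gamma_a=1e-2, gamma_i=1e-1, sigma=1.0):
    """Laplacian regularized least squares (kernel least squares with a
    manifold penalty). The first len(y_labeled) rows of X are the labeled
    samples; labels are +1 / -1. Returns the expansion coefficients alpha.
    """
    n, l = X.shape[0], len(y_labeled)
    K = rbf_kernel(X, X, sigma)
    J = np.zeros((n, n))
    J[:l, :l] = np.eye(l)                         # selects the labeled samples
    Y = np.zeros(n)
    Y[:l] = y_labeled
    A = J @ K + gamma_a * l * np.eye(n) + (gamma_i * l / n**2) * (L @ K)
    return np.linalg.solve(A, Y)

def predict(alpha, X_train, X_new, sigma=1.0):
    """Sign of the learned function f(x) = sum_i alpha_i k(x_i, x)."""
    return np.sign(rbf_kernel(X_new, X_train, sigma) @ alpha)
```

A typical call would stack the labeled samples first, build `L = constrained_laplacian(X, must_link, cannot_link)`, fit `alpha = fit_laprls(X, y_labeled, L)`, and classify with `predict(alpha, X, X_new)`. Swapping the squared loss for the hinge loss would give the support vector machine variant mentioned in the abstract; that case is not sketched here.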


Keywords: Action recognition; Local structure preserving; Manifold regularization; Pairwise constraints



This paper is partly supported by the National Natural Science Foundation of China (Grant No. 61671480), the Fundamental Research Funds for the Central Universities, China University of Petroleum (East China) (Grant No. 14CX02203A, YCX2017059).



Copyright information

© Springer Science+Business Media, LLC 2017

Authors and Affiliations

  1. China University of Petroleum (East China), Qingdao, China
  2. Yunnan University, Kunming, China
