Bags of Graphs for Human Action Recognition

  • Xavier Cortés
  • Donatello Conte
  • Hubert Cardot
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11004)


Bags of visual words are a well-known approach to image classification that has also been used in human action recognition. This model represents images or videos in a structure referred to as a bag of visual words before classification. The process of representing a video as a bag of visual words, known as the encoding process, maps the interest points detected in the scene into the new structure by means of a codebook. In this paper we propose to improve the representativeness of this model by including the structural relations between interest points, using graph sequences. The proposed model achieves very competitive results for human action recognition and could also be applied to graph-sequence classification problems.
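The encoding process described above can be sketched in a few lines. This is a minimal illustration of hard-assignment bag-of-visual-words encoding, assuming Euclidean nearest-codeword matching and L1 normalization; the function name and the toy descriptors and codebook are illustrative, not taken from the paper, and the paper's actual contribution (graph-sequence encoding of structural relations) is not reproduced here.

```python
import numpy as np

def encode_bag_of_words(descriptors, codebook):
    """Map each local descriptor to its nearest codeword (hard assignment)
    and return an L1-normalized histogram of codeword counts."""
    # Pairwise squared Euclidean distances, shape (n_descriptors, n_codewords)
    d = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    assignments = d.argmin(axis=1)  # index of the nearest codeword per descriptor
    hist = np.bincount(assignments, minlength=len(codebook)).astype(float)
    # Normalize so videos with different numbers of interest points are comparable
    return hist / hist.sum()

# Toy example: four 2-D descriptors, codebook of three words
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
descriptors = np.array([[0.1, 0.0], [0.9, 1.1], [5.2, 4.8], [0.2, 0.1]])
print(encode_bag_of_words(descriptors, codebook))  # histogram: 0.5, 0.25, 0.25
```

In practice the codebook is learned beforehand, typically by clustering the descriptors of a training set with k-means; the paper's extension replaces these unordered point descriptors with graph sequences so that structural relations between interest points survive the encoding.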



This work is part of the LUMINEUX project supported by the Région Centre-Val de Loire (France). We gratefully acknowledge the Région Centre-Val de Loire for its support.



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Xavier Cortés
  • Donatello Conte
  • Hubert Cardot

  1. LiFAT, Université de Tours, Tours, France
