Action Recognition Using Hierarchical Independent Subspace Analysis with Trajectory

  • Vinh D. Luong
  • Lipo Wang
  • Gaoxi Xiao
Conference paper
Part of the Proceedings in Adaptation, Learning and Optimization book series (PALO, volume 1)


Action recognition in videos is an important and challenging problem in computer vision. One of the most crucial parts of a successful action recognition system is its feature extraction component. Stacked convolutional Independent Subspace Analysis (SC-ISA) achieves the best results among unsupervised learning algorithms for action recognition on Hollywood 2 (53.3%) and YouTube (75.8%). However, its performance still lags about 10% behind the current state of the art, which relies on engineered, computer-vision-based feature extraction techniques. In this paper, we improve SC-ISA's results by incorporating motion information into it. By extracting blocks that follow motion trajectories in videos, we reduce noise and increase the number of training samples without degrading the network's performance when training and testing SC-ISA. This improves SC-ISA's results by about 1%.
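The abstract does not describe how blocks are cut along a trajectory, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a grayscale video stored as a NumPy array and a trajectory that is already available as (frame, y, x) points, for example from a dense optical-flow tracker. The function name `extract_trajectory_block`, the clamping at frame borders, and the sizes used below are all illustrative assumptions.

```python
import numpy as np

def extract_trajectory_block(video, trajectory, size):
    """Cut a (T, size, size) block whose centre follows `trajectory`.

    video:      array of shape (frames, height, width), grayscale
    trajectory: array of shape (T, 3) with integer rows (frame, y, x)
    size:       spatial side length of the block, in pixels
    (All names and conventions here are assumptions for illustration.)
    """
    _, height, width = video.shape
    half = size // 2
    patches = []
    for frame, y, x in np.asarray(trajectory, dtype=int):
        # Clamp the window so it stays fully inside the frame.
        top = int(np.clip(y - half, 0, height - size))
        left = int(np.clip(x - half, 0, width - size))
        patches.append(video[frame, top:top + size, left:left + size])
    return np.stack(patches)

# Synthetic data standing in for a real clip and a tracked point.
rng = np.random.default_rng(0)
video = rng.random((10, 64, 64))
trajectory = np.array([(t, 20 + 2 * t, 30 + t) for t in range(10)])

block = extract_trajectory_block(video, trajectory, size=16)
print(block.shape)  # (10, 16, 16)
```

Under these assumptions, each patch is re-centred on the tracked point in every frame, so the block stays on the moving region instead of a fixed image location. Each additional trajectory yields one more training block of the same shape.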


Keywords: Independent Component Analysis, Action Recognition, Convolutional Neural Network, Training Block
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore
