
Action Recognition Based on Learnt Motion Semantic Vocabulary

  • Qiong Zhao
  • Zhiwu Lu
  • Horace H. S. Ip
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6297)

Abstract

This paper presents a novel contextual spectral embedding (CSE) framework for human action recognition, which automatically learns high-level features (a motion semantic vocabulary) from a large vocabulary of abundant mid-level features (i.e., visual words). Our novelty is to exploit the inter-video context between mid-level features for spectral embedding, where the context is captured by the Pearson product-moment correlation between mid-level features, rather than by a Gaussian function computed over vectors of point-wise information used as the mid-level feature representation. Our goal is to embed the mid-level features into a semantic low-dimensional space and to learn a much more compact semantic vocabulary within the CSE framework. Experiments on two action datasets demonstrate that our approach achieves significantly better results than the state of the art.
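The pipeline described in the abstract (correlation-based context between visual words, spectral embedding, then a compact vocabulary) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the graph is built or how the semantic vocabulary is formed, so the following choices are assumptions. Negative correlations are clipped to zero, the embedding uses the top eigenvectors of the normalized affinity D^{-1/2} W D^{-1/2} with row normalization (Ng-Jordan-Weiss style), and semantic words are obtained by k-means on the embedded visual words. All function names are hypothetical.

```python
import numpy as np


def contextual_affinity(counts: np.ndarray) -> np.ndarray:
    """Word-by-word affinity from Pearson correlation across videos.

    counts: (n_videos, n_words) matrix of visual-word counts per video.
    """
    # rowvar=False: each column (visual word) is a variable observed over videos.
    corr = np.corrcoef(counts, rowvar=False)
    # Constant columns yield NaN correlations; treat those words as unrelated.
    corr = np.nan_to_num(corr, nan=0.0)
    # Assumption: drop negative correlations so that edge weights are non-negative.
    W = np.clip(corr, 0.0, None)
    np.fill_diagonal(W, 0.0)
    return W


def spectral_embedding(W: np.ndarray, dim: int) -> np.ndarray:
    """Embed graph nodes with the top eigenvectors of D^{-1/2} W D^{-1/2}."""
    d = W.sum(axis=1)
    inv_sqrt = np.zeros_like(d)
    inv_sqrt[d > 0] = 1.0 / np.sqrt(d[d > 0])
    S = inv_sqrt[:, None] * W * inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(S)           # eigenvalues in ascending order
    Y = vecs[:, ::-1][:, :dim]            # eigenvectors of the `dim` largest eigenvalues
    norms = np.linalg.norm(Y, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    return Y / norms                      # row-normalize each embedded word


def kmeans(X: np.ndarray, k: int, iters: int = 100) -> np.ndarray:
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def learn_semantic_vocabulary(counts: np.ndarray, dim: int, k: int) -> np.ndarray:
    """Map each visual word to one of k semantic words (hypothetical pipeline)."""
    W = contextual_affinity(counts)
    Y = spectral_embedding(W, dim)
    return kmeans(Y, k)


def semantic_histograms(counts: np.ndarray, labels: np.ndarray, k: int) -> np.ndarray:
    """Pool visual-word counts into k semantic-word bins for every video."""
    H = np.zeros((counts.shape[0], k))
    for j in range(k):
        H[:, j] = counts[:, labels == j].sum(axis=1)
    return H
```

Visual words that tend to fire in the same videos receive large positive correlations, so they end up close together in the embedded space and are merged into one semantic word. The resulting compact histograms would then be passed to a classifier of choice.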

Keywords

Visual Word · Action Recognition · Human Action Recognition · Action Dataset · Contextual Graph
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Qiong Zhao (1)
  • Zhiwu Lu (1)
  • Horace H. S. Ip (1)
  1. Centre for Innovative Applications of Internet and Multimedia Technologies (AIMtech), Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong