Abstract
We present a method for human action recognition from video, which exploits both form (local shape) and motion (local flow). Inspired by models of the human visual system, the two feature sets are processed independently in separate channels. The form channel extracts a dense local shape representation from every frame, while the motion channel extracts dense optic flow from the frame and its immediate predecessor. The same processing pipeline is applied in both channels: feature maps are pooled locally, down-sampled, and compared to a collection of learnt templates, yielding a vector of similarity scores. In a final step, the two score vectors are merged, and recognition is performed with a discriminative classifier. In an evaluation on two standard datasets, our method outperforms the state of the art, confirming that the combination of form and motion improves recognition.
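The per-channel pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the pooling operator, the similarity measure, or any parameter values, so the max pooling over non-overlapping blocks (which pools and down-samples in one step), the Gaussian similarity, and all names below (`max_pool`, `similarity_scores`, `channel_scores`, `combined_scores`, `pool_size`, `sigma`) are assumptions made for illustration only.

```python
import math


def max_pool(feature_map, size):
    """Max over non-overlapping size x size blocks of a 2D map.

    Assumption: pooling and down-sampling are merged into one step here.
    """
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [
            max(feature_map[r + dr][c + dc]
                for dr in range(size) for dc in range(size))
            for c in range(0, cols - size + 1, size)
        ]
        for r in range(0, rows - size + 1, size)
    ]


def flatten(feature_map):
    """Turn a 2D map into a flat descriptor vector."""
    return [value for row in feature_map for value in row]


def similarity_scores(descriptor, templates, sigma=1.0):
    """One score per template; Gaussian similarity is an assumed choice."""
    scores = []
    for template in templates:
        sq_dist = sum((a - b) ** 2 for a, b in zip(descriptor, template))
        scores.append(math.exp(-sq_dist / (2.0 * sigma ** 2)))
    return scores


def channel_scores(feature_map, templates, pool_size=2):
    """Same pipeline for either channel: pool, down-sample, compare."""
    descriptor = flatten(max_pool(feature_map, pool_size))
    return similarity_scores(descriptor, templates)


def combined_scores(form_map, motion_map, form_templates, motion_templates):
    """Merge the two channels by concatenating their score vectors."""
    return (channel_scores(form_map, form_templates)
            + channel_scores(motion_map, motion_templates))
```

The concatenated vector returned by `combined_scores` would then be passed to a discriminative classifier; the abstract does not name the classifier, so none is included in the sketch.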
© 2008 Springer-Verlag Berlin Heidelberg
Cite this paper
Schindler, K., van Gool, L. (2008). Combining Densely Sampled Form and Motion for Human Action Recognition. In: Rigoll, G. (ed.) Pattern Recognition. DAGM 2008. Lecture Notes in Computer Science, vol 5096. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69321-5_13
DOI: https://doi.org/10.1007/978-3-540-69321-5_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69320-8
Online ISBN: 978-3-540-69321-5
eBook Packages: Computer Science (R0)