Abstract
In fine-grained action (object manipulation) recognition, it is important to encode object semantic (contextual) information, i.e., which object is being manipulated and how it is being used. However, previous action recognition methods often represent semantic information in a global, coarse way and therefore cannot cope with fine-grained actions. In this work, we propose a representation and classification pipeline that seamlessly incorporates localized semantic information into every processing step for fine-grained action recognition. In the feature extraction stage, we exploit the geometric relationships between local motion features and the surrounding objects. In the feature encoding stage, we develop a semantic-grouped locality-constrained linear coding (SG-LLC) method that captures the joint distribution between motion and object-in-use information. Finally, we propose a semantic-aware multiple kernel learning framework (SA-MKL) that utilizes the empirical joint distribution between action and object type for more discriminative action classification. Extensive experiments are performed on the large-scale and challenging fine-grained MPII cooking action dataset. The results show that by effectively accumulating localized semantic information into the action representation and classification pipeline, we significantly improve fine-grained action classification performance over existing methods.
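The SG-LLC encoder described above builds on locality-constrained linear coding: each local descriptor is approximated as a weighted combination of its k nearest codebook atoms, with the weights constrained to sum to one. The sketch below illustrates this underlying LLC step only; the paper's semantic grouping is omitted, and the function name and parameter values are hypothetical.

```python
import numpy as np

def llc_encode(x, codebook, k=5, beta=1e-4):
    """Approximate LLC code for one descriptor x (illustrative sketch).

    x        : (dim,) local descriptor
    codebook : (n_atoms, dim) learned codebook
    k        : number of nearest atoms used in the local fit
    beta     : regularization strength for the local covariance
    """
    # Select the k codebook atoms nearest to the descriptor.
    dists = np.linalg.norm(codebook - x, axis=1)
    idx = np.argsort(dists)[:k]
    B = codebook[idx]                       # (k, dim) local basis

    # Solve the shift-invariant least-squares fit with a
    # sum-to-one constraint on the weights.
    Z = B - x                               # shift atoms to the descriptor
    C = Z @ Z.T                             # local covariance
    C += np.eye(k) * beta * max(np.trace(C), 1e-12)  # regularize
    w = np.linalg.solve(C, np.ones(k))
    w /= w.sum()                            # enforce sum(w) == 1

    # Scatter the k weights back into a full-length sparse code.
    code = np.zeros(len(codebook))
    code[idx] = w
    return code
```

The resulting code is sparse (at most k nonzeros), which is what makes pooling over many descriptors efficient; SG-LLC additionally restricts the candidate atoms to semantically grouped subsets of the codebook.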
© 2014 Springer International Publishing Switzerland
Zhou, Y., Ni, B., Yan, S., Moulin, P., Tian, Q. (2014). Pipelining Localized Semantic Features for Fine-Grained Action Recognition. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds) Computer Vision – ECCV 2014. ECCV 2014. Lecture Notes in Computer Science, vol 8692. Springer, Cham. https://doi.org/10.1007/978-3-319-10593-2_32
Print ISBN: 978-3-319-10592-5
Online ISBN: 978-3-319-10593-2