Cross-view Action Recognition via Dual-Codebook and Hierarchical Transfer Framework

Zhang, Chengkun; Zheng, Huicheng; Lai, Jianhuang

doi:10.1007/978-3-319-16814-2_38

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 9007)

Included in the following conference series:

Asian Conference on Computer Vision

Abstract

In this paper, we focus on the challenging problem of cross-view action recognition. The key to this problem is finding the correspondence between the source and target views, which we establish in two stages. In the first stage, we construct a Dual-Codebook composed of two codebooks, one for the source view and one for the target view. Each codeword in one codebook has a corresponding codeword in the other, unlike traditional methods, which build independent codebooks for the two views. We propose an effective co-clustering algorithm based on semi-nonnegative matrix factorization to derive the Dual-Codebook. With the Dual-Codebook, an action can be represented as a Bag-of-Dual-Codes (BoDC) regardless of whether it is observed in the source view or the target view. The Dual-Codebook therefore establishes a codebook-to-codebook correspondence, which is the foundation for the second stage. In the second stage, we observe that, although the appearance of action samples changes significantly with viewpoint, the temporal relationship between atom actions within an action should be stable across views. We therefore propose a hierarchical transfer framework to obtain feature-to-feature correspondence at the atom level between the source and target views. The framework is based on a temporal structure that effectively captures the temporal relationship between atom actions within an action. It performs transfer at the atom level over multiple timescales, whereas most existing methods perform only video-level transfer. We carry out a series of experiments on the IXMAS dataset. The results demonstrate that our method achieves superior performance compared with state-of-the-art approaches.
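As a rough illustration of the first stage, the sketch below shows one way paired codebooks could be obtained with semi-nonnegative matrix factorization. It is not the authors' co-clustering algorithm, which the abstract does not specify. It assumes, hypothetically, that column j of `x_src` and column j of `x_tgt` describe the same event seen from the two views, stacks them, and applies the standard semi-NMF multiplicative updates (Ding, Li and Jordan, 2010) so that both codebooks share one nonnegative assignment matrix. All function names (`semi_nmf`, `dual_codebook`, `bag_of_codes`) and the toy data are illustrative assumptions.

```python
import numpy as np


def pos(a):
    """Elementwise positive part: (|a| + a) / 2."""
    return (np.abs(a) + a) / 2.0


def neg(a):
    """Elementwise negative part: (|a| - a) / 2."""
    return (np.abs(a) - a) / 2.0


def semi_nmf(x, k, n_iter=200, eps=1e-9, seed=0):
    """Factor x (d x n) as f @ g.T with g >= 0 and f unconstrained."""
    rng = np.random.default_rng(seed)
    g = rng.random((x.shape[1], k)) + 0.1  # nonnegative initial assignments
    for _ in range(n_iter):
        # f has no sign constraint: least-squares solution given g.
        f = x @ g @ np.linalg.pinv(g.T @ g)
        xtf, ftf = x.T @ f, f.T @ f
        # Multiplicative update keeps g nonnegative.
        num = pos(xtf) + g @ neg(ftf)
        den = neg(xtf) + g @ pos(ftf)
        g *= np.sqrt((num + eps) / (den + eps))
    f = x @ g @ np.linalg.pinv(g.T @ g)
    return f, g


def dual_codebook(x_src, x_tgt, k, **kwargs):
    """ASSUMPTION: columns of x_src and x_tgt are paired observations.

    Stacking the views forces a shared g, so column j of the source
    codebook corresponds to column j of the target codebook.
    """
    d_src = x_src.shape[0]
    f, g = semi_nmf(np.vstack([x_src, x_tgt]), k, **kwargs)
    return f[:d_src], f[d_src:], g


def bag_of_codes(x, codebook):
    """Normalized histogram of nearest-codeword assignments (columns of x)."""
    # Squared distances, shape (k, n): every codeword against every descriptor.
    d2 = ((codebook[:, :, None] - x[:, None, :]) ** 2).sum(axis=0)
    hist = np.bincount(d2.argmin(axis=0), minlength=codebook.shape[1])
    return hist / hist.sum()


# Toy data: 40 paired descriptors, 6-D in the source view, 5-D in the target.
rng = np.random.default_rng(1)
x_src = rng.normal(size=(6, 40))
x_tgt = rng.normal(size=(5, 40))
c_src, c_tgt, g = dual_codebook(x_src, x_tgt, k=4)
h_src = bag_of_codes(x_src, c_src)  # histogram over the shared code indices
h_tgt = bag_of_codes(x_tgt, c_tgt)
```

Because both histograms are indexed by the same code positions, a classifier trained on `h_src` could in principle be applied to `h_tgt`. The second-stage hierarchical transfer described in the abstract is not modeled here.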

Acknowledgement

This work is supported by the National Natural Science Foundation of China (No. 61172141), the Key Projects in the National Science & Technology Pillar Program during the 12th Five-Year Plan Period (No. 2012BAK16B06), and the Science and Technology Program of Guangzhou, China (No. 2014J4100092).

Author information

Authors and Affiliations

School of Information Science and Technology, Sun Yat-sen University, Guangzhou, 510006, China
Chengkun Zhang, Huicheng Zheng & Jianhuang Lai

Corresponding author

Correspondence to Huicheng Zheng.

Editor information

Editors and Affiliations

Technische Universität München, Garching, Germany
Daniel Cremers
University of Adelaide, Adelaide, South Australia, Australia
Ian Reid
Keio University, Yokohama, Kanagawa, Japan
Hideo Saito
University of California at Merced, Merced, California, USA
Ming-Hsuan Yang

Cite this paper

Zhang, C., Zheng, H., Lai, J. (2015). Cross-view Action Recognition via Dual-Codebook and Hierarchical Transfer Framework. In: Cremers, D., Reid, I., Saito, H., Yang, M.-H. (eds) Computer Vision -- ACCV 2014. ACCV 2014. Lecture Notes in Computer Science, vol 9007. Springer, Cham. https://doi.org/10.1007/978-3-319-16814-2_38

DOI: https://doi.org/10.1007/978-3-319-16814-2_38
Published: 17 April 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16813-5
Online ISBN: 978-3-319-16814-2
eBook Packages: Computer Science, Computer Science (R0)
