Abstract
Recognizing human activities from unknown views is challenging because human shapes appear quite different when observed from different viewpoints. In this paper, we learn a View-Invariant Pose (VIP) feature for depth-based cross-view action recognition. The proposed VIP feature encoder is a deep convolutional neural network that maps human poses observed from multiple viewpoints into a shared high-level feature space. Learning such a deep model requires a large corpus of multi-view paired data, which is very expensive to collect. We therefore generate a synthetic dataset by fitting human physical models to real motion-capture data in a simulator and rendering depth images from various viewpoints. The VIP feature is learned from the synthetic data in an unsupervised way, and domain adaptation is employed to minimize the domain gap so that the learned feature transfers from synthetic to real data. Moreover, since an action can be regarded as a sequence of poses, we model its temporal progression with a recurrent neural network. In experiments on two benchmark multi-view 3D human action datasets, our method achieves promising results compared with state-of-the-art approaches.
This work was supported by the NTU Research Center for AI and Advanced Robotics and by the Ministry of Science and Technology (MOST) under Grants MOST 107-2634-F-002-018- and MOST 106-2218-E-002-043.
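To make the pipeline described above concrete, the sketch below shows one plausible arrangement of the pieces the abstract names: a per-frame CNN encoder for depth images, an LSTM over the resulting pose-feature sequence, and a domain classifier behind a gradient-reversal layer in the style of Ganin and Lempitsky's DANN for synthetic-to-real adaptation. This is a minimal illustrative sketch, not the authors' released implementation; all layer sizes, input resolutions, and names (e.g. `VIPActionNet`) are assumptions.

```python
# Illustrative sketch only: CNN frame encoder + LSTM temporal model +
# gradient-reversal domain head. Architecture details are assumed, not
# taken from the paper.
import torch
import torch.nn as nn
from torch.autograd import Function

class GradReverse(Function):
    """Identity in the forward pass; negated, scaled gradient in the backward
    pass, so the encoder is trained to fool the domain classifier."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class VIPActionNet(nn.Module):
    def __init__(self, feat_dim=256, hidden_dim=128, num_actions=10):
        super().__init__()
        # Per-frame encoder: one depth image (1 x 64 x 64 assumed) -> pose feature.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, feat_dim),
        )
        # Temporal model over the sequence of per-frame pose features.
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.action_head = nn.Linear(hidden_dim, num_actions)
        # Domain head sees gradient-reversed features, pushing the encoder
        # to make synthetic and real features indistinguishable.
        self.domain_head = nn.Linear(feat_dim, 2)

    def forward(self, depth_seq, lambd=1.0):
        # depth_seq: (batch, time, 1, H, W)
        b, t = depth_seq.shape[:2]
        feats = self.encoder(depth_seq.flatten(0, 1)).view(b, t, -1)
        _, (h_n, _) = self.lstm(feats)
        action_logits = self.action_head(h_n[-1])
        domain_logits = self.domain_head(
            GradReverse.apply(feats.mean(dim=1), lambd))
        return action_logits, domain_logits

# Usage with assumed shapes: 4 sequences of 16 depth frames each.
logits, dom = VIPActionNet()(torch.randn(4, 16, 1, 64, 64))
```

In training, one would typically combine a cross-entropy action loss on labeled synthetic sequences with a binary domain loss on mixed synthetic/real batches, annealing `lambd` upward over training as is common in DANN-style adaptation.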