Abstract
Recognizing human activities from unknown views is challenging because human shapes appear quite different when observed from different viewpoints. In this paper, we learn a View-Invariant Pose (VIP) feature for depth-based cross-view action recognition. The proposed VIP feature encoder is a deep convolutional neural network that maps human poses observed from multiple viewpoints into a shared high-level feature space. Learning such a deep model requires a large corpus of multi-view paired data, which is very expensive to collect. We therefore generate a synthetic dataset by fitting human physical models to real motion-capture data in a simulator and rendering depth images from various viewpoints. The VIP feature is learned from the synthetic data in an unsupervised way, and domain adaptation is employed to minimize the domain gap so that the learned feature transfers from synthetic to real data. Moreover, since an action can be regarded as a sequence of poses, we model its temporal progression with a recurrent neural network. In experiments on two benchmark multi-view 3D human action datasets, our method achieves promising results compared with state-of-the-art approaches.
This work was supported by the NTU Research Center for AI and Advanced Robotics and by the Ministry of Science and Technology (MOST) under Grants MOST 107-2634-F-002-018- and MOST 106-2218-E-002-043.
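To make the pipeline described above concrete, the sketch below shows one plausible arrangement of the pieces the abstract names: a per-frame CNN encoder for depth images, an LSTM over the resulting pose-feature sequence, and a domain classifier behind a gradient-reversal layer in the style of Ganin and Lempitsky's DANN for synthetic-to-real adaptation. This is a minimal illustrative sketch, not the authors' released implementation; all layer sizes, input resolutions, and names (e.g. `VIPActionNet`) are assumptions.

```python
# Illustrative sketch only: CNN frame encoder + LSTM temporal model +
# gradient-reversal domain head. Architecture details are assumed, not
# taken from the paper.
import torch
import torch.nn as nn
from torch.autograd import Function

class GradReverse(Function):
    """Identity in the forward pass; negated, scaled gradient in the backward
    pass, so the encoder is trained to fool the domain classifier."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class VIPActionNet(nn.Module):
    def __init__(self, feat_dim=256, hidden_dim=128, num_actions=10):
        super().__init__()
        # Per-frame encoder: one depth image (1 x 64 x 64 assumed) -> pose feature.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, feat_dim),
        )
        # Temporal model over the sequence of per-frame pose features.
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.action_head = nn.Linear(hidden_dim, num_actions)
        # Domain head sees gradient-reversed features, pushing the encoder
        # to make synthetic and real features indistinguishable.
        self.domain_head = nn.Linear(feat_dim, 2)

    def forward(self, depth_seq, lambd=1.0):
        # depth_seq: (batch, time, 1, H, W)
        b, t = depth_seq.shape[:2]
        feats = self.encoder(depth_seq.flatten(0, 1)).view(b, t, -1)
        _, (h_n, _) = self.lstm(feats)
        action_logits = self.action_head(h_n[-1])
        domain_logits = self.domain_head(
            GradReverse.apply(feats.mean(dim=1), lambd))
        return action_logits, domain_logits

# Usage with assumed shapes: 4 sequences of 16 depth frames each.
logits, dom = VIPActionNet()(torch.randn(4, 16, 1, 64, 64))
```

In training, one would typically combine a cross-entropy action loss on labeled synthetic sequences with a binary domain loss on mixed synthetic/real batches, annealing `lambd` upward over training as is common in DANN-style adaptation.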