Cross-View Action Recognition Using View-Invariant Pose Feature Learned from Synthetic Data with Domain Adaptation

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11362)

Abstract

Recognizing human activities from unknown views is a challenging problem, since human shapes appear quite differently from different viewpoints. In this paper, we learn a View-Invariant Pose (VIP) feature for depth-based cross-view action recognition. The proposed VIP feature encoder is a deep convolutional neural network that transfers human poses from multiple viewpoints into a shared high-level feature space. Learning such a deep model requires a large corpus of multi-view paired data, which is very expensive to collect. Therefore, we generate a synthetic dataset by fitting human physical models with real motion capture data in simulators and rendering depth images from various viewpoints. The VIP feature is learned from the synthetic data in an unsupervised way. To ensure transferability from synthetic data to real data, domain adaptation is employed to minimize the domain difference. Moreover, an action can be considered as a sequence of poses, and the temporal progression is modeled by a recurrent neural network. In the experiments, our method is applied to two benchmark multi-view 3D human action datasets and achieves promising results compared with the state of the art.
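To make the pipeline described above concrete, the following is a minimal sketch, not the authors' implementation: a CNN pose encoder that maps a depth frame into a shared feature space, a gradient-reversal domain classifier for synthetic-to-real adaptation, and an LSTM over per-frame features for action classification. All layer sizes, the gradient-reversal formulation, and the feature/class dimensions are illustrative assumptions.

# Minimal PyTorch sketch of a VIP-style pipeline (illustrative; not the paper's code).
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; reverses and scales gradients so the encoder
    learns domain-confusing features (one common way to realize domain adaptation)."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class VIPEncoder(nn.Module):
    """CNN that embeds a single-channel depth image into a shared pose feature space."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(128, feat_dim)

    def forward(self, depth):                         # depth: (B, 1, H, W)
        return self.fc(self.conv(depth).flatten(1))   # -> (B, feat_dim)

class ActionModel(nn.Module):
    """Per-frame VIP features -> LSTM -> action logits, plus a domain head
    (synthetic vs. real) fed through gradient reversal for adaptation."""
    def __init__(self, num_actions, feat_dim=256, hidden=128, lambd=1.0):
        super().__init__()
        self.encoder = VIPEncoder(feat_dim)
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.action_head = nn.Linear(hidden, num_actions)
        self.domain_head = nn.Linear(feat_dim, 2)
        self.lambd = lambd

    def forward(self, clip):                          # clip: (B, T, 1, H, W)
        B, T = clip.shape[:2]
        feats = self.encoder(clip.flatten(0, 1)).view(B, T, -1)
        _, (h, _) = self.lstm(feats)
        action_logits = self.action_head(h[-1])
        domain_logits = self.domain_head(GradReverse.apply(feats.flatten(0, 1), self.lambd))
        return action_logits, domain_logits

# Example: a batch of 2 clips, 8 frames each, of 64x64 depth images.
model = ActionModel(num_actions=10)
action_logits, domain_logits = model(torch.randn(2, 8, 1, 64, 64))

Under the abstract's description, the pose encoder would be trained first on rendered multi-view synthetic depth images (with domain adaptation toward real depth data), and the temporal model would then be trained on the resulting pose-feature sequences.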

Supported by the NTU Research Center for AI and Advanced Robotics and the Ministry of Science and Technology (MOST) under Grant Nos. MOST 107-2634-F-002-018- and MOST 106-2218-E-002-043.



Author information

Correspondence to Li-Chen Fu.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Yang, Y.H., Liu, A.S., Liu, Y.H., Yeh, T.H., Li, Z.J., Fu, L.C. (2019). Cross-View Action Recognition Using View-Invariant Pose Feature Learned from Synthetic Data with Domain Adaptation. In: Jawahar, C., Li, H., Mori, G., Schindler, K. (eds) Computer Vision – ACCV 2018. Lecture Notes in Computer Science, vol 11362. Springer, Cham. https://doi.org/10.1007/978-3-030-20890-5_28

  • DOI: https://doi.org/10.1007/978-3-030-20890-5_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-20889-9

  • Online ISBN: 978-3-030-20890-5

  • eBook Packages: Computer Science, Computer Science (R0)
