
Residual Gating Fusion Network for Human Action Recognition

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 10996)

Abstract

Most recent works leverage the Two-Stream framework to model spatiotemporal information for video action recognition and achieve remarkable performance. In this paper, we propose a novel convolutional architecture, called the Residual Gating Fusion Network (RGFN), which improves on these methods by fully exploiting the spatiotemporal information carried in residual signals. To further exploit the local details of low-level layers, we introduce Multi-Scale Convolution Fusion (MSCF) to perform spatiotemporal fusion at multiple levels. Since RGFN is an end-to-end network, it can be trained on various kinds of video datasets and applied to other video analysis tasks. We evaluate RGFN on two standard benchmarks, UCF101 and HMDB51, and analyze the design of the network. Experimental results demonstrate the advantages of RGFN, which achieves state-of-the-art performance.
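The body of the paper is not reproduced on this page, so the exact RGFN formulation is not shown here. Purely as an illustration of the two ideas the abstract names, the sketch below gives one plausible reading in PyTorch: a learned gate modulates the motion-stream features before they are added, residual-style, to the appearance stream, and a multi-scale convolution module fuses the two streams with kernels of several sizes. Every module name, shape, and design choice here is an assumption for illustration, not the authors' implementation.

```python
# Hypothetical sketch of the ideas named in the abstract, NOT the authors'
# implementation: gated residual fusion of two-stream features, plus a
# multi-scale convolutional fusion over the concatenated streams.
import torch
import torch.nn as nn


class GatedResidualFusion(nn.Module):
    """Fuse motion features into appearance features via a learned gate:

        out = appearance + gate(appearance, motion) * project(motion)

    The residual form means the block falls back to the appearance stream
    when the gate closes (gate -> 0)."""

    def __init__(self, channels: int):
        super().__init__()
        self.project = nn.Conv2d(channels, channels, kernel_size=1)
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),  # per-channel, per-location gate in [0, 1]
        )

    def forward(self, appearance: torch.Tensor, motion: torch.Tensor) -> torch.Tensor:
        g = self.gate(torch.cat([appearance, motion], dim=1))
        return appearance + g * self.project(motion)


class MultiScaleConvFusion(nn.Module):
    """Combine two streams with parallel convolutions at several kernel
    sizes -- a rough stand-in for the multi-level fusion the abstract
    calls MSCF."""

    def __init__(self, channels: int, scales=(1, 3, 5)):
        super().__init__()
        # One branch per scale; odd kernels with padding k // 2 keep the
        # spatial resolution unchanged.
        self.branches = nn.ModuleList(
            nn.Conv2d(2 * channels, channels, kernel_size=k, padding=k // 2)
            for k in scales
        )
        self.merge = nn.Conv2d(len(scales) * channels, channels, kernel_size=1)

    def forward(self, appearance: torch.Tensor, motion: torch.Tensor) -> torch.Tensor:
        x = torch.cat([appearance, motion], dim=1)
        return self.merge(torch.cat([b(x) for b in self.branches], dim=1))


if __name__ == "__main__":
    # Toy feature maps: batch of 2, 64 channels, 28x28 spatial grid.
    app = torch.randn(2, 64, 28, 28)
    mot = torch.randn(2, 64, 28, 28)
    print(GatedResidualFusion(64)(app, mot).shape)   # torch.Size([2, 64, 28, 28])
    print(MultiScaleConvFusion(64)(app, mot).shape)  # torch.Size([2, 64, 28, 28])
```

One appeal of the gated residual form is that fusing motion information can only add to, never overwrite, the spatial representation: when the gate saturates toward zero the block reduces to the appearance features alone. This is a common motivation for gated residual designs generally, and is consistent with, though not confirmed by, the abstract's emphasis on residual signals.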


References

  1. Laptev, I.: On space-time interest points. In: ICCV, vol. 1, pp. 432–439 (2003)
  2. Wang, H.: Action recognition with improved trajectories. In: ICCV, pp. 3551–3558 (2013)
  3. Simonyan, K., Zisserman, A.: Two-stream convolutional networks for action recognition in videos. In: NIPS, pp. 568–576 (2014)
  4. Feichtenhofer, C.: Convolutional two-stream network fusion for video action recognition. In: CVPR, pp. 1933–1941 (2016)
  5. Wang, L.: Temporal segment networks: towards good practices for deep action recognition. In: ECCV, pp. 20–36 (2016)
  6. He, K.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
  7. Soomro, K.: UCF101: a dataset of 101 human actions classes from videos in the wild. Technical report CRCV-TR-12-01 (2012)
  8. Kuehne, H.: HMDB: a large video database for human motion recognition. In: ICCV (2011)
  9. Bilen, H.: Dynamic image networks for action recognition. In: CVPR, pp. 3034–3042 (2016)
  10. Wang, L.: Action recognition with trajectory-pooled deep-convolutional descriptors. In: CVPR, pp. 4305–4314 (2015)
  11. Tran, D.: Learning spatiotemporal features with 3D convolutional networks. In: ICCV, pp. 4489–4497 (2015)
  12. Varol, G.: Long-term temporal convolutions for action recognition. TPAMI PP(99), 1 (2016)
  13. Zhu, W.: A key volume mining deep framework for action recognition. In: CVPR, pp. 1991–1999 (2016)
  14. Diba, A.: Deep temporal linear encoding networks. In: CVPR (2017)


Acknowledgement

This work was supported in part by the National Natural Science Foundation of China under Grant 61673402, Grant 61273270, and Grant 60802069, in part by the Natural Science Foundation of Guangdong under Grant 2017A030311029, Grant 2016B010109002, Grant 2015B090912001, Grant 2016B010123005, and Grant 2017B090909005, in part by the Science and Technology Program of Guangzhou under Grant 201704020180 and Grant 201604020024, and in part by the Fundamental Research Funds for the Central Universities of China.

Author information


Corresponding author

Correspondence to Haifeng Hu.



Copyright information

© 2018 Springer Nature Switzerland AG

About this paper


Cite this paper

Zhang, J., Hu, H. (2018). Residual Gating Fusion Network for Human Action Recognition. In: Zhou, J., et al. (eds) Biometric Recognition. CCBR 2018. Lecture Notes in Computer Science, vol. 10996. Springer, Cham. https://doi.org/10.1007/978-3-319-97909-0_9


  • DOI: https://doi.org/10.1007/978-3-319-97909-0_9


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-97908-3

  • Online ISBN: 978-3-319-97909-0

  • eBook Packages: Computer Science; Computer Science (R0)
