Human Action Recognition via Body Part Region Segmented Dense Trajectories

  • Conference paper
Computer Vision – ACCV 2018 Workshops (ACCV 2018)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11367)

Included in the following conference series: Asian Conference on Computer Vision (ACCV)

Abstract

We propose a novel action recognition framework based on trajectory features with human-aware spatial segmentation. Our insight is that the features critical for recognition appear in partial regions of the human body, so we segment each video frame into spatial regions based on the body parts to enhance the feature representation. We use an object detector and a pose estimator to segment four regions, namely the full body, the left/right arms, and the upper body. From these regions, we extract dense trajectory features and feed them into a shallow RNN to effectively capture long-term relationships. Evaluation results show that our framework outperforms previous approaches on two standard benchmarks, J-HMDB and MPII Cooking Activities.
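
To make the region segmentation step concrete, the following is a minimal sketch (not the authors' implementation) of how the four body-part regions could be derived from 2D pose keypoints before trajectory extraction. The keypoint layout, the joints assigned to each region, the padding margin, and all function names are assumptions for illustration only.

import numpy as np

# Hypothetical keypoint layout (a COCO-style subset), assumed for this sketch;
# a pose estimator would supply similar 2D joint locations per frame.
KEYPOINTS = {
    "nose": 0,
    "l_shoulder": 1, "r_shoulder": 2,
    "l_elbow": 3, "r_elbow": 4,
    "l_wrist": 5, "r_wrist": 6,
    "l_hip": 7, "r_hip": 8,
    "l_knee": 9, "r_knee": 10,
    "l_ankle": 11, "r_ankle": 12,
}

def bbox(points, margin=0.1):
    """Axis-aligned box around (N, 2) keypoints, padded by a relative margin."""
    x_min, y_min = points.min(axis=0)
    x_max, y_max = points.max(axis=0)
    pad_x = margin * (x_max - x_min + 1e-6)
    pad_y = margin * (y_max - y_min + 1e-6)
    return (x_min - pad_x, y_min - pad_y, x_max + pad_x, y_max + pad_y)

def body_part_regions(kps):
    """Derive the four regions named in the abstract as bounding boxes."""
    pick = lambda *names: kps[[KEYPOINTS[n] for n in names]]
    return {
        "full_body":  bbox(kps),
        "left_arm":   bbox(pick("l_shoulder", "l_elbow", "l_wrist")),
        "right_arm":  bbox(pick("r_shoulder", "r_elbow", "r_wrist")),
        "upper_body": bbox(pick("nose", "l_shoulder", "r_shoulder",
                                "l_hip", "r_hip")),
    }

if __name__ == "__main__":
    # Random keypoints stand in for real pose-estimator output on one frame.
    rng = np.random.default_rng(0)
    keypoints = rng.uniform(0, 256, size=(len(KEYPOINTS), 2))
    for name, box in body_part_regions(keypoints).items():
        print(name, [round(float(v), 1) for v in box])

In the full framework, dense trajectory features would then be extracted within each region and pooled per region before being fed to the shallow RNN; the margin and joint groupings above are illustrative defaults, not the paper's settings.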

Acknowledgments

This work was partially supported by the Aoyama Gakuin University-Supported Program "Early Eagle Program".

Author information

Corresponding author

Correspondence to Kaho Yamada.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Yamada, K., Ito, S., Kaneko, N., Sumi, K. (2019). Human Action Recognition via Body Part Region Segmented Dense Trajectories. In: Carneiro, G., You, S. (eds.) Computer Vision – ACCV 2018 Workshops. ACCV 2018. Lecture Notes in Computer Science, vol. 11367. Springer, Cham. https://doi.org/10.1007/978-3-030-21074-8_6

  • DOI: https://doi.org/10.1007/978-3-030-21074-8_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-21073-1

  • Online ISBN: 978-3-030-21074-8

  • eBook Packages: Computer Science, Computer Science (R0)
