Graph-based method for human-object interactions detection


Abstract

Human-object interaction (HOI) detection is a new branch of visual relationship detection and plays an important role in image understanding. Because image content is complex and diverse, HOI detection remains a difficult challenge. Unlike most current HOI detection methods, which rely only on the pairwise information of a human and an object, we propose a graph-based HOI detection method that models context and global structure information. First, to better exploit the relations between humans and objects, the detected humans and objects are treated as nodes of a fully connected undirected graph, which is then pruned to obtain an HOI graph that preserves only the edges connecting human and object nodes. Second, to obtain more robust features for the human and object nodes, two different attention-based feature extraction networks are proposed, which model global and local contexts respectively. Finally, a graph attention network is introduced to iteratively pass messages between the nodes of the HOI graph and detect potential HOIs. Experiments on the V-COCO and HICO-DET datasets verify the effectiveness of the proposed method and show that it outperforms many existing methods.
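The two graph steps the abstract describes — pruning a fully connected graph of detected instances down to human-object edges, then refining node features with graph-attention message passing — can be sketched as follows. This is a minimal toy illustration, not the authors' implementation: the features, dimensions, and random values are made up, and the single-head attention step follows the standard GAT formulation of Velickovic et al. rather than the paper's exact networks.

```python
import numpy as np

def build_hoi_graph(num_humans, num_objects):
    """Build the pruned HOI graph: start from a fully connected
    undirected graph over all detected instances, then keep only
    edges that connect a human node to an object node."""
    n = num_humans + num_objects
    adj = np.ones((n, n)) - np.eye(n)           # fully connected, no self-loops
    is_human = np.array([True] * num_humans + [False] * num_objects)
    # prune: an edge survives only if its two endpoints differ in type
    keep = is_human[:, None] != is_human[None, :]
    return adj * keep

def gat_layer(h, adj, W, a, slope=0.2):
    """One graph-attention message-passing step: attention logits over
    neighbours, softmax-normalised along each row, then a weighted sum
    of linearly transformed neighbour features."""
    z = h @ W                                    # (n, d') transformed features
    n = z.shape[0]
    e = np.zeros((n, n))
    for i in range(n):                           # e_ij = LeakyReLU(a^T [z_i || z_j])
        for j in range(n):
            e[i, j] = np.concatenate([z[i], z[j]]) @ a
    e = np.where(e > 0, e, slope * e)            # LeakyReLU
    e = np.where(adj > 0, e, -1e9)               # mask non-edges before softmax
    att = np.exp(e - e.max(axis=1, keepdims=True))
    att = att / att.sum(axis=1, keepdims=True)   # attention coefficients per row
    return att @ z                               # aggregated node features

# toy example: 2 detected humans, 3 detected objects, 4-d node features
rng = np.random.default_rng(0)
adj = build_hoi_graph(2, 3)
h = rng.normal(size=(5, 4))
W = rng.normal(size=(4, 4))
a = rng.normal(size=(8,))
h_new = gat_layer(h, adj, W, a)
print(adj)          # bipartite pattern: human-object edges only
print(h_new.shape)  # (5, 4)
```

Applying `gat_layer` repeatedly corresponds to the iterative message passing in the abstract: each round lets every human node aggregate evidence from the objects it is connected to, and vice versa.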




Author information

Affiliations

Authors

Contributions

XIA Li-min provided the concept and edited the draft of the manuscript. WU Wei conducted the literature review and wrote the first draft of the manuscript.

Corresponding author

Correspondence to Li-min Xia 夏利民.

Additional information

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Foundation item

Project(51678075) supported by the National Natural Science Foundation of China; Project(2017GK2271) supported by the Hunan Provincial Science and Technology Department, China


About this article


Cite this article

Xia, Lm., Wu, W. Graph-based method for human-object interactions detection. J. Cent. South Univ. 28, 205–218 (2021). https://doi.org/10.1007/s11771-021-4597-x


Key words

  • human-object interactions
  • visual relationship
  • context information
  • graph attention network
