Abstract
Gaze-following is a challenging task in computer vision. With the help of gaze-following, we can understand what other people are looking at and predict what they might do. We propose a two-stage solution for predicting the gaze point of a target person. In the first stage, the head image and head position are fed into the gaze pathway to predict a guiding offset, from which we generate multi-scale gaze fields. In the second stage, we concatenate the multi-scale gaze fields with the full image and feed them into the heatmap pathway to predict a heatmap. We leverage the guiding offset to facilitate the training of the gaze pathway and add a channel attention module. In the heatmap pathway, we use a Transformer to capture the relationship between the person and the predicted target. Experimental results demonstrate the effectiveness of our solution on the GazeFollow and DL Gaze datasets.
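The two-stage pipeline described above can be sketched in PyTorch. This is a minimal illustration, not the authors' implementation: the backbone, layer sizes, the `gamma` sharpness values, and the cone-shaped gaze-field formulation are all assumptions, and the paper's Transformer and channel attention module are replaced here by plain convolutions for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GazePathway(nn.Module):
    """Stage 1: predict a 2-D guiding offset from the head crop and head position."""

    def __init__(self):
        super().__init__()
        self.head_encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        # Fuse head appearance features with the normalized (x, y) head position.
        self.fc = nn.Linear(16 + 2, 2)

    def forward(self, head_img, head_pos):
        feat = self.head_encoder(head_img)
        return self.fc(torch.cat([feat, head_pos], dim=1))  # guiding offset (dx, dy)


def gaze_field(head_pos, offset, size, gamma):
    """Cone-shaped gaze field: cosine similarity between the guiding-offset
    direction and the direction from the head to each pixel, sharpened by gamma."""
    B = head_pos.shape[0]
    coords = torch.linspace(0.0, 1.0, size)
    gy, gx = torch.meshgrid(coords, coords, indexing="ij")
    grid = torch.stack([gx, gy], dim=-1)                      # (H, W, 2)
    direc = grid.unsqueeze(0) - head_pos.view(B, 1, 1, 2)     # pixel minus head
    direc = F.normalize(direc, dim=-1)
    gaze_dir = F.normalize(offset, dim=-1).view(B, 1, 1, 2)
    cos = (direc * gaze_dir).sum(-1).clamp(min=0.0)           # (B, H, W)
    return cos ** gamma


class TwoStageGazeFollow(nn.Module):
    """Stage 2: concatenate multi-scale gaze fields with the full image and
    regress a gaze heatmap (Transformer/attention omitted in this sketch)."""

    def __init__(self, size=64, gammas=(1, 2, 5)):
        super().__init__()
        self.size, self.gammas = size, gammas
        self.gaze_pathway = GazePathway()
        self.heatmap_pathway = nn.Sequential(
            nn.Conv2d(3 + len(gammas), 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1))

    def forward(self, image, head_img, head_pos):
        offset = self.gaze_pathway(head_img, head_pos)
        fields = torch.stack(
            [gaze_field(head_pos, offset, self.size, g) for g in self.gammas],
            dim=1)                                            # (B, K, H, W)
        x = torch.cat([image, fields], dim=1)
        return self.heatmap_pathway(x)                        # (B, 1, H, W) heatmap
```

The predicted gaze point would then be taken as the argmax of the output heatmap; each `gamma` yields a gaze field of a different angular width, which is what makes the fields multi-scale.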
Acknowledgments
This work is supported by the General Program of the National Natural Science Foundation of China (61976078, 62202139), the National Key R&D Program of China (2019YFA0706203), and the Anhui Provincial Natural Science Foundation (2208085QF191).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Gao, S., Sun, X., Li, J. (2022). Estimation of Gaze-Following Based on Transformer and the Guiding Offset. In: Deng, W., et al. Biometric Recognition. CCBR 2022. Lecture Notes in Computer Science, vol 13628. Springer, Cham. https://doi.org/10.1007/978-3-031-20233-9_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20232-2
Online ISBN: 978-3-031-20233-9