Estimation of Gaze-Following Based on Transformer and the Guiding Offset

  • Conference paper
Biometric Recognition (CCBR 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13628)


Abstract

Gaze-following is a challenging task in computer vision. With the help of gaze-following, we can understand what other people are looking at and predict what they might do. We propose a two-stage solution for predicting the gaze point of a target person. In the first stage, the head image and head position are fed into the gaze pathway to predict the guiding offset, from which we generate the multi-scale gaze fields. In the second stage, we concatenate the multi-scale gaze fields with the full image and feed them into the heatmap pathway to predict a heatmap. We leverage the guiding offset to facilitate the training of the gaze pathway, and we add a channel attention module. We use a Transformer in the heatmap pathway to capture the relationship between the person and the predicted target. Experimental results demonstrate the effectiveness of our solution on the GazeFollow and DL Gaze datasets.
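To make the two-stage pipeline concrete, the following is a minimal PyTorch sketch of the design the abstract describes. It is an illustrative reconstruction, not the authors' code: the module names (GazePathway, HeatmapPathway, ChannelAttention, gaze_fields), the cone-shaped formulation of the gaze fields, the set of sharpness values used for the multiple scales, and every layer size and Transformer depth are assumptions chosen for this example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def gaze_fields(head_pos, offset, size=64, gammas=(0.5, 1.0, 2.0, 5.0)):
    """Multi-scale gaze fields: cone-shaped maps aligned with the guiding offset.

    head_pos: (B, 2) normalised head position in [0, 1].
    offset:   (B, 2) predicted guiding offset (gaze direction vector).
    Returns (B, len(gammas), size, size); each gamma gives one angular scale.
    """
    b = head_pos.size(0)
    ys, xs = torch.meshgrid(
        torch.linspace(0, 1, size), torch.linspace(0, 1, size), indexing="ij"
    )
    grid = torch.stack([xs, ys], dim=-1).to(head_pos.device)       # (H, W, 2)
    direction = grid.unsqueeze(0) - head_pos.view(b, 1, 1, 2)      # (B, H, W, 2)
    direction = F.normalize(direction, dim=-1)
    gaze_dir = F.normalize(offset, dim=-1).view(b, 1, 1, 2)
    cos_sim = (direction * gaze_dir).sum(-1).clamp(min=0)          # (B, H, W)
    return torch.stack([cos_sim ** g for g in gammas], dim=1)


class ChannelAttention(nn.Module):
    """SE-style channel attention: global average pool, then two FC layers."""

    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):
        w = self.fc(x.mean(dim=(2, 3)))                  # (B, C) channel weights
        return x * w.unsqueeze(-1).unsqueeze(-1)


class GazePathway(nn.Module):
    """Stage 1: head image + head position -> 2-D guiding offset."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Sequential(
            nn.Linear(64 + 2, 64), nn.ReLU(inplace=True), nn.Linear(64, 2)
        )

    def forward(self, head_img, head_pos):
        feat = self.conv(head_img).flatten(1)            # (B, 64) head features
        return self.head(torch.cat([feat, head_pos], dim=1))


class HeatmapPathway(nn.Module):
    """Stage 2: full image + gaze fields -> gaze-target heatmap."""

    def __init__(self, n_fields=4, dim=64):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(3 + n_fields, dim, 3, stride=4, padding=1),
            nn.ReLU(inplace=True),
            ChannelAttention(dim),
        )
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
        self.out = nn.Conv2d(dim, 1, 1)

    def forward(self, image, fields):
        x = self.stem(torch.cat([image, fields], dim=1))  # (B, C, H', W')
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)             # (B, H'*W', C)
        tokens = self.transformer(tokens)                 # person-target relations
        x = tokens.transpose(1, 2).view(b, c, h, w)
        return self.out(x)                                # (B, 1, H', W') heatmap


# Toy forward pass through both stages.
head_img = torch.randn(2, 3, 64, 64)
image = torch.randn(2, 3, 64, 64)
head_pos = torch.rand(2, 2)
offset = GazePathway()(head_img, head_pos)               # stage 1: guiding offset
fields = gaze_fields(head_pos, offset, size=64)          # multi-scale gaze fields
heatmap = HeatmapPathway()(image, fields)                # stage 2: heatmap
print(heatmap.shape)                                     # torch.Size([2, 1, 16, 16])
```

In a training setup matching the abstract, the guiding offset would be supervised directly against the ground-truth gaze direction (which is what lets it facilitate training of the gaze pathway), while the heatmap would be supervised against a Gaussian placed at the annotated gaze point; both choices are standard for this family of methods rather than confirmed details of the paper.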



Acknowledgments

This work is supported by the General Program of the National Natural Science Foundation of China (61976078, 62202139), the National Key R&D Program of China (2019YFA0706203), and the Anhui Provincial Natural Science Foundation (2208085QF191).

Author information

Corresponding author

Correspondence to Xiao Sun.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Gao, S., Sun, X., Li, J. (2022). Estimation of Gaze-Following Based on Transformer and the Guiding Offset. In: Deng, W., et al. Biometric Recognition. CCBR 2022. Lecture Notes in Computer Science, vol 13628. Springer, Cham. https://doi.org/10.1007/978-3-031-20233-9_16

  • DOI: https://doi.org/10.1007/978-3-031-20233-9_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20232-2

  • Online ISBN: 978-3-031-20233-9

  • eBook Packages: Computer Science, Computer Science (R0)
