PathGAN: Visual Scanpath Prediction with Generative Adversarial Networks

  • Marc Assens
  • Xavier Giro-i-Nieto
  • Kevin McGuinness
  • Noel E. O’Connor
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11133)

Abstract

We introduce PathGAN, a deep neural network for visual scanpath prediction trained in an adversarial setting. A visual scanpath is the sequence of fixation points that a human observer's gaze traces over an image. PathGAN consists of two parts, a generator and a discriminator. Both extract features from images using off-the-shelf networks and train recurrent layers to generate or discriminate scanpaths, respectively. The stochastic nature of scanpath data makes it very difficult to produce realistic predictions with supervised learning strategies, so we adopt adversarial training as a suitable alternative. Our experiments show that PathGAN improves the state of the art in visual scanpath prediction on the iSUN and Salient360! datasets.
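As a rough illustration of the architecture the abstract describes, the PyTorch sketch below pairs a frozen off-the-shelf CNN feature extractor with recurrent layers in both the generator and the discriminator, conditioning both on the input image as in a cGAN. The choice of VGG-16 features, LSTM cells, the layer sizes, and the (x, y, duration) encoding of fixations are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch of the generator/discriminator pairing described in the
# abstract. VGG-16 backbone, LSTM cells, and dimensions are assumptions.
import torch
import torch.nn as nn
from torchvision.models import vgg16, VGG16_Weights

class ScanpathGenerator(nn.Module):
    """Maps an image and a noise vector to a sequence of fixations."""
    def __init__(self, feat_dim=512, noise_dim=64, hidden=256, steps=32):
        super().__init__()
        # Frozen off-the-shelf feature extractor (VGG-16 assumed here).
        self.backbone = vgg16(weights=VGG16_Weights.DEFAULT).features
        for p in self.backbone.parameters():
            p.requires_grad = False
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.rnn = nn.LSTM(feat_dim + noise_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 3)  # (x, y, duration) per fixation
        self.steps = steps

    def forward(self, image, z):
        feat = self.pool(self.backbone(image)).flatten(1)       # (B, feat_dim)
        step = torch.cat([feat, z], dim=1).unsqueeze(1)         # (B, 1, feat+noise)
        # Feed the image features (plus noise) at every time step.
        seq, _ = self.rnn(step.expand(-1, self.steps, -1).contiguous())
        return torch.sigmoid(self.head(seq))                    # normalized fixations

class ScanpathDiscriminator(nn.Module):
    """Scores how plausible a scanpath is for a given image."""
    def __init__(self, feat_dim=512, hidden=256):
        super().__init__()
        self.backbone = vgg16(weights=VGG16_Weights.DEFAULT).features
        for p in self.backbone.parameters():
            p.requires_grad = False
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.rnn = nn.LSTM(feat_dim + 3, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, image, scanpath):                         # scanpath: (B, T, 3)
        feat = self.pool(self.backbone(image)).flatten(1)       # (B, feat_dim)
        feat = feat.unsqueeze(1).expand(-1, scanpath.size(1), -1)
        out, _ = self.rnn(torch.cat([feat, scanpath], dim=2))
        return self.head(out[:, -1])                            # real/fake logit
```

During adversarial training, the discriminator would be optimized to score real (image, scanpath) pairs above generated ones while the generator is optimized to fool it; the stochasticity the abstract mentions enters through the noise vector z, so repeated samples for the same image yield different plausible scanpaths.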

Keywords

Saliency · Scanpath · Adversarial training · GAN · cGAN

Acknowledgement

This research was partially supported by the Spanish Ministry of Economy and Competitiveness and the European Regional Development Fund (ERDF) under contract TEC2016-75976-R. This publication has emanated from research conducted with the financial support of Science Foundation Ireland (SFI) under grant number SFI/15/SIRG/3283. We acknowledge the support of NVIDIA Corporation for the donation of GPUs.


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Dublin City University, Dublin 9, Ireland
  2. Universitat Politecnica de Catalunya, Barcelona, Catalonia, Spain
