Gaze Aware Deep Learning Model for Video Summarization

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11165)

Abstract

Video summarization is an ideal tool for skimming videos. Previous computational models extract explicit information from the input video, such as visual appearance, motion, or audio, to generate informative summaries. Eye gaze, an implicit cue, has proved useful for indicating important content and the viewer’s interest. In this paper, we propose a novel gaze-aware deep learning model for video summarization. In our model, the position and velocity of the observers’ raw eye movements are processed by a deep neural network to indicate the users’ preferences. Experiments on two widely used video summarization datasets show that our model outperforms state-of-the-art methods in summarizing video according to both general and personal preferences. The results provide an innovative and improved algorithm for using gaze information in video summarization.
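
The abstract does not say how position and velocity are derived from the raw eye movements. As a minimal sketch only, assuming gaze samples arrive as (x, y) screen coordinates at a fixed sampling rate, velocity could be approximated with finite differences before being passed to a network. The function name `gaze_features` and the parameter `sample_rate_hz` are hypothetical and are not taken from the paper.

```python
from math import hypot


def gaze_features(samples, sample_rate_hz):
    """Turn raw gaze samples [(x, y), ...] into (x, y, vx, vy, speed) tuples.

    Assumption (not from the paper): velocity is a forward finite
    difference scaled by the sampling rate. The last sample reuses the
    previous velocity so the output has one row per input sample.
    """
    if len(samples) < 2:
        raise ValueError("need at least two gaze samples")

    features = []
    # Pair each sample with its successor to get a forward difference.
    for (x0, y0), (x1, y1) in zip(samples, samples[1:]):
        vx = (x1 - x0) * sample_rate_hz
        vy = (y1 - y0) * sample_rate_hz
        features.append((x0, y0, vx, vy, hypot(vx, vy)))

    # Pad the final sample with the last computed velocity.
    x_last, y_last = samples[-1]
    _, _, vx, vy, speed = features[-1]
    features.append((x_last, y_last, vx, vy, speed))
    return features


if __name__ == "__main__":
    # Three samples at 2 Hz: one jump, then a fixation.
    for row in gaze_features([(0.0, 0.0), (3.0, 4.0), (3.0, 4.0)], 2.0):
        print(row)
```

Each output row keeps the position and adds a velocity estimate, which is one plausible shape for a per-frame gaze input. A real pipeline would likely also need smoothing and handling of blinks or missing samples.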

Acknowledgments

This work was supported by the National Natural Science Foundation of China (No. 61502311, No. 61620106008), the Natural Science Foundation of Guangdong Province (No. 2016A030310053, 2016A030310039, 2017A030310521), the Science and Technology Innovation Commission of Shenzhen under Grant (No. JCYJ20160422151736824), Shenzhen Emerging Industries of the Strategic Basic Research Project under Grant (No. JCYJ20160226191842793), the Shenzhen high-level overseas talents program, the Tencent “Rhinoceros Birds” Scientific Research Foundation for Young Teachers of Shenzhen University (2016), the National Institutes of Health Grant (5T32EY025201-03), and the Smith-Kettlewell Eye Research Institute Grant.

Corresponding author

Correspondence to Zheng Ma.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Wu, J., Zhong, S.H., Ma, Z., Heinen, S.J., Jiang, J. (2018). Gaze Aware Deep Learning Model for Video Summarization. In: Hong, R., Cheng, WH., Yamasaki, T., Wang, M., Ngo, CW. (eds) Advances in Multimedia Information Processing – PCM 2018. Lecture Notes in Computer Science, vol 11165. Springer, Cham. https://doi.org/10.1007/978-3-030-00767-6_27

  • DOI: https://doi.org/10.1007/978-3-030-00767-6_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-00766-9

  • Online ISBN: 978-3-030-00767-6

  • eBook Packages: Computer Science, Computer Science (R0)
