Detecting Snap Points in Egocentric Video with a Web Photo Prior

  • Bo Xiong
  • Kristen Grauman
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8693)


Wearable cameras capture a first-person view of the world, and offer a hands-free way to record daily experiences or special events. Yet, not every frame is worthy of being captured and stored. We propose to automatically predict “snap points” in unedited egocentric video—that is, those frames that look like they could have been intentionally taken photos. We develop a generative model for snap points that relies on a Web photo prior together with domain-adapted features. Critically, our approach avoids strong assumptions about the particular content of snap points, focusing instead on their composition. Using 17 hours of egocentric video from both human and mobile robot camera wearers, we show that the approach accurately isolates those frames that human judges would believe to be intentionally snapped photos. In addition, we demonstrate the utility of snap point detection for improving object detection and keyframe selection in egocentric video.
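The core idea of scoring frames against a prior built from intentionally taken Web photos can be sketched as a kernel density estimate in feature space: frames whose features fall in dense regions of the Web-photo distribution look more like deliberately composed shots. Everything below (the Gaussian kernel, the toy 8-D features, the bandwidth) is an illustrative assumption, not the paper's actual model or features.

```python
import numpy as np

def snap_point_scores(web_feats, frame_feats, bandwidth=1.0):
    """Score video frames by their (unnormalized) likelihood under a
    Gaussian kernel density estimate fit to Web-photo features.

    Higher scores mean the frame's composition resembles intentionally
    snapped photos. Bandwidth and feature choice are illustrative only.
    """
    # Squared Euclidean distance from each frame to every Web photo.
    d2 = ((frame_feats[:, None, :] - web_feats[None, :, :]) ** 2).sum(-1)
    # Average Gaussian kernel response; unnormalized is fine for ranking.
    return np.exp(-d2 / (2.0 * bandwidth ** 2)).mean(axis=1)

rng = np.random.default_rng(0)
web_photos = rng.normal(0.0, 1.0, size=(200, 8))   # prior: composed photos
photo_like = rng.normal(0.0, 1.0, size=(5, 8))     # frames near the prior
accidental = rng.normal(5.0, 1.0, size=(5, 8))     # poorly composed frames
scores = snap_point_scores(web_photos, np.vstack([photo_like, accidental]))
```

Frames scoring high under the prior are candidate snap points; the paper's full approach additionally bridges the Web/egocentric domain gap with domain-adapted features, which this toy sketch omits.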





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Bo Xiong (1)
  • Kristen Grauman (1)

  1. University of Texas at Austin, USA
