Abstract
Wearable cameras capture a first-person view of the world, and offer a hands-free way to record daily experiences or special events. Yet, not every frame is worthy of being captured and stored. We propose to automatically predict “snap points” in unedited egocentric video—that is, those frames that look like they could have been intentionally taken photos. We develop a generative model for snap points that relies on a Web photo prior together with domain-adapted features. Critically, our approach avoids strong assumptions about the particular content of snap points, focusing instead on their composition. Using 17 hours of egocentric video from both human and mobile robot camera wearers, we show that the approach accurately isolates those frames that human judges would believe to be intentionally snapped photos. In addition, we demonstrate the utility of snap point detection for improving object detection and keyframe selection in egocentric video.
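The paper's core idea is that a frame scores highly as a snap point when its appearance is likely under a prior built from intentionally taken Web photos. As an illustrative sketch only (not the authors' implementation), one can approximate such a prior with a kernel density estimate over image features; the feature dimensionality, bandwidth, and toy data below are assumptions for demonstration.

```python
import numpy as np

def fit_web_prior(web_feats, bandwidth=0.5):
    """Store web-photo features and a bandwidth for a KDE-style prior."""
    return np.asarray(web_feats, dtype=float), bandwidth

def snap_score(frame_feat, prior):
    """Log-likelihood of a frame's feature under the web photo prior,
    computed as the mean Gaussian kernel over web exemplars
    (normalization constants shared by all frames are dropped)."""
    feats, h = prior
    d2 = np.sum((feats - frame_feat) ** 2, axis=1)
    kernels = np.exp(-d2 / (2.0 * h * h))
    return np.log(kernels.mean() + 1e-300)

# Toy example: web photos cluster near the origin in feature space.
rng = np.random.default_rng(0)
web = rng.normal(0.0, 0.3, size=(200, 8))
prior = fit_web_prior(web)

well_composed = np.zeros(8)   # resembles a typical web photo
awkward = np.full(8, 3.0)     # far from the web-photo manifold
assert snap_score(well_composed, prior) > snap_score(awkward, prior)
```

Ranking egocentric frames by such a score would surface the most photo-like ones; the actual method additionally uses domain-adapted features to bridge the gap between Web photos and egocentric video.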
© 2014 Springer International Publishing Switzerland
Cite this paper
Xiong, B., Grauman, K. (2014). Detecting Snap Points in Egocentric Video with a Web Photo Prior. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds) Computer Vision – ECCV 2014. ECCV 2014. Lecture Notes in Computer Science, vol 8693. Springer, Cham. https://doi.org/10.1007/978-3-319-10602-1_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10601-4
Online ISBN: 978-3-319-10602-1
eBook Packages: Computer Science (R0)