Abstract
We present a novel method to generate salient montages from unconstrained videos, by finding “montageable moments” and identifying the salient people and actions to depict in each montage. Our method addresses the need for generating concise visualizations from the increasingly large number of videos being captured from portable devices. Our main contributions are (1) the process of finding salient people and moments to form a montage, and (2) the application of this method to videos taken “in the wild” where the camera moves freely. As such, we demonstrate results on head-mounted cameras, where the camera moves constantly, as well as on videos downloaded from YouTube. Our approach can operate on videos of any length; some will contain many montageable moments, while others may have none. We demonstrate that a novel “montageability” score can be used to retrieve results with relatively high precision which allows us to present high quality montages to users.
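The selection step described above can be read as a retrieval problem: score candidate temporal segments of a video and keep only those whose "montageability" is high enough to present to a user. The sketch below is a minimal, hypothetical illustration of that ranking-and-thresholding step in Python; the Moment fields, the placeholder scoring function, and the threshold are assumptions made for illustration and are not the paper's actual features or formulation.

```python
# Hypothetical sketch: ranking candidate moments by a "montageability" score.
# The score below is a stand-in; the paper derives its score from salient
# people and actions detected in the video, which is not reproduced here.
from dataclasses import dataclass
from typing import List


@dataclass
class Moment:
    start_frame: int
    end_frame: int
    person_saliency: float   # assumed: saliency of tracked people in the span
    action_saliency: float   # assumed: saliency of their actions/motion


def montageability(m: Moment) -> float:
    """Placeholder score combining the two assumed saliency cues."""
    return m.person_saliency * m.action_saliency


def select_montageable_moments(moments: List[Moment],
                               threshold: float = 0.5) -> List[Moment]:
    """Return moments whose score clears the threshold, best first.
    As in the abstract, a given video may yield many such moments or none."""
    scored = sorted(moments, key=montageability, reverse=True)
    return [m for m in scored if montageability(m) >= threshold]


if __name__ == "__main__":
    candidates = [
        Moment(0, 90, 0.9, 0.8),     # a salient, action-filled span
        Moment(91, 200, 0.2, 0.1),   # a span with little to depict
    ]
    for m in select_montageable_moments(candidates):
        print(f"frames {m.start_frame}-{m.end_frame}: "
              f"score {montageability(m):.2f}")
```

Because the score is applied per segment and a threshold gates what is shown, videos of any length can be processed, and precision can be traded against the number of montages returned by moving the threshold.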
Cite this paper
Sun, M., Farhadi, A., Taskar, B., Seitz, S. (2014). Salient Montages from Unconstrained Videos. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds) Computer Vision – ECCV 2014. ECCV 2014. Lecture Notes in Computer Science, vol 8695. Springer, Cham. https://doi.org/10.1007/978-3-319-10584-0_31
DOI: https://doi.org/10.1007/978-3-319-10584-0_31
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10583-3
Online ISBN: 978-3-319-10584-0