Abstract
Viewpoint-free photography, i.e., interactively controlling the viewpoint of a photograph after capture, is a central challenge for real virtual reality (VR) experiences. In this chapter, we present algorithms that enable viewpoint-free photography from casual capture, i.e., footage easily captured with consumer cameras. We build on extensive work in image-based rendering, which often focuses on full or near-interpolation, where output viewpoints lie directly between the captured images or close to them. For 6-DOF VR experiences, however, it is essential to create viewpoint-free photos with a wide field of view and enough positional freedom to cover the range of motion a user might experience.
We focus on two VR experiences:
(1) Seated experiences, where the user can lean in different directions. Since the scene is only observed from a small range of viewpoints, we focus on easy capture: we show how to turn panorama-style capture into 3D photos, a simple representation for viewpoint-free photos (see the first sketch after this list), and how to significantly speed up processing times.
(2) Room-scale experiences, where the user can explore vastly different perspectives. This is challenging: more input footage is needed, real-time display rates become difficult to maintain, and view-dependent appearance and object backsides must be modelled, all while preventing noticeable mistakes. We address these challenges by (1) creating refined geometry for each input photograph, (2) using a fast tiled rendering algorithm to achieve real-time display rates, and (3) using a convolutional neural network to hide visual mistakes during compositing (see the second sketch below).
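To make the 3D photo representation from item (1) concrete, here is a minimal sketch of novel-view rendering from a single depth-augmented image. It simplifies in two ways worth flagging: it assumes a pinhole camera rather than the panoramic capture used in the chapter, and it forward-splats points without a z-buffer, so occlusions are not resolved. All function and variable names are illustrative, not the chapter's implementation.

```python
import numpy as np

def render_3d_photo(color, depth, K, R, t):
    """Reproject a depth-augmented photo to a new viewpoint (R, t).

    color: (H, W, 3) image; depth: (H, W) metric depth map;
    K: (3, 3) pinhole intrinsics shared by source and target views.
    """
    H, W = depth.shape
    # Pixel grid in homogeneous coordinates, shape (3, H*W).
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)]).reshape(3, -1).astype(np.float64)
    # Unproject each pixel to a 3D point in the source camera frame.
    pts = np.linalg.inv(K) @ pix * depth.reshape(1, -1)
    # Rigid transform into the target frame, then project back to pixels.
    proj = K @ (R @ pts + t.reshape(3, 1))
    uv = proj[:2] / np.clip(proj[2:], 1e-6, None)
    # Forward-splat source colors; a z-buffer would handle occlusions.
    x, y = np.round(uv[0]).astype(int), np.round(uv[1]).astype(int)
    ok = (x >= 0) & (x < W) & (y >= 0) & (y < H)
    out = np.zeros_like(color)
    out[y[ok], x[ok]] = color.reshape(-1, 3)[ok]
    return out
```

The splatting step also leaves disocclusion holes wherever the new viewpoint reveals surfaces the source photo never saw; filling those plausibly is part of what makes viewpoint-free photography hard.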
Overall, we provide evidence that viewpoint-free photography is feasible from casual capture—for both seated and room-scale VR experiences.
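Similarly, here is a toy stand-in for the blending network from item (2): given N candidate renderings of the same novel view, a small CNN predicts per-pixel weights, softmax-normalized across candidates, that composite the renderings and suppress seams and per-view errors. The architecture below is a deliberately tiny illustration, not the chapter's actual U-Net-style network; all names are assumptions.

```python
import torch
import torch.nn as nn

class BlendNet(nn.Module):
    """Predict per-pixel, per-candidate blend weights and composite."""

    def __init__(self, n_views: int = 4):
        super().__init__()
        # Three conv layers: far shallower than a real blending network.
        self.net = nn.Sequential(
            nn.Conv2d(3 * n_views, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, n_views, kernel_size=3, padding=1),
        )

    def forward(self, candidates: torch.Tensor) -> torch.Tensor:
        # candidates: (B, N, 3, H, W) renderings of the same novel view.
        b, n, c, h, w = candidates.shape
        logits = self.net(candidates.reshape(b, n * c, h, w))
        weights = torch.softmax(logits, dim=1)  # (B, N, H, W)
        # Per-pixel convex combination of the candidate renderings.
        return (weights.unsqueeze(2) * candidates).sum(dim=1)

# Usage: composite four 64x64 candidate renderings into one output view.
out = BlendNet(n_views=4)(torch.rand(1, 4, 3, 64, 64))  # (1, 3, 64, 64)
```

In training, such a network would be supervised against held-out ground-truth views of the captured scene, so it learns which candidate to trust at each pixel rather than following a hand-crafted blending heuristic.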
Notes
- 4. See http://developer.apple.com/videos/play/wwdc2017/507 at 17:20–20:50, Slides 81–89.
- 6. http://team.inria.fr/graphdeco/deep-blending, listed as “Heuristic Blending”.
Copyright information
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Hedman, P. (2020). Viewpoint-Free Photography for Virtual Reality. In: Magnor, M., Sorkine-Hornung, A. (eds) Real VR – Immersive Digital Reality. Lecture Notes in Computer Science, vol 11900. Springer, Cham. https://doi.org/10.1007/978-3-030-41816-8_6
DOI: https://doi.org/10.1007/978-3-030-41816-8_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-41815-1
Online ISBN: 978-3-030-41816-8
eBook Packages: Computer Science, Computer Science (R0)