Abstract
Viewpoint-free photography, i.e., interactively controlling the viewpoint of a photograph after capture, is a central challenge for real virtual reality (VR) experiences. In this chapter, we present algorithms that enable viewpoint-free photography from casual capture, i.e., footage easily captured with consumer cameras. We build on extensive work in image-based rendering, which often focuses on full or near-interpolation, where output viewpoints lie directly between the captured images or close to them. For 6-DOF VR experiences, however, it is essential to create viewpoint-free photos with a wide field of view and enough positional freedom to cover the range of motion a user might experience.
We focus on two VR experiences:
(1) Seated experiences, where the user can lean in different directions. Since the scene is only observed from a small range of viewpoints, we focus on easy capture: we show how to turn panorama-style capture into 3D photos, a simple representation for viewpoint-free photos (see the first sketch after this list), and how to significantly speed up processing times.
(2) Room-scale experiences, where the user can explore vastly different perspectives. This is challenging: more input footage is needed, real-time display rates become difficult to maintain, and view-dependent appearance and object backsides must be modelled, all while preventing noticeable mistakes. We address these challenges by (1) creating refined geometry for each input photograph, (2) using a fast tiled rendering algorithm to achieve real-time display rates, and (3) using a convolutional neural network to hide visual mistakes during compositing (see the second sketch below).
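To make the 3D photo representation from item (1) concrete, here is a minimal sketch of novel-view rendering from a single depth-augmented image. It simplifies in two ways worth flagging: it assumes a pinhole camera rather than the panoramic capture used in the chapter, and it forward-splats points without a z-buffer, so occlusions are not resolved. All function and variable names are illustrative, not the chapter's implementation.

```python
import numpy as np

def render_3d_photo(color, depth, K, R, t):
    """Reproject a depth-augmented photo to a new viewpoint (R, t).

    color: (H, W, 3) image; depth: (H, W) metric depth map;
    K: (3, 3) pinhole intrinsics shared by source and target views.
    """
    H, W = depth.shape
    # Pixel grid in homogeneous coordinates, shape (3, H*W).
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)]).reshape(3, -1).astype(np.float64)
    # Unproject each pixel to a 3D point in the source camera frame.
    pts = np.linalg.inv(K) @ pix * depth.reshape(1, -1)
    # Rigid transform into the target frame, then project back to pixels.
    proj = K @ (R @ pts + t.reshape(3, 1))
    uv = proj[:2] / np.clip(proj[2:], 1e-6, None)
    # Forward-splat source colors; a z-buffer would handle occlusions.
    x, y = np.round(uv[0]).astype(int), np.round(uv[1]).astype(int)
    ok = (x >= 0) & (x < W) & (y >= 0) & (y < H)
    out = np.zeros_like(color)
    out[y[ok], x[ok]] = color.reshape(-1, 3)[ok]
    return out
```

The splatting step also leaves disocclusion holes wherever the new viewpoint reveals surfaces the source photo never saw; filling those plausibly is part of what makes viewpoint-free photography hard.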
Overall, we provide evidence that viewpoint-free photography is feasible from casual capture—for both seated and room-scale VR experiences.
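Similarly, here is a toy stand-in for the blending network from item (2): given N candidate renderings of the same novel view, a small CNN predicts per-pixel weights, softmax-normalized across candidates, that composite the renderings and suppress seams and per-view errors. The architecture below is a deliberately tiny illustration, not the chapter's actual U-Net-style network; all names are assumptions.

```python
import torch
import torch.nn as nn

class BlendNet(nn.Module):
    """Predict per-pixel, per-candidate blend weights and composite."""

    def __init__(self, n_views: int = 4):
        super().__init__()
        # Three conv layers: far shallower than a real blending network.
        self.net = nn.Sequential(
            nn.Conv2d(3 * n_views, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, n_views, kernel_size=3, padding=1),
        )

    def forward(self, candidates: torch.Tensor) -> torch.Tensor:
        # candidates: (B, N, 3, H, W) renderings of the same novel view.
        b, n, c, h, w = candidates.shape
        logits = self.net(candidates.reshape(b, n * c, h, w))
        weights = torch.softmax(logits, dim=1)  # (B, N, H, W)
        # Per-pixel convex combination of the candidate renderings.
        return (weights.unsqueeze(2) * candidates).sum(dim=1)

# Usage: composite four 64x64 candidate renderings into one output view.
out = BlendNet(n_views=4)(torch.rand(1, 4, 3, 64, 64))  # (1, 3, 64, 64)
```

In training, such a network would be supervised against held-out ground-truth views of the captured scene, so it learns which candidate to trust at each pixel rather than following a hand-crafted blending heuristic.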
Notes
- 4. See http://developer.apple.com/videos/play/wwdc2017/507 at 17:20–20:50, Slides 81–89.
- 6. http://team.inria.fr/graphdeco/deep-blending, listed as “Heuristic Blending”.
Copyright information
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Hedman, P. (2020). Viewpoint-Free Photography for Virtual Reality. In: Magnor, M., Sorkine-Hornung, A. (eds) Real VR – Immersive Digital Reality. Lecture Notes in Computer Science, vol 11900. Springer, Cham. https://doi.org/10.1007/978-3-030-41816-8_6
DOI: https://doi.org/10.1007/978-3-030-41816-8_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-41815-1
Online ISBN: 978-3-030-41816-8
eBook Packages: Computer Science, Computer Science (R0)