
Viewpoint-Free Photography for Virtual Reality

Chapter in: Real VR – Immersive Digital Reality

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11900)


Abstract

Viewpoint-free photography, i.e., interactively controlling the viewpoint of a photograph after capture, is a central challenge for real virtual reality (VR) experiences. In this chapter, we present algorithms that enable viewpoint-free photography from casual capture, i.e., footage easily captured with consumer cameras. We build on extensive work in image-based rendering, which often focuses on full or near-interpolation, where output viewpoints lie directly between captured images or close to them. For 6-DOF VR experiences, however, it is essential to create viewpoint-free photos with a wide field of view and sufficient positional freedom to cover the range of motion a user might experience.

We focus on two VR experiences:

  1. Seated experiences, where the user can lean in different directions. Since the scene is only observed from a small range of viewpoints, we focus on ease of capture: we show how to turn panorama-style capture into 3D photos, a simple representation for viewpoint-free photos, and how to significantly speed up processing times (a minimal sketch of this representation follows the abstract).

  2. Room-scale experiences, where the user can explore vastly different perspectives. This is challenging: more input footage is needed, real-time display rates become harder to maintain, and view-dependent appearance and object backsides need to be modelled, all while preventing noticeable mistakes. We address these challenges by (1) creating refined geometry for each input photograph, (2) using a fast tiled rendering algorithm to achieve real-time display rates, and (3) using a convolutional neural network to hide visual mistakes during compositing (a second sketch after the abstract illustrates this blending step).

Overall, we provide evidence that viewpoint-free photography is feasible from casual capture—for both seated and room-scale VR experiences.
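To make the 3D-photo idea from experience (1) concrete, the sketch below back-projects an equirectangular panorama with per-pixel depth into a colored point cloud, then re-renders it from a nearby viewpoint (e.g., a lean). This is an illustrative NumPy sketch under stated assumptions, not the chapter's implementation: the depth input is assumed given (e.g., from multi-view stereo on the capture sequence), and a simple point splat stands in for the mesh-based rendering a real 3D photo viewer would use.

```python
import numpy as np

def panorama_to_points(depth):
    """Back-project an equirectangular depth panorama to 3D points.

    `depth` is an (H, W) array of per-pixel distances (assumed input).
    Returns (H*W, 3) points in the capture-center coordinate frame.
    """
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w].astype(np.float64)
    lon = (u / w) * 2.0 * np.pi - np.pi          # azimuth in [-pi, pi)
    lat = np.pi / 2.0 - (v / h) * np.pi          # elevation in [-pi/2, pi/2]
    dirs = np.stack([np.cos(lat) * np.sin(lon),  # unit viewing rays
                     np.sin(lat),
                     np.cos(lat) * np.cos(lon)], axis=-1)
    return (dirs * depth[..., None]).reshape(-1, 3)

def render_from(points, colors, eye, h, w):
    """Splat colored points into an equirectangular view at position `eye`.

    `colors` is the (H, W, 3) panorama. A painter's-algorithm splat
    (far-to-near, near points overwrite) stands in for rasterization.
    """
    rel = points - eye
    r = np.linalg.norm(rel, axis=-1)
    lon = np.arctan2(rel[:, 0], rel[:, 2])
    lat = np.arcsin(np.clip(rel[:, 1] / np.maximum(r, 1e-9), -1.0, 1.0))
    u = (((lon + np.pi) / (2.0 * np.pi)) * w).astype(int) % w
    v = (((np.pi / 2.0 - lat) / np.pi) * h).astype(int).clip(0, h - 1)
    order = np.argsort(-r)                       # far first, near overwrites
    out = np.zeros((h, w, 3), dtype=colors.dtype)
    out[v[order], u[order]] = colors.reshape(-1, 3)[order]
    return out

# Example: lean 10 cm to the right of the capture point.
# pts = panorama_to_points(depth)
# img = render_from(pts, colors, np.array([0.1, 0.0, 0.0]), *depth.shape)
```

Viewed this way, leaning simply moves `eye` away from the origin, and the per-pixel depth is what supplies the motion parallax that a flat panorama lacks.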
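For experience (2), the CNN-based compositing step can be pictured as a network that predicts per-pixel blend weights over several candidate renderings of the novel view. The PyTorch sketch below only illustrates that interface; `BlendNet`, its layer sizes, and the shallow architecture are assumptions for illustration, and the actual network, training loss, and data pipeline are described in the chapter.

```python
import torch
import torch.nn as nn

class BlendNet(nn.Module):
    """Toy stand-in for the blending CNN: given N candidate renderings of
    the novel view, predict per-pixel weights and composite them. The real
    system uses a deeper encoder-decoder; these layers are illustrative."""

    def __init__(self, num_views: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 * num_views, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, num_views, kernel_size=3, padding=1),
        )

    def forward(self, candidates: torch.Tensor) -> torch.Tensor:
        # candidates: (B, N, 3, H, W) renderings warped into the novel view
        b, n, c, h, w = candidates.shape
        logits = self.net(candidates.reshape(b, n * c, h, w))
        weights = torch.softmax(logits, dim=1)                  # (B, N, H, W)
        return (weights.unsqueeze(2) * candidates).sum(dim=1)   # (B, 3, H, W)

# Example: blend four candidate renderings of a 256x256 novel view.
blended = BlendNet(num_views=4)(torch.rand(1, 4, 3, 256, 256))
```

The softmax over the view axis makes the weights a per-pixel convex combination, which is what lets the network hide seams and ghosting by favoring whichever candidate looks most plausible at each pixel.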




Author information

Correspondence to Peter Hedman.


Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter


Cite this chapter

Hedman, P. (2020). Viewpoint-Free Photography for Virtual Reality. In: Magnor, M., Sorkine-Hornung, A. (eds.) Real VR – Immersive Digital Reality. Lecture Notes in Computer Science, vol. 11900. Springer, Cham. https://doi.org/10.1007/978-3-030-41816-8_6


  • DOI: https://doi.org/10.1007/978-3-030-41816-8_6


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-41815-1

  • Online ISBN: 978-3-030-41816-8

  • eBook Packages: Computer Science; Computer Science (R0)
