Scene Reconstruction for Storytelling in 360° Videos

Pinheiro, Gonçalo; Alves, Nelson; Magalhães, Luis; Agrellos, Luís; Guevara, Miguel

doi:10.1007/978-3-030-16447-8_12

Conference paper
First Online: 31 March 2019

Part of the book series: Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering (LNICST, volume 273)

Abstract

In immersive and interactive content such as 360-degree videos, the user controls the camera, which poses a challenge to the content producer because the user may look wherever they want. This paper presents the concept and first steps towards the development of a framework that provides a workflow for storytelling in 360-degree videos. With the proposed framework, it will be possible to attach a sound to a source and, by taking advantage of binaural audio, help redirect the user's attention to where the content producer wants. To present this kind of audio, the scene must be mapped and reconstructed in order to understand how the objects it contains interfere with the propagation of sound waves. The proposed system is capable of reconstructing the scene from a stereoscopic, still or motion 360-degree video when provided in an equirectangular projection. The system also incorporates a module that detects and tracks people, mapping their motion from the real world to the 3D world. In this document we describe the technical decisions and implementation of the system. To the best of our knowledge, this is the only system that has shown the capability to reconstruct scenes from a large variety of 360-degree footage and that allows binaural audio to be created from that reconstruction.
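
As an illustration of the equirectangular projection mentioned in the abstract, the following minimal sketch maps image coordinates to a viewing direction on the unit sphere and back. It is not the authors' implementation: the function names, the axis convention (y up, z pointing at the image centre) and the use of continuous pixel coordinates are assumptions made only for this example.

```python
import math


def equirect_pixel_to_direction(u, v, width, height):
    """Map equirectangular coordinates (u, v) to a unit direction (x, y, z).

    Assumed convention (hypothetical, for illustration only):
    u in [0, width) spans longitude -pi..pi from left to right,
    v in [0, height] spans latitude +pi/2 (top) to -pi/2 (bottom),
    y is up and z points at the centre of the image.
    """
    lon = (u / width) * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - (v / height) * math.pi
    x = math.cos(lat) * math.sin(lon)
    y = math.sin(lat)
    z = math.cos(lat) * math.cos(lon)
    return (x, y, z)


def direction_to_equirect_pixel(x, y, z, width, height):
    """Inverse mapping: a direction vector back to (u, v) coordinates."""
    norm = math.sqrt(x * x + y * y + z * z)
    lon = math.atan2(x, z)
    # Clamp to guard against rounding slightly outside [-1, 1].
    lat = math.asin(max(-1.0, min(1.0, y / norm)))
    u = (lon + math.pi) / (2.0 * math.pi) * width
    v = (math.pi / 2.0 - lat) / math.pi * height
    return (u, v)


if __name__ == "__main__":
    # The centre of a 2000x1000 frame looks straight ahead along +z.
    print(equirect_pixel_to_direction(1000, 500, 2000, 1000))
```

Under these assumptions, every pixel of a 360-degree frame corresponds to one direction from the camera, which is what allows a detected object or person in the image to be placed around the viewer once a depth estimate is available.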

Acknowledgments

This article is a result of the project CHIC - Cooperative Holistic view on Internet and Content (project no. 24498), supported by the European Regional Development Fund (ERDF), through the Competitiveness and Internationalization Operational Program (COMPETE 2020) under the PORTUGAL 2020 Partnership Agreement.

Author information

Authors and Affiliations

Centro de Computação Gráfica, Campus de Azurém, Edifício 14, 4800-058, Guimarães, Portugal
Gonçalo Pinheiro, Nelson Alves & Miguel Guevara
University of Minho, Campus de Azurém, 4800-058, Guimarães, Portugal
Luis Magalhães
GMK, Cais das Pedras n°08, 4050-465, Porto, Portugal
Luís Agrellos

Corresponding author

Correspondence to Gonçalo Pinheiro.

Editor information

Editors and Affiliations

Department de Sistemas de Informacao, Universidade do Minho, Guimaraes, Portugal
Paulo Cortez
Department of Information Systems, University of Minho, Guimarães, Portugal
Luís Magalhães
University of Minho, Guimarães, Portugal
Pedro Branco
Department of Information Systems, University of Minho, Guimarães, Portugal
Carlos Filipe Portela
Department of Engineering, University of Trás-os-Montes e Alto Douro, Vila Real, Portugal
Telmo Adão

About this paper

Cite this paper

Pinheiro, G., Alves, N., Magalhães, L., Agrellos, L., Guevara, M. (2019). Scene Reconstruction for Storytelling in 360° Videos. In: Cortez, P., Magalhães, L., Branco, P., Portela, C., Adão, T. (eds) Intelligent Technologies for Interactive Entertainment. INTETAIN 2018. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 273. Springer, Cham. https://doi.org/10.1007/978-3-030-16447-8_12

DOI: https://doi.org/10.1007/978-3-030-16447-8_12
Published: 31 March 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-16446-1
Online ISBN: 978-3-030-16447-8
eBook Packages: Computer Science, Computer Science (R0)
