Immersive Virtual Reality Audio Rendering Adapted to the Listener and the Room

  • Hansung Kim
  • Luca Remaggi
  • Philip J. B. Jackson
  • Adrian Hilton
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11900)


Vision and hearing are the most important sensory modalities for humans. To maximise the sense of immersion in VR environments, plausible spatial audio reproduction synchronised with the visual information is essential. However, measuring the acoustic properties of an environment with audio equipment is a complicated process. In this chapter, we introduce a simple and efficient system that uses 360° cameras to estimate room acoustics for plausible spatial audio rendering when reproducing real scenes in VR. A simplified 3D semantic model of the scene is estimated from the captured images using computer vision algorithms and a convolutional neural network (CNN). Spatially synchronised audio is then reproduced based on the estimated geometric and acoustic properties of the scene, and the reconstructed scenes are rendered with the synthesised spatial audio.
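As a rough illustration of how estimated geometry and material labels could feed an acoustic estimate, the sketch below computes a reverberation time for a box-shaped room. It is not the method of the chapter: the abstract does not name an acoustic model, so Sabine's classical formula is swapped in here as a standard stand-in. The `Surface` class, the function names, the room dimensions and the absorption coefficients are all hypothetical values chosen for illustration only.

```python
from dataclasses import dataclass

# Sabine's constant in s/m for air at roughly room temperature.
SABINE_CONSTANT = 0.161


@dataclass
class Surface:
    """One labelled surface of a simplified room model (hypothetical type)."""
    label: str          # semantic label, e.g. as a segmentation stage might output
    area_m2: float      # surface area in square metres
    absorption: float   # absorption coefficient in [0, 1], assumed per material


def shoebox_surfaces(width, depth, height, floor_abs, ceiling_abs, wall_abs):
    """Build floor, ceiling and combined wall surfaces for a box-shaped room."""
    wall_area = 2.0 * (width * height) + 2.0 * (depth * height)
    return [
        Surface("floor", width * depth, floor_abs),
        Surface("ceiling", width * depth, ceiling_abs),
        Surface("walls", wall_area, wall_abs),
    ]


def sabine_rt60(volume_m3, surfaces):
    """Estimate the reverberation time in seconds with Sabine's formula."""
    total_absorption = sum(s.area_m2 * s.absorption for s in surfaces)
    if total_absorption <= 0.0:
        raise ValueError("total absorption must be positive")
    return SABINE_CONSTANT * volume_m3 / total_absorption


# Illustrative 5 m x 4 m x 3 m room; the coefficients are made-up examples.
width, depth, height = 5.0, 4.0, 3.0
surfaces = shoebox_surfaces(width, depth, height,
                            floor_abs=0.3, ceiling_abs=0.1, wall_abs=0.05)
rt60 = sabine_rt60(width * depth * height, surfaces)
print(f"Estimated RT60: {rt60:.2f} s")
```

A single broadband coefficient per surface keeps the sketch short. A practical system would more likely use frequency-dependent coefficients and a geometric or image-source model of early reflections.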


Audio-visual VR · Geometry reconstruction · Room acoustic modelling · Spatial audio rendering



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. CVSSP, University of Surrey, Guildford, UK
  2. Creative Tech UK, Creative Labs, Staines-upon-Thames, UK
