UnrealROX: an extremely photorealistic virtual reality environment for robotics simulations and synthetic data generation

Abstract

Data-driven algorithms have surpassed traditional techniques in almost every aspect of robotic vision. Such algorithms need vast amounts of quality data to work properly once trained, and gathering and annotating that sheer amount of data in the real world is a time-consuming and error-prone task that limits both the scale and the quality of the resulting datasets. Synthetic data generation has become increasingly popular, since synthetic data are faster to generate and can be annotated automatically. However, most current datasets and environments lack the realism, interactions, and detail of the real world. UnrealROX is an environment built on Unreal Engine 4 that aims to reduce this reality gap by leveraging hyperrealistic indoor scenes that robotic agents explore while interacting with objects in a visually realistic manner in the simulated world. Photorealistic scenes and robots are rendered by Unreal Engine into a virtual reality headset that captures the operator's gaze, so a human operator can move the robot and control the robotic hands with motion controllers; scene information is dumped on a per-frame basis so that it can be replayed offline to generate raw data and ground-truth annotations. This virtual reality environment enables robotic vision researchers to generate realistic and visually plausible data with full ground truth for a wide variety of problems such as class and instance semantic segmentation, object detection, depth estimation, visual grasping, and navigation.
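To make the record-and-replay workflow described above concrete, the sketch below shows one way a per-frame scene dump could be consumed offline to pair raw frames with their annotations. It is a minimal, hypothetical Python example: the file name frames.jsonl, the directory layout, and the field names (frame_index, camera_pose, object_poses) are assumptions made for illustration, not UnrealROX's actual on-disk format, which is defined by the engine-side recorder described in the full paper.

```python
import json
from pathlib import Path


def load_sequence(sequence_dir):
    """Yield one record per captured frame from a JSON-lines dump (assumed layout)."""
    with open(Path(sequence_dir) / "frames.jsonl") as f:
        for line in f:
            yield json.loads(line)


def pair_frames_with_annotations(sequence_dir):
    """Pair each RGB frame with the depth and instance-mask images that an
    offline playback pass would render for it, plus the recorded poses."""
    root = Path(sequence_dir)
    pairs = []
    for record in load_sequence(sequence_dir):
        idx = record["frame_index"]  # assumed field name
        pairs.append({
            "rgb": root / "rgb" / f"{idx:06d}.png",
            "depth": root / "depth" / f"{idx:06d}.png",
            "mask": root / "mask" / f"{idx:06d}.png",
            "camera_pose": record["camera_pose"],      # camera position and rotation
            "object_poses": record["object_poses"],    # per-object 6D poses
        })
    return pairs
```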


Notes

  1. http://gazebosim.org/.
  2. https://github.com/3dperceptionlab/unrealrox.
  3. https://docs.unrealengine.com/en-US/Programming/UnrealArchitecture/Reference/Interfaces.
  4. https://www.softbankrobotics.com/emea/en/robots/pepper.
  5. https://www.rethinkrobotics.com/baxter/.
  6. https://github.com/3dperceptionlab/unrealrox.


Acknowledgements

This work has been funded by the Spanish Government grant TIN2016-76515-R for the COMBAHO project, supported with FEDER funds. This work has also been supported by three Spanish national grants for Ph.D. studies (FPU15/04516, FPU17/00166, and ACIF/2018/197), by the University of Alicante project GRE16-19, and by the Valencian Government project GV/2018/022. Experiments were made possible by a generous hardware donation from NVIDIA. We would also like to thank Zuria Bauer for her collaboration in the depth estimation experiments.

Author information


Corresponding author

Correspondence to Alberto Garcia-Garcia.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Cite this article

Martinez-Gonzalez, P., Oprea, S., Garcia-Garcia, A. et al. UnrealROX: an extremely photorealistic virtual reality environment for robotics simulations and synthetic data generation. Virtual Reality 24, 271–288 (2020). https://doi.org/10.1007/s10055-019-00399-5


Keywords

  • Robotics
  • Synthetic data
  • Grasping