UnrealGT: Using Unreal Engine to Generate Ground Truth Datasets

  • Thomas Pollok
  • Lorenz Junglas
  • Boitumelo Ruf
  • Arne Schumann
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11844)


Large amounts of data have become an essential requirement in the development of modern computer vision algorithms, e.g. for the training of neural networks. Due to data protection laws, overflight permissions for UAVs, or expensive equipment, data collection is often a costly and time-consuming task, especially if the ground truth is generated by manually annotating the collected data. By means of synthetic data generation, large amounts of image data and metadata can be extracted directly from a virtual scene, which in turn can be customized to meet the specific needs of the algorithm or the use case. Furthermore, the use of virtual objects avoids problems that might arise due to data protection issues and does not require the use of expensive sensors. In this work we propose a framework for synthetic test data generation utilizing the Unreal Engine. The Unreal Engine provides a simulation environment that allows one to simulate complex situations in a virtual world, such as data acquisition with UAVs or autonomous driving. However, our process is agnostic to the computer vision task for which the data is generated and, thus, can be used to create generic datasets. We evaluate our framework by generating synthetic test data, with which a CNN for object detection as well as a V-SLAM algorithm are trained and evaluated. The evaluation shows that our generated synthetic data can be used as an alternative to real data.


Simulation, Unreal Engine, Ground truth, Annotated data, Object detection, SLAM



This work has received funding from the European Union’s Horizon 2020 research and innovation program in the context of the VICTORIA project under grant agreement No. 740754.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Thomas Pollok (1), corresponding author
  • Lorenz Junglas (1)
  • Boitumelo Ruf (1, 2)
  • Arne Schumann (1)
  1. Fraunhofer IOSB, Video Exploitation Systems, Karlsruhe, Germany
  2. Institute of Photogrammetry and Remote Sensing, Karlsruhe Institute of Technology, Karlsruhe, Germany
