
On Pre-trained Image Features and Synthetic Images for Deep Learning

  • Stefan Hinterstoisser
  • Vincent Lepetit
  • Paul Wohlhart
  • Kurt Konolige
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11129)

Abstract

Deep Learning methods usually require huge amounts of training data to perform at their full potential, and often require expensive manual labeling. Using synthetic images to train object detectors is therefore very attractive, as the labeling comes for free, and several approaches have been proposed to combine synthetic and real images for training. In this paper, we evaluate whether ‘freezing’ the layers responsible for feature extraction to generic layers pre-trained on real images, and training only the remaining layers with plain OpenGL renderings, allows training with synthetic images only. Our experiments with very recent deep architectures for object recognition (Faster-RCNN, R-FCN, Mask-RCNN) and image feature extractors (InceptionResnet and Resnet) show that this simple approach performs surprisingly well.
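The core idea described above can be illustrated in a few lines of code. The sketch below is not the authors' pipeline (their experiments build on the Google Object Detection API in TensorFlow); it shows the same freezing strategy with torchvision's off-the-shelf Faster R-CNN, where a random tensor and a hypothetical box stand in for an OpenGL rendering and its free label.

    # Minimal sketch of the idea: freeze a feature extractor pre-trained
    # on real images and train only the remaining layers on synthetic data.
    # Uses torchvision's Faster R-CNN as a stand-in; the paper's own
    # experiments use the Google Object Detection API instead.
    import torch
    import torchvision

    # Faster R-CNN with a ResNet-50 FPN backbone pre-trained on real images (COCO).
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")

    # Freeze the backbone so the generic pre-trained features stay fixed
    # and cannot drift toward the statistics of the synthetic images.
    for param in model.backbone.parameters():
        param.requires_grad = False

    # Only the region proposal network and the detection head remain trainable.
    trainable_params = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.SGD(trainable_params, lr=1e-3, momentum=0.9)

    # One training step on a dummy "synthetic" image with one labeled box
    # (a random tensor stands in for an OpenGL rendering here).
    model.train()
    images = [torch.rand(3, 480, 640)]
    targets = [{
        "boxes": torch.tensor([[50.0, 60.0, 200.0, 220.0]]),  # [x1, y1, x2, y2]
        "labels": torch.tensor([1]),                           # object class id
    }]
    loss_dict = model(images, targets)  # dict of RPN and detection-head losses
    loss = sum(loss_dict.values())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

Freezing the extractor in this way keeps the pre-trained features anchored to real-image statistics, which is why, per the abstract, plain non-photorealistic renderings suffice for training the remaining layers.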

Notes

Acknowledgments

The authors thank Google’s VALE team for tremendous support using the Google Object Detection API, especially Jonathan Huang, Alireza Fathi, Vivek Rathod, and Chen Sun. In addition, we thank Kevin Murphy, Vincent Vanhoucke, and Alexander Toshev for valuable discussions and feedback.

Supplementary material

Supplementary material 1 (mp4, 36,819 KB): 478770_1_En_42_MOESM1_ESM.mp4

References

  1. Huang, J., et al.: Speed and accuracy trade-offs for modern convolutional object detectors. In: Conference on Computer Vision and Pattern Recognition (2017)
  2. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems (2015)
  3. Liu, W., et al.: SSD: single shot multibox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 21–37. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_2
  4. Dai, J., Li, Y., He, K., Sun, J.: R-FCN: object detection via region-based fully convolutional networks. In: Advances in Neural Information Processing Systems (2016)
  5. Redmon, J., Farhadi, A.: YOLO9000: better, faster, stronger. In: Conference on Computer Vision and Pattern Recognition (2017)
  6. Tobin, J., Fong, R., Ray, A., Schneider, J., Zaremba, W., Abbeel, P.: Domain randomization for transferring deep neural networks from simulation to the real world. In: International Conference on Intelligent Robots and Systems (2017)
  7. Dwibedi, D., Misra, I., Hebert, M.: Cut, paste and learn: surprisingly easy synthesis for instance detection. arXiv preprint (2017)
  8. Georgakis, G., Mousavian, A., Berg, A.C., Kosecka, J.: Synthesizing training data for object detection in indoor scenes. In: Robotics: Science and Systems Conference (2017)
  9. Rad, M., Lepetit, V.: BB8: a scalable, accurate, robust to partial occlusion method for predicting the 3D poses of challenging objects without using depth. In: International Conference on Computer Vision (2017)
  10. Rozantsev, A., Salzmann, M., Fua, P.: Beyond sharing weights for deep domain adaptation. In: Conference on Computer Vision and Pattern Recognition (2017)
  11. Bousmalis, K., Trigeorgis, G., Silberman, N., Krishnan, D., Erhan, D.: Domain separation networks. In: Advances in Neural Information Processing Systems (2016)
  12. Ganin, Y., et al.: Domain-adversarial training of neural networks. J. Mach. Learn. Res. (2016)
  13. Gupta, A., Vedaldi, A., Zisserman, A.: Synthetic data for text localisation in natural images. In: Conference on Computer Vision and Pattern Recognition (2016)
  14. Alhaija, H.A., Mustikovela, S.K., Mescheder, L., Geiger, A., Rother, C.: Augmented reality meets deep learning for car instance segmentation in urban scenes. In: British Machine Vision Conference (2017)
  15. Varol, G., et al.: Learning from synthetic humans. In: Conference on Computer Vision and Pattern Recognition (2017)
  16. Shrivastava, A., Pfister, T., Tuzel, O., Susskind, J., Wang, W., Webb, R.: Learning from simulated and unsupervised images through adversarial training. In: Conference on Computer Vision and Pattern Recognition (2017)
  17. Bousmalis, K., Silberman, N., Dohan, D., Erhan, D., Krishnan, D.: Unsupervised pixel-level domain adaptation with generative adversarial networks. In: Conference on Computer Vision and Pattern Recognition (2017)
  18. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: International Conference on Learning Representations (2015)
  19. He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. arXiv preprint (2017)
  20. Szegedy, C., Ioffe, S., Vanhoucke, V., Alemi, A.A.: Inception-v4, Inception-ResNet and the impact of residual connections on learning. In: American Association for Artificial Intelligence Conference (2017)
  21. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Conference on Computer Vision and Pattern Recognition (2016)
  22. Su, H., Qi, C.R., Li, Y., Guibas, L.J.: Render for CNN: viewpoint estimation in images using CNNs trained with rendered 3D model views. In: International Conference on Computer Vision (2015)
  23. Movshovitz-Attias, Y., Kanade, T., Sheikh, Y.: How useful is photo-realistic rendering for visual learning? In: European Conference on Computer Vision (2016)
  24. Mitash, C., Bekris, K.E., Boularias, A.: A self-supervised learning system for object detection using physics simulation and multi-view pose estimation. In: International Conference on Intelligent Robots and Systems (2017)
  25. Richter, S.R., Vineet, V., Roth, S., Koltun, V.: Playing for data: ground truth from computer games. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9906, pp. 102–118. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46475-6_7
  26. Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (2014)
  27. Ouyang, W., Wang, X., Zhang, C., Yang, X.: Factors in finetuning deep model for object detection with long-tail distribution. In: Conference on Computer Vision and Pattern Recognition (2016)
  28. Peng, X., Sun, B., Ali, K., Saenko, K.: Learning deep object detectors from 3D models. In: International Conference on Computer Vision (2015)
  29. Phong, B.T.: Illumination for computer generated pictures. Commun. ACM 18, 311–317 (1975)
  30. Hinterstoisser, S., et al.: Model based training, detection and pose estimation of texture-less 3D objects in heavily cluttered scenes. In: Lee, K.M., Matsushita, Y., Rehg, J.M., Hu, Z. (eds.) ACCV 2012. LNCS, vol. 7724, pp. 548–562. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-37331-2_42

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. X, Mountain View, USA
  2. University of Bordeaux, Bordeaux, France
