Transfer Learning by Finetuning Pretrained CNNs Entirely with Synthetic Images

  • Conference paper
  • In: Computer Vision, Pattern Recognition, Image Processing, and Graphics (NCVPRIPG 2017)

Abstract

We show that fine-tuning pretrained CNNs entirely on synthetic images is an effective strategy for transfer learning. We apply this strategy to detect packaged food products clustered in refrigerator scenes. A CNN pretrained on the COCO dataset and fine-tuned on our 4000 synthetic images achieves a mean average precision (mAP @ 0.5 IoU) of 52.59 on a test set of real images (150 distinct products as objects of interest and 25 distractor objects), compared with 24.15 without such fine-tuning. The synthetic images were rendered from freely available 3D models with variations in parameters such as color, texture, and viewpoint, without a strong emphasis on photorealism. We analyze how training-set size, cue variances, 3D-model dictionary size, and network architecture influence transfer-learning performance. We also explore training strategies that affect transfer from synthetic to real scenes, such as fine-tuning only selected layers and early stopping. This approach is promising in scenarios where limited training data is available.
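To make the rendering step concrete, here is a minimal sketch, in the spirit of the approach described above, of generating synthetic training images by randomizing color and viewpoint through Blender's Python API. The object and material names ("Product", "ProductMaterial") and the camera setup are illustrative assumptions, not the authors' actual scene; texture swaps and distractor placement would follow the same pattern.

```python
# Hypothetical Blender (bpy) sketch: render one 3D product model under
# randomized color and viewpoint. Object and material names are
# placeholders; assumes a node-based material (Blender 2.8+ defaults).
import math
import random

import bpy

scene = bpy.context.scene
product = bpy.data.objects["Product"]             # placeholder model name
material = bpy.data.materials["ProductMaterial"]  # placeholder material

for i in range(4000):  # the paper uses 4000 synthetic images
    # Randomize the product's base color via its Principled BSDF node.
    bsdf = material.node_tree.nodes["Principled BSDF"]
    bsdf.inputs["Base Color"].default_value = (
        random.random(), random.random(), random.random(), 1.0)

    # Randomize viewpoint: arbitrary object orientation plus camera jitter.
    product.rotation_euler = (
        random.uniform(0.0, 2.0 * math.pi),
        random.uniform(0.0, 2.0 * math.pi),
        random.uniform(0.0, 2.0 * math.pi),
    )
    scene.camera.location.z = random.uniform(1.0, 2.5)

    # Write the frame; in a full pipeline, bounding-box labels would be
    # derived from the object's projected extent in the rendered image.
    scene.render.filepath = f"//renders/synthetic_{i:04d}.png"
    bpy.ops.render.render(write_still=True)
```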
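The training strategies named in the abstract (fine-tuning only selected layers, early stopping) can also be sketched. The snippet below uses torchvision's COCO-pretrained Faster R-CNN as a stand-in for the paper's detector; `synthetic_loader`, `real_val_loader`, and `evaluate_map_at_05` are assumed helpers, and the patience value is illustrative rather than taken from the paper.

```python
# A minimal PyTorch sketch: fine-tune a COCO-pretrained detector on
# synthetic images only, freeze the backbone (i.e., fine-tune selected
# layers), and stop early when held-out mAP @ 0.5 IoU stops improving.
import torch
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")

# Freeze the backbone; only the detection head is updated.
for param in model.backbone.parameters():
    param.requires_grad = False

optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad],
    lr=1e-3, momentum=0.9)

best_map, patience, bad_epochs = 0.0, 3, 0
for epoch in range(50):
    model.train()
    for images, targets in synthetic_loader:   # assumed DataLoader of synthetic scenes
        losses = model(images, targets)        # dict of detection losses
        loss = sum(losses.values())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Early stopping on a small validation set (assumed evaluation helper).
    val_map = evaluate_map_at_05(model, real_val_loader)
    if val_map > best_map:
        best_map, bad_epochs = val_map, 0
        torch.save(model.state_dict(), "best_syn2real.pt")
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break
```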

Notes

  1. Trained network weights and the synthetic dataset are available at https://github.com/paramrajpura/Syn2Real.

Acknowledgment

We acknowledge funding support from Innit Inc. consultancy grant CNS/INNIT/EE/P0210/1617/0007.

Author information

Correspondence to Ravi Hegde.

Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Rajpura, P. et al. (2018). Transfer Learning by Finetuning Pretrained CNNs Entirely with Synthetic Images. In: Rameshan, R., Arora, C., Dutta Roy, S. (eds) Computer Vision, Pattern Recognition, Image Processing, and Graphics. NCVPRIPG 2017. Communications in Computer and Information Science, vol 841. Springer, Singapore. https://doi.org/10.1007/978-981-13-0020-2_45

  • DOI: https://doi.org/10.1007/978-981-13-0020-2_45

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-0019-6

  • Online ISBN: 978-981-13-0020-2

  • eBook Packages: Computer Science, Computer Science (R0)
