Journal of the Indian Institute of Science, Volume 99, Issue 2, pp 177–199

Beyond Supervised Learning: A Computer Vision Perspective

  • Lovish Chum
  • Anbumani Subramanian
  • Vineeth N. Balasubramanian
  • C. V. Jawahar
Review Article

Abstract

Fully supervised deep learning-based methods have had a profound impact on various fields of computer science. Compared to classical methods, supervised deep learning-based techniques face scalability issues, as they require huge amounts of labeled data and, more significantly, are unable to generalize to multiple domains and tasks. In recent years, much research within the deep learning community has targeted these issues. Although there have been extensive surveys on learning paradigms such as semi-supervised and unsupervised learning, there have been few timely reviews since the emergence of deep learning. In this paper, we provide an overview of the contemporary literature surrounding alternatives to fully supervised learning in the deep learning context. First, we summarize the relevant techniques that fall between the paradigms of supervised and unsupervised learning. Second, we take autonomous navigation as a running example to explain and compare different models. Finally, we highlight some shortcomings of current methods and suggest future directions.

Keywords

Deep learning; Synthetic data; Domain adaptation; Weakly supervised learning; Few-shot learning; Self-supervised learning

References

  1. 1.
    Abadi M, Andersen DG (2016) Learning to protect communications with adversarial neural cryptography. CoRR. arXiv:1610.06918
  2. 2.
    Abu-El-Haija S, Kothari N, Lee J, Natsev AP, Toderici G, Varadarajan B, Vijayanarasimhan S (2016) Youtube-8m: a large-scale video classification benchmark. arXiv:1609.08675v1
  3. 3.
    Agrawal P, Carreira J, Malik J (2015) Learning to see by moving. In: International conference on computer vision (CVPR), Boston, MA, USAGoogle Scholar
  4. 4.
    Akata Z, Perronnin F, Harchaoui Z, Schmid C (2013) Label-embedding for attribute-based classification. In: Computer vision and pattern recognition (CVPR), Portland, OR, USAGoogle Scholar
  5. 5.
    Alhaija H, Mustikovela S, Mescheder L, Geiger A, Rother C (2018) Augmented reality meets computer vision: efficient data generation for urban driving scenes. Int J Comput Vis 126(9):961–972Google Scholar
  6. 6.
    Andrychowicz M, Denil M, Gomez S, Hoffman MW, Pfau D, Schaul T, Shillingford B, De Freitas N (2016) Learning to learn by gradient descent by gradient descent. In: Advances in neural information processing systems (NIPS), Barcelona, SpainGoogle Scholar
  7. 7.
    Antoniou A, Storkey A, Edwards H (2017) Data augmentation generative adversarial networks. CoRR. arXiv:1711.04340
  8. 8.
    Arandjelovic R, Zisserman A (2017) Look, listen and learn. In: International conference on computervision (ICCV), Venice, ItalyGoogle Scholar
  9. 9.
    Arpit D, Jastrzębskis S, Ballas N, Krueger D, Bengio E, Kanwal MS, Maharaj T, Fischer A, Courville A, Bengio Y, et al (2017) A closer look at memorization in deep networks. In: International conference on machine learning (ICML), Sydney, AustraliaGoogle Scholar
  10. 10.
    Aubry M, Russell BC (2015) Understanding deep features with computer-generated imagery. In:International conference on computer vision (ICCV), Santiago, ChileGoogle Scholar
  11. 11.
    Aubry M, Maturana D, Efros AA, Russell BC, Sivic J (2014) Seeing 3D chairs: exemplar part-based 2D-3D alignment using a large dataset of cad models. In: Computer vision and pattern recognition (CVPR), Columbus, OH, USAGoogle Scholar
  12. 12.
    Badrinarayanan V, Kendall A, Cipolla R (2017) SegNet: a deep convolutional encoder-decoder architecture for image segmentation. Trans Pattern Anal Mach Intell 39(12):2481–2495Google Scholar
  13. 13.
    Bahdanau D, Cho K, Bengio Y (2015) Neural machine translation by jointly learning to align and translate. In: International conference on learning representations (ICLR), San Diego, CA, USAGoogle Scholar
  14. 14.
    Bansal A, Sikka K, Sharma G, Chellappa R, Divakaran A (2018) Zero-shot object detection. In: European conference on computer vision (ECCV), Munich, GermanyGoogle Scholar
  15. 15.
    Bearman A, Russakovsky O, Ferrari V, Fei-Fei L (2016) Whats the point: Semantic segmentation with pointsupervision. In: European conference on computer vision (ECCV), Amsterdam, NetherlandsGoogle Scholar
  16. 16.
    Ben-David S, Blitzer J, Crammer K, Kulesza A, Pereira F, Vaughan JW (2010) A theory of learning from different domains. Mach Learn 79(1–2):151–175Google Scholar
  17. 17.
    Bilen H, Vedaldi A (2016) Weakly supervised deep detection networks. In: Computer vision and pattern recognition (CVPR), Las Vegas, NV, USAGoogle Scholar
  18. 18.
    Blum A, Mitchell T (1998) Combining labeled and unlabeled data with co-training. In: Computational learning theory (CoLT), Madison, Wisconsin, USAGoogle Scholar
  19. 19.
    Bousmalis K, Trigeorgis G, Silberman N, Krishnan D, Erhan D (2016) Domain separation networks. In: Advances in neural information processing systems (NIPS), Barcelona, SpainGoogle Scholar
  20. 20.
    Bousmalis K, Silberman N, Dohan D, Erhan D, Krishnan D (2017) Unsupervised pixel-level domain adaptation with generative adversarial networks. In: Computer vision and pattern recognition (CVPR), Honolulu, HI, USAGoogle Scholar
  21. 21.
    Busto PP, Gall J (2017) Open set domain adaptation. In: International conference on computer vision (ICCV), Venice, ItalyGoogle Scholar
  22. 22.
    Butler DJ, Wulff J, Stanley GB, Black MJ (2012) A naturalistic open source movie for optical flow evaluation. In: European conference on computer vision (ECCV), Firenze, ItalyGoogle Scholar
  23. 23.
    Cao Z, Simon T, Wei SE, Sheikh Y (2017) Realtime multi-person 2D pose estimation using part affinity fields. In: Computer vision and pattern recognition (CVPR), Honolulu, HI, USAGoogle Scholar
  24. 24.
    Chapelle O, Scholkopf B, Zien A (2009) Semi-supervised learning (Chapelle O. et al., eds.; 2006) [book reviews]. IEEE Trans Neural Netw 20(3):542Google Scholar
  25. 25.
    Chattopadhyay R, Sun Q, Fan W, Davidson I, Panchanathan S, Ye J (2012) Multi-source domain adaptation and its application to early detection of fatigue. Trans Knowl Discov Data (TKDD) 6(4):18Google Scholar
  26. 26.
    Chen C, Seff A, Kornhauser A, Xiao J (2015) Deepdriving: learning affordance for direct perception in autonomous driving. In: International conference on computer vision (ICCV), Santiago, ChileGoogle Scholar
  27. 27.
    Chen LC, Papandreou G, Schroff F, Adam H (2017) Rethinking atrous convolution for semantic image segmentation. CoRR. arXiv:1706.05587
  28. 28.
    Chen LC, Papandreou G, Kokkinos I, Murphy K, Yuille AL (2018) DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. Pattern Anal Mach Intell 40(4):834–848Google Scholar
  29. 29.
    Chen TH, Liao YH, Chuang CY, Hsu WT, Fu J, Sun M (2017) Show, adapt and tell: adversarial training of cross-domain image captioner. In: International conference on computer vision (ICCV), Venice, ItalyGoogle Scholar
  30. 30.
    Chen X, Gupta A (2015) Webly supervised learning of convolutional networks. In: International conference on computer vision (ICCV), Santiago, ChileGoogle Scholar
  31. 31.
    Chen Y, Li W, Sakaridis C, Dai D, Van Gool L (2018) Domain adaptive faster R-CNN for object detection in the wild. In: Computer vision and pattern recognition (CVPR), Salt Lake City, UT, USAGoogle Scholar
  32. 32.
    Chen YH, Chen WY, Chen YT, Tsai BC, Wang YCF, Sun M (2017) No more discrimination: cross city adaptation of road scene segmenters. In: International conference on computer vision (ICCV), Venice, ItalyGoogle Scholar
  33. 33.
    Chen Z, Liu B (2016) Lifelong machine learning. Synth Lect Artif Intell Mach Learn 10(3):1–145Google Scholar
  34. 34.
    Cohn DA, Ghahramani Z, Jordan MI (1996) Active learning with statistical models. J Artif Intell Res 4:129–145Google Scholar
  35. 35.
    Cordts M, Omran M, Ramos S, Scharwächter T, Enzweiler M, Benenson R, Franke U, Roth S, Schiele B (2015) The cityscapes dataset. In: CVPR workshop on the future of datasets in vision (CVPRW), Boston, MA, USAGoogle Scholar
  36. 36.
    Courty N, Flamary R, Habrard A, Rakotomamonjy A (2017) Joint distribution optimal transportation for domain adaptation. In: Advances in neural information processing systems (NIPS), Long Beach, CA, USAGoogle Scholar
  37. 37.
    Csurka G (2017) Domain adaptation for visual applications: a comprehensive survey. CoRR. arXiv:1702.05374
  38. 38.
    Damodaran BB, Kellenberger B, Flamary R, Tuia D, Courty N (2018) Deepjdot: deep joint distribution optimal transport for unsupervised domain adaptation. In: European conference on computer vision (ECCV), Munich, GermanyGoogle Scholar
  39. 39.
    Daumé III H (2007) Frustratingly easy domain adaptation. In: Association of computational linguistics (ACL), Prague, Czech RepublicGoogle Scholar
  40. 40.
    Day O, Khoshgoftaar TM (2017) A survey on heterogeneous transfer learning. J Big Data 4(1):29Google Scholar
  41. 41.
    De Souza CR, Gaidon A, Cabon Y, Peña AML (2017) Procedural generation of videos to train deep action recognition networks. In: Computer vision and pattern recognition (CVPR), Honolulu, HI, USAGoogle Scholar
  42. 42.
    Deng W, Zheng L, Kang G, Yang Y, Ye Q, Jiao J (2018) Image-image domain adaptation with preserved self-similarity and domain-dissimilarity for person reidentification. In: Computer vision and pattern recognition (CVPR), Salt Lake City, UT, USAGoogle Scholar
  43. 43.
    Divvala SK, Farhadi A, Guestrin C (2014) Learning everything about anything: webly-supervised visual concept learning. In: Computer vision and pattern recognition (CVPR), Columbus, OH, USAGoogle Scholar
  44. 44.
    Doersch C, Gupta A, Efros AA (2015) Unsupervised visual representation learning by context prediction. In: International conference on computer vision (ICCV), Santiago, ChileGoogle Scholar
  45. 45.
    Donahue J, Jia Y, Vinyals O, Hoffman J, Zhang N, Tzeng E, Darrell T (2014) DeCAF: a deep convolutional activation feature for generic visual recognition. In: International conference on machine learning (ICML), Beijing, ChinaGoogle Scholar
  46. 46.
    Donahue J, Anne Hendricks L, Guadarrama S, Rohrbach M, Venugopalan S, Saenko K, Darrell T (2015) Long-term recurrent convolutional networks for visual recognition and description. In: Computer vision and pattern recognition (CVPR), Boston, MA, USAGoogle Scholar
  47. 47.
    Dosovitskiy A, Fischer P, Ilg E, Hausser P, Hazirbas C, Golkov V, Van Der Smagt P, Cremers D, Brox T (2015) FlowNet: learning optical flow with convolutional networks. In: International conference on computer vision (ICCV), Santiago, ChileGoogle Scholar
  48. 48.
    Dosovitskiy A, Ros G, Codevilla F, Lopez A, Koltun V (2017) CARLA: an open urban driving simulator. In: Conference on robot learning (CoRL), Mountain View, California, USAGoogle Scholar
  49. 49.
    Duan L, Xu D, Tsang I (2011) Learning with augmented features for heterogeneous domain adaptation. In: International conference on machine learning (ICML), Edinburgh, ScotlandGoogle Scholar
  50. 50.
    Duan L, Tsang IW, Xu D (2012) Domain transfer multiple kernel learning. Trans Pattern Anal Mach Intell 34(3):465–479Google Scholar
  51. 51.
    Duchenne O, Audibert JY, Keriven R, Ponce J, Ségonne F (2008) Segmentation by transduction. In: Computer vision and pattern recognition (CVPR), Anchorage, AL, USAGoogle Scholar
  52. 52.
    Dwibedi D, Misra I, Hebert M (2017) Cut, paste and learn: surprisingly easy synthesis for instance detection. In: International conference on computer vision (ICCV), Venice, ItalyGoogle Scholar
  53. 53.
    Elhamifar E, Sapiro G, Yang A, Shankar Sasrty S (2013) A convex optimization framework for active learning. In: International conference on computer vision (ICCV), Sydney, AustraliaGoogle Scholar
  54. 54.
    Fan J, Shen Y, Zhou N, Gao Y (2010) Harvesting large-scale weakly-tagged image databases from the web. In: Computer vision and pattern recognition (CVPR), San Francisco, CA, USAGoogle Scholar
  55. 55.
    Fang M, Li Y, Cohn T (2017) Learning how to active learn: a deep reinforcement learning approach. In: Association of computational linguistics (ACL), Vancouver, CanadaGoogle Scholar
  56. 56.
    Farhadi A, Endres I, Hoiem D, Forsyth D (2009) Describing objects by their attributes. In: Computer vision and pattern recognition (CVPR), Miami, FL, USAGoogle Scholar
  57. 57.
    Fei-Fei L, Fergus R, Perona P (2006) One-shot learning of object categories. Trans Pattern Anal Mach Intell 28(4):594–611Google Scholar
  58. 58.
    Feichtenhofer C, Pinz A, Zisserman A (2016) Convolutional two-stream network fusion for video action recognition. In: Computer vision and pattern recognition (CVPR), Las Vegas, NV, USAGoogle Scholar
  59. 59.
    Finn C, Abbeel P, Levine S (2017) Model-agnostic meta-learning for fast adaptation of deep networks. In: International conference of machine learning (ICML), Sydney, AustraliaGoogle Scholar
  60. 60.
    Frénay B, Verleysen M (2014) Classification in the presence of label noise: a survey. Trans Neural Netw Learn Syst 25(5):845–869Google Scholar
  61. 61.
    Freytag A, Rodner E, Denzler J (2014) Selecting influential examples: active learning with expected model output changes. In: European conference on computer vision (ECCV), Zurich, SwitzerlandGoogle Scholar
  62. 62.
    Frid-Adar M, Diamant I, Klang E, Amitai M, Goldberger J, Greenspan H (2018) GAN-based synthetic medical image augmentation for increased CNN performance in liver lesion classification. CoRR. arXiv:1803.01229
  63. 63.
    Frome A, Corrado GS, Shlens J, Bengio S, Dean J, Mikolov T et al (2013) Devise: a deep visual-semantic embedding model. In: Advances in neural information processing systems (NIPS), Stateline, NA, USAGoogle Scholar
  64. 64.
    Gaidon A, Wang Q, Cabon Y, Vig E (2016) Virtual worlds as proxy for multi-object tracking analysis. In: Computer vision and pattern recognition (CVPR), Las Vegas, NV, USAGoogle Scholar
  65. 65.
    Gal Y, Islam R, Ghahramani Z (2017) Deep Bayesian active learning with image data. In: Advances in neural information processing systems workshops, Long Beach, CA, USAGoogle Scholar
  66. 66.
    Gan C, Sun C, Duan L, Gong B (2016) Webly-supervised video recognition by mutually voting for relevant web images and web video frames. In: European conference on computer vision (ECCV), Amsterdam, NetherlandsGoogle Scholar
  67. 67.
    Gan C, Yao T, Yang K, Yang Y, Mei T (2016) You lead, we exceed: labor-free video concept learning by jointly exploiting web videos and images. In: Computer vision and pattern recognition (CVPR), Las Vegas, NV, USAGoogle Scholar
  68. 68.
    Ganin Y, Ustinova E, Ajakan H, Germain P, Larochelle H, Laviolette F, Marchand M, Lempitsky V (2016) Domain-adversarial training of neural networks. J Mach Learn Res 17(1):2096–2030Google Scholar
  69. 69.
    Gao M, Li A, Yu R, Morariu VI, Davis LS (2018) C-WSL: count-guided weakly supervised localization. In: Europeanconference on computer vision (ECCV), Munich, GermanyGoogle Scholar
  70. 70.
    Gebru T, Hoffman J, Fei-Fei L (2017) Fine-grained recognition in the wild: a multi-task domain adaptation approach. In: International conference on computer vision (ICCV), Venice, ItalyGoogle Scholar
  71. 71.
    Geiger A, Lenz P, Stiller C, Urtasun R (2013) Vision meets robotics: the KITTI dataset. Int J Robot Res 32(11):1231–1237Google Scholar
  72. 72.
    Ghifary M, Kleijn WB, Zhang M, Balduzzi D, Li W (2016) Deep reconstruction-classification networks for unsupervised domain adaptation. In: European conference on computer vision (ECCV), Amsterdam, NetherlandsGoogle Scholar
  73. 73.
    Ghosh A, Kumar H, Sastry P (2017) Robust loss functions under label noise for deep neural networks. In: AAAI, San Francisco, CA, USAGoogle Scholar
  74. 74.
    Girdhar R, Ramanan D, Gupta A, Sivic J, Russell B (2017) ActionVLAD: learning spatio-temporal aggregation for action classification. In: Computer vision and pattern recognition (CVPR), Honolulu, HI, USAGoogle Scholar
  75. 75.
    Girshick R (2015) Fast R-CNN. In: International conference on computer vision (ICCV), Santiago, ChileGoogle Scholar
  76. 76.
    Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: Computer vision and pattern recognition (CVPR), Columbus, OH, USAGoogle Scholar
  77. 77.
    Gomez L, Patel Y, Rusiñol M, Karatzas D, Jawahar C (2017) Self-supervised learning of visual features through embedding images into text topic spaces. In: Computer vision and pattern recognition (CVPR), Honolulu, HI, USAGoogle Scholar
  78. 78.
    Gong B, Shi Y, Sha F, Grauman K (2012) Geodesic flow kernel for unsupervised domain adaptation. In: Computer vision and pattern recognition (CVPR), Providence, RI, USAGoogle Scholar
  79. 79.
    Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. In: Advances in neural information processing systems (NIPS), Montreal, CanadaGoogle Scholar
  80. 80.
    Gopalan R, Li R, Chellappa R (2011) Domain adaptation for object recognition: an unsupervised approach. In: International conference on computer vision (ICCV), Barcelona, SpainGoogle Scholar
  81. 81.
    Goyal Y, Khot T, Summers-Stay D, Batra D, Parikh D (2017) Making the V in VQA matter: elevating the role of image understanding in Visual Question Answering. In: Computer vision and pattern recognition (CVPR), Honolulu, HI, USAGoogle Scholar
  82. 82.
    Graves A (2013) Generating sequences with recurrent neural networks. CoRR. arXiv:1308.0850
  83. 83.
    Graves A, Jaitly N (2014) Towards end-to-end speech recognition with recurrent neural networks. In: International conference on machine learning (ICML), Beijing, ChinaGoogle Scholar
  84. 84.
    Gu J, Neubig G, Cho K, Li VO (2017) Learning to translate in real-time with neural machine translation. In: Association of computational linguistics (ACL), Vancouver, CanadaGoogle Scholar
  85. 85.
    Gupta A, Vedaldi A, Zisserman A (2016) Synthetic data for text localisation in natural images. In: Computer vision and pattern recognition (CVPR), Las Vegas, NV, USAGoogle Scholar
  86. 86.
    Habibian A, Mensink T, Snoek CG (2014) Composite concept discovery for zero-shot video event detection. In: International conference on multimedia retrieval (ICMR), Glasgow, UKGoogle Scholar
  87. 87.
    Haeusser P, Frerix T, Mordvintsev A, Cremers D (2017) Associative domain adaptation. In: International conference on computer vision (ICCV), Venice, ItalyGoogle Scholar
  88. 88.
    Handa A, Whelan T, McDonald J, Davison AJ (2014) A benchmark for RGB-D visual odometry, 3D reconstruction and slam. In: International conference on robotics and automation (ICRA), Hong KongGoogle Scholar
  89. 89.
    Handa A, Patraucean V, Badrinarayanan V, Stent S, Cipolla R (2016) Understanding real world indoor scenes with synthetic data. In: Computer vision and pattern recognition (CVPR), Las Vegas, NV, USAGoogle Scholar
  90. 90.
    Hariharan B, Girshick RB (2017) Low-shot visual recognition by shrinking and hallucinating features. In: International conference on computer vision (ICCV), Venice, ItalyGoogle Scholar
  91. 91.
    He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Computer vision and pattern recognition (CVPR), Las Vegas, NV, USAGoogle Scholar
  92. 92.
    He K, Gkioxari G, Dollár P, Girshick R (2017) Mask R-CNN. In: International conference on computer vision (ICCV), Honolulu, HI, USAGoogle Scholar
  93. 93.
    Hinton GE, Salakhutdinov RR (2006) Reducing the dimensionality of data with neural networks. Science 313(5786):504–507Google Scholar
  94. 94.
    Hoffman J, Gupta S, Leong J, Guadarrama S, Darrell T (2016) Cross-modal adaptation for RGB-D detection. In: International conference on robotics and automation (ICRA), Stockholm, SwedenGoogle Scholar
  95. 95.
    Hoffman J, Wang D, Yu F, Darrell T (2016) FCNs in the wild: pixel-level adversarial and constraint-based adaptation. CoRR. arXiv:1612.02649
  96. 96.
    Huang G, Liu Z, Van Der Maaten L, Weinberger KQ (2017) Densely connected convolutional networks. In: Computer vision and pattern recognition (CVPR), Honolulu, HI, USAGoogle Scholar
  97. 97.
    Huang J, Gretton A, Borgwardt KM, Schölkopf B, Smola AJ (2007) Correcting sample selection bias by unlabeled data. In: Advances in neural information processing systems (NIPS), Vancouver, CanadaGoogle Scholar
  98. 98.
    Huang Z, Wang X, Wang J, Liu W, Wang J (2018) Weakly-supervised semantic segmentation network with deep seeded region growing. In: Computer vision and pattern recognition (CVPR), Salt Lake City, UT, USAGoogle Scholar
  99. 99.
    Huh M, Liu A, Owens A, Efros AA (2018) Fighting fake news: image splice detection via learned self-consistency. In: European conference on computer vision (ECCV), Munich, GermanyGoogle Scholar
  100. 100.
    Ilse M, Tomczak JM, Welling M (2018) Attention-based deep multiple instance learning. In: International conference on machine learning (ICML), New Orleans, LA, USAGoogle Scholar
  101. 101.
    Inoue N, Furuta R, Yamasaki T, Aizawa K (2018) Cross-domain weakly-supervised object detection through progressive domain adaptation. In: Computer vision and pattern recognition (CVPR), Salt Lake City, UT, USAGoogle Scholar
  102. 102.
    Janai J, Güney F, Behl A, Geiger A (2017) Computer vision for autonomous vehicles: problems, datasets and state-of-the-art. CoRR. arXiv:1704.05519
  103. 103.
    Jayaraman D, Grauman K (2015) Learning image representations tied to ego-motion. In: International conference on computer vision (CVPR), Boston, MA, USAGoogle Scholar
  104. 104.
    Ji S, Xu W, Yang M, Yu K (2013) 3D convolutional neural networks for human action recognition. Trans Pattern Anal Mach Intell 35(1):221–231Google Scholar
  105. 105.
    Jiang H, Larsson G, Maire M, Shakhnarovich G, Learned-Miller E (2018) Self-supervised relative depth learning for urban scene understanding. In: European conference on computer vision (ECCV), Amsterdam, NetherlandsGoogle Scholar
  106. 106.
    Johnson M, Schuster M, Le QV, Krikun M, Wu Y, Chen Z, Thorat N, Viégas F, Wattenberg M, Corrado G et al (2017) Google’s multilingual neural machine translation system: enabling zero-shot translation. In: Association of computational linguistics (ACL), Vancouver, CanadaGoogle Scholar
  107. 107.
    Joulin A, van der Maaten L, Jabri A, Vasilache N (2016) Learning visual features from large weakly supervised data. In: European conference on computer vision (ECCV), Amsterdam, NetherlandsGoogle Scholar
  108. 108.
    Kaneva B, Torralba A, Freeman WT (2011) Evaluation of image features using a photorealistic virtual world. In: International conference on computer vision (ICCV), Barcelona, SpainGoogle Scholar
  109. 109.
    Kapoor A, Hua G, Akbarzadeh A, Baker S (2009) Which faces to tag: adding prior constraints into active learning. In: International conference on computer vision (ICCV), Kyoto, JapanGoogle Scholar
  110. 110.
    Karpathy A, Toderici G, Shetty S, Leung T, Sukthankar R, Fei-Fei L (2014) Large-scale video classification with convolutional neural networks. In: Computer vision and pattern recognition (CVPR), Columbus, OH, USAGoogle Scholar
  111. 111.
    Khoreva A, Benenson R, Hosang JH, Hein M, Schiele B (2017) Simple does it: weakly supervised instance and semantic segmentation. In: Computer vision and pattern recognition (CVPR), Honolulu, HI, USAGoogle Scholar
  112. 112.
    Kim T, Cha M, Kim H, Lee JK, Kim J (2017) Learning to discover cross-domain relations with generative adversarial networks. In: International conference on machine learning (ICML), Sydney, AustraliaGoogle Scholar
  113. 113.
    Kingma DP, Welling M (2013) Auto-encoding variational Bayes. In: International conference on learning representations (ICLR), Scottsdale, AZ, USAGoogle Scholar
  114. 114.
    Koch G, Zemel R, Salakhutdinov R (2015) Siamese neural networks for one-shot image recognition. In: ICML deep learning workshop, Lille, FranceGoogle Scholar
  115. 115.
    Krishna R, Zhu Y, Groth O, Johnson J, Hata K, Kravitz J, Chen S, Kalantidis Y, Li LJ, Shamma DA et al (2017) Visual genome: connecting language and vision using crowdsourced dense image annotations. Int J Comput Vis 123(1):32–73Google Scholar
  116. 116.
    Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems (NIPS), Stateline, NV, USAGoogle Scholar
  117. 117.
    Kulis B, Saenko K, Darrell T (2011) What you saw is not what you get: domain adaptation using asymmetric kernel transforms. In: Computer vision and pattern recognition (CVPR), Colorado Springs, CO, USAGoogle Scholar
  118. 118.
    Kumar MP, Packer B, Koller D (2010) Self-paced learning for latent variable models. In: Advances in neural information processing systems (NIPS), Vancouver, CanadaGoogle Scholar
  119. 119.
    Kurakin A, Goodfellow I, Bengio S (2015) Adversarial examples in the physical world. In: International conference on learning representations (ICLR), San Diego, CA, USAGoogle Scholar
  120. 120.
    Kuznetsova A, Rom H, Alldrin N, Uijlings J, Krasin I, Pont-Tuset J, Kamali S, Popov S, Malloci M, Duerig T, Ferrari V (2018) The open images dataset v4: unified image classification, object detection, and visual relationship detection at scale. CoRR. arXiv:1811.00982
  121. 121.
    Lake BM, Salakhutdinov RR, Tenenbaum J (2013) One-shot learning by inverting a compositional causal process. In: Advances in neural information processing systems (NIPS), Stateline, NA, USAGoogle Scholar
  122. 122.
    Lake BM, Salakhutdinov R, Tenenbaum JB (2015) Human-level concept learning through probabilistic program induction. Science 350(6266):1332–1338Google Scholar
  123. 123.
    Lampert CH, Nickisch H, Harmeling S (2009) Learning to detect unseen object classes by between-class attribute transfer. In: Computer vision and pattern recognition, 2009 (CVPR), Miami, FL, USAGoogle Scholar
  124. 124.
    Lampert CH, Nickisch H, Harmeling S (2014) Attribute-based classification for zero-shot visual object categorization. Trans Pattern Anal Mach Intell 36(3):453–465Google Scholar
  125. 125.
    Larsson G, Maire M, Shakhnarovich G (2017) Colorization as a proxy task for visual understanding. In: Computer vision and pattern recognition (CVPR), Honolulu, HI, USAGoogle Scholar
  126. 126.
    Le Guennec A, Malinowski S, Tavenard R (2016) Data augmentation for time series classification using convolutional neural networks. In: ECML/PKDD workshop on advanced analytics and learning on temporal data, Riva del Garda, ItalyGoogle Scholar
  127. 127.
    Lee HY, Huang JB, Singh M, Yang MH (2017) Unsupervised representation learning by sorting sequences. In: International conference on computer vision (ICCV), Venice, ItalyGoogle Scholar
  128. 128.
    Levinkov E, Fritz M (2013) Sequential Bayesian model update under structured scene prior for semantic road scenes labeling. In: International conference on computer vision (ICCV), Sydney, AustraliaGoogle Scholar
  129. 129.
    Li K, Li Y, You S, Barnes N (2017) Photo-realistic simulation of road scene for data-driven methods in bad weather. In: Conference on computer vision and pattern recognition workshop (CVPRW), Honolulu, HI, USAGoogle Scholar
  130. 130.
    Li W, Duan L, Xu D, Tsang IW (2014) Learning with augmented features for supervised and semi-supervised heterogeneous domain adaptation. Trans Pattern Anal Mach Intell 36(6):1134–1148Google Scholar
  131. 131.
    Li Y, Wang N, Shi J, Liu J, Hou X (2016) Revisiting batch normalization for practical domain adaptation. In: International conference on learning representations workshops, Toulon, FranceGoogle Scholar
  132. 132.
    Lin D, Dai J, Jia J, He K, Sun J (2016) ScribbleSup: scribble-supervised convolutional networks for semantic segmentation. In: Computer vision and pattern recognition (CVPR), Las Vegas, NV, USAGoogle Scholar
  133. 133.
    Lin G, Milan A, Shen C, Reid ID (2017) RefineNet: multi-path refinement networks for high-resolution semantic segmentation. In: Conference on computer vision and pattern recognition (CVPR), Honolulu, HI, USAGoogle Scholar
  134. 134.
    Lin TY, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Zitnick CL (2014) Microsoft coco: common objects in context. In: European conference on computer vision (ECCV), Zurich, SwitzerlandGoogle Scholar
  135. 135.
    Liu B, Ferrari V (2017) Active learning for human pose estimation. In: International conference on computer vision (ICCV), Venice, ItalyGoogle Scholar
  136. 136.
    Liu MY, Tuzel O (2016) Coupled generative adversarial networks. In: Advances in neural information processing systems (NIPS), Barcelona, SpainGoogle Scholar
  137. 137.
    Liu X, Song L, Wu X, Tan T (2016) Transferring deep representation for NIR-VIS heterogeneous face recognition. In: International conference on biometrics (ICB), Halmstad, SwedenGoogle Scholar
  138. 138.
    Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Computer vision and pattern recognition (CVPR), Boston, MA, USAGoogle Scholar
  139. 139.
    Lu H, Zhang L, Cao Z, Wei W, Xian K, Shen C, van den Hengel A (2017) When unsupervised domain adaptation meets tensor representations. In: International conference on computer vision (ICCV), Venice, ItalyGoogle Scholar
  140. 140.
    Lu Y, Tai YW, Tang CK (2018) Attribute-guided face generation using conditional CycleGAN. In: European conference on computer vision (ECCV), Munich, Germany
  141.
    Ma F, Cavalheiro GV, Karaman S (2018) Self-supervised sparse-to-dense: self-supervised depth completion from LiDAR and monocular camera. In: International conference on robotics and automation (ICRA), Brisbane, Australia
  142.
    Maninis KK, Caelles S, Pont-Tuset J, Van Gool L (2017) Deep extreme cut: from extreme points to object segmentation. In: Computer vision and pattern recognition (CVPR), Honolulu, HI, USA
  143.
    Mehrotra A, Dukkipati A (2017) Generative adversarial residual pairwise networks for one shot learning. CoRR. arXiv:1703.08033
  144.
    Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space. In: Advances in neural information processing systems (NIPS), Stateline, NV, USA
  145.
    Mishra N, Rohaninejad M, Chen X, Abbeel P (2018) A simple neural attentive meta-learner. In: International conference on learning representations (ICLR), New Orleans, LA, USA
  146.
    Misra I, Zitnick CL, Mitchell M, Girshick R (2016a) Seeing through the human reporting bias: visual classifiers from noisy human-centric labels. In: Computer vision and pattern recognition (CVPR), Las Vegas, NV, USA
  147.
    Misra I, Zitnick CL, Hebert M (2016b) Shuffle and learn: unsupervised learning using temporal order verification. In: European conference on computer vision (ECCV), Amsterdam, Netherlands
  148.
    Natarajan N, Dhillon IS, Ravikumar PK, Tewari A (2013) Learning with noisy labels. In: Advances in neural information processing systems (NIPS), Stateline, NV, USA
  149.
    Nguyen HV, Ho HT, Patel VM, Chellappa R (2015) DASH-N: joint hierarchical domain adaptation and feature learning. IEEE Trans Image Process 24(12):5479–5491
  150.
    Noroozi M, Favaro P (2016) Unsupervised learning of visual representations by solving jigsaw puzzles. In: European conference on computer vision (ECCV), Amsterdam, Netherlands
  151.
    Oquab M, Bottou L, Laptev I, Sivic J (2014) Learning and transferring mid-level image representations using convolutional neural networks. In: Computer vision and pattern recognition (CVPR), Columbus, OH, USA
  152.
    Owens A, Wu J, McDermott JH, Freeman WT, Torralba A (2016) Ambient sound provides supervision for visual learning. In: European conference on computer vision (ECCV), Amsterdam, Netherlands
  153.
    Pan SJ, Yang Q et al (2010) A survey on transfer learning. IEEE Trans Knowl Data Eng 22(10):1345–1359
  154.
    Pan SJ, Tsang IW, Kwok JT, Yang Q (2011) Domain adaptation via transfer component analysis. IEEE Trans Neural Netw 22(2):199–210
  155.
    Papadopoulos DP, Uijlings JR, Keller F, Ferrari V (2016) We don’t need no bounding-boxes: training object class detectors using only human verification. In: Computer vision and pattern recognition (CVPR), Las Vegas, NV, USA
  156.
    Papadopoulos DP, Uijlings JR, Keller F, Ferrari V (2017) Extreme clicking for efficient object annotation. In: International conference on computer vision (ICCV), Venice, Italy
  157.
    Papadopoulos DP, Uijlings JR, Keller F, Ferrari V (2017) Training object class detectors with click supervision. In: Computer vision and pattern recognition (CVPR), Honolulu, HI, USA
  158.
    Patel VM, Gopalan R, Li R, Chellappa R (2015) Visual domain adaptation: a survey of recent advances. IEEE Signal Process Mag 32(3):53–69
  159.
    Pathak D, Krähenbühl P, Donahue J, Darrell T, Efros AA (2016) Context encoders: feature learning by inpainting. In: Computer vision and pattern recognition (CVPR), Las Vegas, NV, USA
  160.
    Pathak D, Girshick RB, Dollár P, Darrell T, Hariharan B (2017) Learning features by watching objects move. In: Computer vision and pattern recognition (CVPR), Honolulu, HI, USA
  161.
    Peng KC, Wu Z, Ernst J (2018) Zero-shot deep domain adaptation. In: European conference on computer vision (ECCV), Munich, Germany
  162.
    Peng X, Sun B, Ali K, Saenko K (2015) Learning deep object detectors from 3D models. In: International conference on computer vision (ICCV), Santiago, Chile
  163.
    Pinheiro PO, Collobert R (2015) From image-level to pixel-level labeling with convolutional networks. In: Computer vision and pattern recognition (CVPR), Boston, MA, USA
  164.
    Pinto L, Gandhi D, Han Y, Park YL, Gupta A (2016) The curious robot: learning visual representations via physical interactions. In: European conference on computer vision (ECCV), Amsterdam, Netherlands
  165.
    Qiao S, Shen W, Zhang Z, Wang B, Yuille A (2018) Deep co-training for semi-supervised image recognition. In: European conference on computer vision (ECCV), Munich, Germany
  166.
    Qin J, Liu L, Shao L, Shen F, Ni B, Chen J, Wang Y (2017) Zero-shot action recognition with error-correcting output codes. In: Computer vision and pattern recognition (CVPR), Honolulu, HI, USA
  167.
    Qiu W, Yuille A (2016) UnrealCV: connecting computer vision to unreal engine. In: European conference on computer vision (ECCV), Amsterdam, Netherlands
  168.
    Rader N, Bausano M, Richards JE (1980) On the nature of the visual-cliff-avoidance response in human infants. Child Dev 51(1):61–68
  169.
    Raj A, Namboodiri VP, Tuytelaars T (2015) Subspace alignment based domain adaptation for RCNN detector. In: British machine vision conference (BMVC), Swansea, UK
  170.
    Rajpurkar P, Zhang J, Lopyrev K, Liang P (2016) SQuAD: 100,000+ questions for machine comprehension of text. In: Conference on empirical methods in natural language processing (EMNLP), Austin, TX, USA
  171.
    Ratner AJ, Ehrenberg H, Hussain Z, Dunnmon J, Ré C (2017) Learning to compose domain-specific transformations for data augmentation. In: Advances in neural information processing systems (NIPS), Long Beach, CA, USA
  172.
    Ravi S, Larochelle H (2017) Optimization as a model for few-shot learning. In: International conference on learning representations (ICLR), Toulon, France
  173.
    Redko I, Habrard A, Sebban M (2017) Theoretical analysis of domain adaptation with optimal transport. In: Joint European conference on machine learning and knowledge discovery in databases (ECML KDD), Skopje, Macedonia
  174.
    Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: unified, real-time object detection. In: Computer vision and pattern recognition (CVPR), Las Vegas, NV, USA
  175.
    Reed S, Lee H, Anguelov D, Szegedy C, Erhan D, Rabinovich A (2014) Training deep neural networks on noisy labels with bootstrapping. In: International conference on learning representations workshops, Banff, Canada
  176.
    Reed S, Akata Z, Lee H, Schiele B (2016) Learning deep representations of fine-grained visual descriptions. In: Computer vision and pattern recognition (CVPR), Las Vegas, NV, USA
  177.
    Remez T, Huang J, Brown M (2018) Learning to segment via cut-and-paste. In: European conference on computer vision (ECCV), Munich, Germany
  178.
    Ren S, He K, Girshick R, Sun J (2015) Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in neural information processing systems (NIPS), Montreal, Canada
  179.
    Richter SR, Vineet V, Roth S, Koltun V (2016) Playing for data: ground truth from computer games. In: European conference on computer vision (ECCV), Amsterdam, Netherlands
  180.
    Richter SR, Hayder Z, Koltun V (2017) Playing for benchmarks. In: International conference on computer vision (ICCV), Venice, Italy
  181.
    Rippel O, Paluri M, Dollár P, Bourdev L (2016) Metric learning with adaptive density discrimination. In: International conference on learning representations (ICLR), San Juan, Puerto Rico
  182.
    Ronneberger O, Fischer P, Brox T (2015) U-Net: convolutional networks for biomedical image segmentation. In: International conference on medical image computing and computer-assisted intervention (MICCAI), Munich, Germany
  183.
    Ros G, Sellart L, Materzynska J, Vazquez D, Lopez AM (2016) The SYNTHIA dataset: a large collection of synthetic images for semantic segmentation of urban scenes. In: Computer vision and pattern recognition (CVPR), Las Vegas, NV, USA
  184.
    Roy N, McCallum A (2001) Toward optimal active learning through monte carlo estimation of error reduction. In: International conference on machine learning (ICML), Williamstown, MA, USA
  185.
    Roy S, Unmesh A, Namboodiri VP (2018) Deep active learning for object detection. In: British machine vision conference (BMVC), Newcastle, UK
  186.
    Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M et al (2015) ImageNet large scale visual recognition challenge. Int J Comput Vis 115(3):211–252
  187.
    Russo P, Carlucci FM, Tommasi T, Caputo B (2018) From source to target and back: symmetric bi-directional adaptive GAN. In: Computer vision and pattern recognition (CVPR), Salt Lake City, UT, USA
  188.
    Sadeghi F, Levine S (2017) CAD2RL: real single-image flight without a single real image. In: Robotics science and systems (RSS), Boston, MA, USA
  189.
    Saenko K, Kulis B, Fritz M, Darrell T (2010) Adapting visual category models to new domains. In: European conference on computer vision (ECCV), Crete, Greece
  190.
    Sak H, Senior A, Beaufays F (2014) Long short-term memory recurrent neural network architectures for large scale acoustic modeling. In: Conference of the international speech communication association (INTERSPEECH), Singapore
  191.
    Sakaridis C, Dai D, Van Gool L (2018) Semantic foggy scene understanding with synthetic data. Int J Comput Vis 126:973–992
  192.
    Salakhutdinov R, Larochelle H (2010) Efficient learning of deep Boltzmann machines. In: International conference on artificial intelligence and statistics (ICAIS), San Diego, CA, USA
  193.
    Sankaranarayanan S, Balaji Y, Castillo CD, Chellappa R (2018) Generate to adapt: aligning domains using generative adversarial networks. In: Computer vision and pattern recognition (CVPR), Salt Lake City, UT, USA
  194.
    Scheffer T, Decomain C, Wrobel S (2001) Active hidden Markov models for information extraction. In: International symposium on intelligent data analysis, Berlin, Heidelberg
  195.
    Schlegl T, Seeböck P, Waldstein SM, Schmidt-Erfurth U, Langs G (2017) Unsupervised anomaly detection with generative adversarial networks to guide marker discovery. In: International conference on information processing in medical imaging (IPMI), Boone, NC, USA
  196.
    Schroff F, Kalenichenko D, Philbin J (2015) FaceNet: a unified embedding for face recognition and clustering. In: Computer vision and pattern recognition (CVPR), Boston, MA, USA
  197.
    Sener O, Savarese S (2018) Active learning for convolutional neural networks: a core-set approach. In: International conference on learning representations (ICLR), New Orleans, LA, USA
  198.
    Settles B (2009) Active learning literature survey. Computer Sciences Technical Report 1648, University of Wisconsin-Madison
  199.
    Shao L, Zhu F, Li X (2015) Transfer learning for visual categorization: a survey. IEEE Trans Neural Netw Learn Syst 26(5):1019–1034
  200.
    Shi M, Ferrari V (2016) Weakly supervised object localization using size estimates. In: European conference on computer vision (ECCV), Amsterdam, Netherlands
  201.
    Shimodaira H (2000) Improving predictive inference under covariate shift by weighting the log-likelihood function. J Stat Plan Inference 90(2):227–244
  202.
    Shrivastava A, Pfister T, Tuzel O, Susskind J, Wang W, Webb R (2017) Learning from simulated and unsupervised images through adversarial training. In: Computer vision and pattern recognition (CVPR), Honolulu, HI, USA
  203.
    Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. In: International conference on learning representations (ICLR), San Diego, CA, USA
  204.
    Singh S, Gupta A, Efros AA (2012) Unsupervised discovery of mid-level discriminative patches. In: European conference on computer vision (ECCV), Firenze, Italy
  205.
    Sivic J, Russell BC, Efros AA, Zisserman A, Freeman WT (2005) Discovering objects and their location in images. In: Computer vision and pattern recognition (CVPR), San Diego, CA, USA
  206.
    Socher R, Ganjoo M, Manning CD, Ng A (2013) Zero-shot learning through cross-modal transfer. In: Advances in neural information processing systems (NIPS), Stateline, NV, USA
  207.
    Sohn K, Liu S, Zhong G, Yu X, Yang MH, Chandraker M (2017) Unsupervised domain adaptation for face recognition in unlabeled videos. In: Computer vision and pattern recognition (CVPR), Honolulu, HI, USA
  208.
    Song HO, Girshick R, Jegelka S, Mairal J, Harchaoui Z, Darrell T (2014) On learning to localize objects with minimal supervision. In: International conference on machine learning (ICML), Beijing, China
  209.
    Song HO, Lee YJ, Jegelka S, Darrell T (2014) Weakly-supervised discovery of visual pattern configurations. In: Advances in neural information processing systems (NIPS), Montreal, Canada
  210.
    Stavens D, Thrun S (2006) A self-supervised terrain roughness estimator for off-road autonomous driving. In: Uncertainty in artificial intelligence (UAI), Cambridge, MA, USA
  211.
    Sukhbaatar S, Bruna J, Paluri M, Bourdev L, Fergus R (2014) Training convolutional networks with noisy labels. In: International conference on learning representations workshops, Banff, Canada
  212.
    Sun B, Saenko K (2016) Deep CORAL: correlation alignment for deep domain adaptation. In: European conference on computer vision (ECCV), Amsterdam, Netherlands
  213.
    Sutskever I, Vinyals O, Le QV (2014) Sequence to sequence learning with neural networks. In: Advances in neural information processing systems (NIPS), Montreal, Canada
  214.
    Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: Computer vision and pattern recognition (CVPR), Boston, MA, USA
  215.
    Taigman Y, Polyak A, Wolf L (2017) Unsupervised cross-domain image generation. In: International conference on learning representations (ICLR), Toulon, France
  216.
    Tan B, Zhang Y, Pan SJ, Yang Q (2017) Distant domain transfer learning. In: AAAI, San Francisco, CA, USA
  217.
    Taylor GR, Chosak AJ, Brewer PC (2007) OVVV: using virtual worlds to design and evaluate surveillance systems. In: Computer vision and pattern recognition (CVPR), Minneapolis, MN, USA
  218.
    Thomee B, Shamma DA, Friedland G, Elizalde B, Ni K, Poland D, Borth D, Li L (2016) YFCC100M: the new data in multimedia research. Commun ACM 59:64–73
  219.
    Tobin J, Fong R, Ray A, Schneider J, Zaremba W, Abbeel P (2017) Domain randomization for transferring deep neural networks from simulation to the real world. In: International conference on intelligent robots and systems (IROS), Vancouver, Canada
  220.
    Tong S, Chang E (2001) Support vector machine active learning for image retrieval. In: ACM international conference on multimedia (MM), Ottawa, Canada
  221.
    Torralba A, Efros AA (2011) Unbiased look at dataset bias. In: Computer vision and pattern recognition (CVPR), Colorado Springs, CO, USA
  222.
    Toshev A, Szegedy C (2014) DeepPose: human pose estimation via deep neural networks. In: Computer vision and pattern recognition (CVPR), Columbus, OH, USA
  223.
    Tran D, Bourdev L, Fergus R, Torresani L, Paluri M (2015) Learning spatiotemporal features with 3D convolutional networks. In: International conference on computer vision (ICCV), Santiago, Chile
  224.
    Tremblay J, Prakash A, Acuna D, Brophy M, Jampani V, Anil C, To T, Cameracci E, Boochoon S, Birchfield S (2018) Training deep networks with synthetic data: bridging the reality gap by domain randomization. In: Computer vision and pattern recognition workshops (CVPRW), Salt Lake City, UT, USA
  225.
    Tsai YH, Hung WC, Schulter S, Sohn K, Yang MH, Chandraker M (2018) Learning to adapt structured output space for semantic segmentation. In: Computer vision and pattern recognition (CVPR), Salt Lake City, UT, USA
  226.
    Tzeng E, Hoffman J, Zhang N, Saenko K, Darrell T (2014) Deep domain confusion: maximizing for domain invariance. In: Computer vision and pattern recognition (CVPR), Columbus, OH, USA
  227.
    Tzeng E, Hoffman J, Darrell T, Saenko K (2015) Simultaneous deep transfer across domains and tasks. In: International conference on computer vision (ICCV), Santiago, Chile
  228.
    Tzeng E, Hoffman J, Saenko K, Darrell T (2017) Adversarial discriminative domain adaptation. In: Computer vision and pattern recognition (CVPR), Honolulu, HI, USA
  229.
    Van Den Oord A, Dieleman S, Zen H, Simonyan K, Vinyals O, Graves A, Kalchbrenner N, Senior AW, Kavukcuoglu K (2016) WaveNet: a generative model for raw audio. CoRR. arXiv:1609.03499
  230.
    Van Horn G, Branson S, Farrell R, Haber S, Barry J, Ipeirotis P, Perona P, Belongie S (2015) Building a bird recognition app and large scale dataset with citizen scientists: the fine print in fine-grained dataset collection. In: Computer vision and pattern recognition (CVPR), Boston, MA, USA
  231.
    Varma G, Subramanian A, Namboodiri A, Chandraker M, Jawahar CV (2019) IDD: a dataset for exploring problems of autonomous navigation in unconstrained environments. In: IEEE Winter conference on applications of computer vision (WACV), Waikoloa, HI, USA
  232.
    Vazquez D, Lopez AM, Marin J, Ponsa D, Geronimo D (2014) Virtual and real world adaptation for pedestrian detection. IEEE Trans Pattern Anal Mach Intell 36(4):797–809
  233.
    Veit A, Alldrin N, Chechik G, Krasin I, Gupta A, Belongie SJ (2017) Learning from noisy large-scale datasets with minimal supervision. In: Computer vision and pattern recognition (CVPR), Honolulu, HI, USA
  234.
    Vezhnevets A, Buhmann JM, Ferrari V (2012) Active learning for semantic segmentation with expected change. In: Computer vision and pattern recognition (CVPR), Providence, RI, USA
  235.
    Vijayanarasimhan S, Grauman K (2014) Large-scale live active learning: training object detectors with crawled data and crowds. Int J Comput Vis 108(1–2):97–114
  236.
    Vinyals O, Blundell C, Lillicrap T, Wierstra D et al (2016) Matching networks for one shot learning. In: Advances in neural information processing systems (NIPS), Barcelona, Spain
  237.
    Vogt P, Smith ADM (2005) Learning color words is slow: a cross-situational learning account. Behav Brain Sci 28(4):509–510
  238.
    Wang C, Mahadevan S (2011) Heterogeneous domain adaptation using manifold alignment. In: International joint conference on artificial intelligence (IJCAI), Barcelona, Spain
  239.
    Wang M, Deng W (2018) Deep visual domain adaptation: a survey. Neurocomputing 312:135–153
  240.
    Wang X, Gupta A (2015) Unsupervised learning of visual representations using videos. In: International conference on computer vision (ICCV), Santiago, Chile
  241.
    Wang YX, Hebert M (2016) Learning to learn: model regression networks for easy small sample learning. In: European conference on computer vision (ECCV), Amsterdam, Netherlands
  242.
    Weiss K, Khoshgoftaar TM, Wang D (2016) A survey of transfer learning. J Big Data 3(1):9
  243.
    Wu J, Yu Y, Huang C, Yu K (2015) Deep multiple instance learning for image classification and auto-annotation. In: Computer vision and pattern recognition (CVPR), Boston, MA, USA
  244.
    Wu Y, Schuster M, Chen Z, Le QV, Norouzi M, Macherey W, Krikun M, Cao Y, Gao Q, Macherey K, Klingner J, Shah A, Johnson M, Liu X, Kaiser Ł, Gouws S, Kato Y, Kudo T, Kazawa H, Stevens K, Kurian G, Patil N, Wang W, Young C, Smith J, Riesa J, Rudnick A, Vinyals O, Corrado G, Hughes M, Dean J (2016) Google’s neural machine translation system: bridging the gap between human and machine translation. CoRR. arXiv:1609.08144
  245.
    Xian Y, Akata Z, Sharma G, Nguyen Q, Hein M, Schiele B (2016) Latent embeddings for zero-shot classification. In: Computer vision and pattern recognition (CVPR), Las Vegas, NV, USA
  246.
    Xian Y, Schiele B, Akata Z (2017) Zero-shot learning - the good, the bad and the ugly. In: Computer vision and pattern recognition (CVPR), Honolulu, HI, USA
  247.
    Xiao T, Xia T, Yang Y, Huang C, Wang X (2015) Learning from massive noisy labeled data for image classification. In: Computer vision and pattern recognition (CVPR), Boston, MA, USA
  248.
    Xu J, Schwing AG, Urtasun R (2015) Learning to segment under various forms of weak supervision. In: Computer vision and pattern recognition (CVPR), Boston, MA, USA
  249.
    Yan H, Ding Y, Li P, Wang Q, Xu Y, Zuo W (2017) Mind the class weight bias: weighted maximum mean discrepancy for unsupervised domain adaptation. In: Computer vision and pattern recognition (CVPR), Honolulu, HI, USA
  250.
    Yao A, Gall J, Leistner C, Van Gool L (2012) Interactive object detection. In: Computer vision and pattern recognition (CVPR), Providence, RI, USA
  251.
    Yi Z, Zhang HR, Tan P, Gong M (2017) DualGAN: unsupervised dual learning for image-to-image translation. In: International conference on computer vision (ICCV), Venice, Italy
  252.
    Yoo D, Fan H, Boddeti VN, Kitani KM (2018) Efficient k-shot learning with regularized deep networks. In: AAAI, New Orleans, LA, USA
  253.
    Yosinski J, Clune J, Bengio Y, Lipson H (2014) How transferable are features in deep neural networks? In: Advances in neural information processing systems (NIPS), Montreal, Canada
  254.
    Zhang H, Xu T, Li H, Zhang S, Huang X, Wang X, Metaxas D (2017a) StackGAN: text to photo-realistic image synthesis with stacked generative adversarial networks. In: International conference on computer vision (ICCV), Venice, Italy
  255.
    Zhang J, Ding Z, Li W, Ogunbona P (2018) Importance weighted adversarial nets for partial domain adaptation. In: Computer vision and pattern recognition (CVPR), Salt Lake City, UT, USA
  256.
    Zhang L, Xiang T, Gong S et al (2017b) Learning a deep embedding model for zero-shot learning. In: Computer vision and pattern recognition (CVPR), Honolulu, HI, USA
  257.
    Zhang R, Isola P, Efros AA (2016) Colorful image colorization. In: European conference on computer vision (ECCV), Amsterdam, Netherlands
  258.
    Zhang R, Isola P, Efros AA (2017c) Split-brain autoencoders: unsupervised learning by cross-channel prediction. In: Computer vision and pattern recognition (CVPR), Honolulu, HI, USA
  259.
    Zhang Y, David P, Gong B (2017d) Curriculum domain adaptation for semantic segmentation of urban scenes. In: International conference on computer vision (ICCV), Venice, Italy
  260.
    Zhao H, Shi J, Qi X, Wang X, Jia J (2017) Pyramid scene parsing network. In: Computer vision and pattern recognition (CVPR), Honolulu, HI, USA
  261.
    Zhu JJ, Bento J (2017) Generative adversarial active learning. In: Advances in neural information processing systems workshops, Long Beach, CA, USA
  262.
    Zhu JY, Park T, Isola P, Efros AA (2017) Unpaired image-to-image translation using cycle-consistent adversarial networks. In: International conference on computer vision (ICCV), Venice, Italy
  263.
    Zhu Y, Chen Y, Lu Z, Pan SJ, Xue GR, Yu Y, Yang Q (2011) Heterogeneous transfer learning for image classification. In: AAAI, San Francisco, CA, USA
  264.
    Zhuang B, Liu L, Li Y, Shen C, Reid ID (2017) Attend in groups: a weakly-supervised deep learning framework for learning from web data. In: Computer vision and pattern recognition (CVPR), Honolulu, HI, USA

Copyright information

© Indian Institute of Science 2019

Authors and Affiliations

  • Lovish Chum (1)
  • Anbumani Subramanian (2)
  • Vineeth N. Balasubramanian (3)
  • C. V. Jawahar (1)

  1. CVIT, IIIT Hyderabad, Hyderabad, India
  2. Intel, Bangalore, India
  3. IIT Hyderabad, Hyderabad, India