CurriculumNet: Weakly Supervised Learning from Large-Scale Web Images

  • Sheng Guo
  • Weilin Huang (corresponding author)
  • Haozhi Zhang
  • Chenfan Zhuang
  • Dengke Dong
  • Matthew R. Scott
  • Dinglong Huang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11214)

Abstract

We present a simple yet efficient approach capable of training deep neural networks on large-scale weakly-supervised web images, which are crawled raw from the Internet using text queries, without any human annotation. We develop a principled learning strategy that leverages curriculum learning, with the goal of handling massive amounts of noisy labels and data imbalance effectively. We design a new learning curriculum by measuring the complexity of the data using its distribution density in a feature space, and rank the complexity in an unsupervised manner. This allows for an efficient implementation of curriculum learning on large-scale web images, resulting in a high-performance CNN model in which the negative impact of noisy labels is reduced substantially. Importantly, we show by experiments that images with highly noisy labels can, surprisingly, improve the generalization capability of the model by serving as a form of regularization. Our approach obtains state-of-the-art performance on four benchmarks: WebVision, ImageNet, Clothing-1M and Food-101. With an ensemble of multiple models, we achieved a top-5 error rate of 5.2% on the WebVision challenge [18] for 1000-category classification. This result was the top performance by a wide margin, outperforming the second-place entry by a relative error rate of nearly 50%. Code and models are available at: https://github.com/MalongTech/CurriculumNet.
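The unsupervised complexity ranking described above can be illustrated with a short sketch: per category, estimate each sample's local density in feature space (in the spirit of density-peak clustering [29]), then order samples from densest (likely clean labels) to sparsest (likely noisy) and split them into curriculum subsets. The function name, parameters, and cutoff heuristic below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def density_curriculum(features, n_subsets=3, cutoff_pct=20.0):
    """Rank the samples of one category by local density in feature space
    and split them into curriculum subsets, densest first.

    features: (N, D) array of CNN features for one category.
    Returns a list of n_subsets index arrays, from the densest subset
    (likely clean labels) to the sparsest (likely noisy labels).
    """
    features = np.asarray(features, dtype=float)
    # Full pairwise Euclidean distance matrix. O(N^2) memory, so this is
    # for illustration only; the paper operates on millions of web images.
    diff = features[:, None, :] - features[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Cutoff distance d_c: a low percentile of the non-zero pairwise
    # distances (a common heuristic in density-peak clustering).
    d_c = np.percentile(dist[dist > 0], cutoff_pct)
    # Local density rho_i: number of neighbours closer than d_c
    # (subtract 1 to exclude the sample itself).
    rho = (dist < d_c).sum(axis=1) - 1
    # Sort densest first and split into equal-sized curriculum stages.
    order = np.argsort(-rho)
    return np.array_split(order, n_subsets)
```

On a toy category made of one tight cluster plus scattered outliers, the first subset recovers the cluster members and the last subset collects the outliers, mirroring the clean-to-noisy ordering the curriculum relies on.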

Keywords

Curriculum learning · Weakly supervised · Noisy data · Large-scale · Web images

References

  1. Bengio, Y., Louradour, J., Collobert, R., Weston, J.: Curriculum learning. In: ICML, pp. 41–48. ACM (2009)
  2. Bossard, L., Guillaumin, M., Van Gool, L.: Food-101 – mining discriminative components with random forests. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8694, pp. 446–461. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10599-4_29
  3. Brodley, C.E., Friedl, M.A.: Identifying mislabeled training data. J. Artif. Intell. Res. 11, 131–167 (1999)
  4. Chen, L.C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. CoRR abs/1606.00915 (2016)
  5. Deng, J., Dong, W., Socher, R., Li, L., Li, K., Li, F.: ImageNet: a large-scale hierarchical image database. In: CVPR, pp. 248–255 (2009)
  6. Everingham, M., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A.: The PASCAL Visual Object Classes Challenge 2007 (VOC2007) results (2007). http://www.pascal-network.org/challenges/VOC/voc2007/workshop/index.html
  7. Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE Trans. Neural Netw. Learn. Syst. 25(5), 845–869 (2014)
  8. Guo, S., Huang, W., Wang, L., Qiao, Y.: Locally-supervised deep hybrid model for scene recognition. IEEE Trans. Image Process. (TIP) 26, 808–820 (2017)
  9. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR (2016)
  10. He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: ICCV, pp. 2980–2988 (2017)
  11. Hong, S., Noh, H., Han, B.: Decoupled deep neural network for semi-supervised semantic segmentation. In: NIPS, pp. 1495–1503 (2015)
  12. Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: CVPR (2018)
  13. Misra, I., Zitnick, C.L., Mitchell, M., Girshick, R.: Seeing through the human reporting bias: visual classifiers from noisy human-centric labels. In: CVPR (2016)
  14. Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. CoRR abs/1502.03167 (2015)
  15. Jiang, L., Zhou, Z., Leung, T., Li, L.J., Fei-Fei, L.: MentorNet: regularizing very deep neural networks on corrupted labels. CoRR abs/1712.05055 (2017)
  16. Larsen, J., Nonboe, L., Hintz-Madsen, M., Hansen, L.K.: Design of robust neural network classifiers. In: ICASSP (1998)
  17. Lee, K.H., He, X., Zhang, L., Yang, L.: CleanNet: transfer learning for scalable image classifier training with label noise. CoRR abs/1711.07131 (2017)
  18. Li, W., et al.: WebVision challenge: visual learning and understanding with web data. CoRR abs/1705.05640 (2017)
  19. Li, W., Wang, L., Li, W., Agustsson, E., Van Gool, L.: WebVision database: visual learning and understanding from web data. CoRR abs/1708.02862 (2017)
  20. Lin, T.Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: ICCV, pp. 2980–2988 (2017)
  21. Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_48
  22. Liu, W., et al.: SSD: single shot multibox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 21–37. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_2
  23. Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: CVPR, pp. 3431–3440 (2015)
  24. Pandey, P., Deepthi, A., Mandal, B., Puhan, N.: FoodNet: recognizing foods using ensemble of deep networks. IEEE Signal Process. Lett. 24(12), 1758–1762 (2017)
  25. Patrini, G., Rozza, A., Menon, A.K., Nock, R., Qu, L.: Making deep neural networks robust to label noise: a loss correction approach. In: CVPR, pp. 1944–1952 (2017)
  26. Fergus, R., Weiss, Y., Torralba, A.: Semi-supervised learning in gigantic image collections. In: NIPS (2009)
  27. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: CVPR, pp. 779–788 (2016)
  28. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: NIPS, pp. 91–99 (2015)
  29. Rodriguez, A., Laio, A.: Clustering by fast search and find of density peaks. Science 344(6191), 1492–1496 (2014)
  30. Rolnick, D., Veit, A., Belongie, S., Shavit, N.: Deep learning is robust to massive label noise. CoRR abs/1705.10694 (2017)
  31. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. CoRR abs/1409.1556 (2014)
  32. Sukhbaatar, S., Fergus, R.: Learning from noisy labels with deep neural networks. CoRR abs/1406.2080 (2014)
  33. Szegedy, C., Ioffe, S., Vanhoucke, V., Alemi, A.A.: Inception-v4, Inception-ResNet and the impact of residual connections on learning. In: AAAI, pp. 4278–4284 (2017)
  34. Szegedy, C., et al.: Going deeper with convolutions. In: CVPR, pp. 1–9 (2015)
  35. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: CVPR, pp. 2818–2826 (2016)
  36. Veit, A., Alldrin, N., Chechik, G., Krasin, I., Gupta, A., Belongie, S.: Learning from noisy large-scale datasets with minimal supervision. In: CVPR (2017)
  37. Wang, L., Guo, S., Huang, W., Xiong, Y., Qiao, Y.: Knowledge guided disambiguation for large-scale scene classification with multi-resolution CNNs. IEEE Trans. Image Process. (TIP) 26, 2055–2068 (2017)
  38. Chen, X., Shrivastava, A., Gupta, A.: NEIL: extracting visual knowledge from web data. In: ICCV (2013)
  39. Xiao, T., Xia, T., Yang, Y., Huang, C., Wang, X.: Learning from massive noisy labeled data for image classification. In: CVPR, pp. 2691–2699 (2015)
  40. Zhu, X.: Semi-supervised learning literature survey. Technical Report 1530, Computer Sciences, University of Wisconsin-Madison (2005)

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Sheng Guo (1, 2)
  • Weilin Huang (1, 2), corresponding author
  • Haozhi Zhang (1, 2)
  • Chenfan Zhuang (1, 2)
  • Dengke Dong (1, 2)
  • Matthew R. Scott (1, 2)
  • Dinglong Huang (1, 2)
  1. Malong Technologies, Shenzhen, China
  2. Shenzhen Malong Artificial Intelligence Research Center, Shenzhen, China
