
Learning with Biased Complementary Labels

  • Xiyu Yu
  • Tongliang Liu
  • Mingming Gong
  • Dacheng Tao
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11205)

Abstract

In this paper, we study the classification problem in which we have access to an easily obtainable surrogate for true labels, namely complementary labels, which specify classes that observations do not belong to. Let Y and \(\bar{Y}\) be the true and complementary labels, respectively. We first model the annotation of complementary labels via transition probabilities \(P(\bar{Y}=i|Y=j), i\ne j\in \{1,\cdots ,c\}\), where c is the number of classes. Previous methods implicitly assume that the \(P(\bar{Y}=i|Y=j), \forall i\ne j\), are identical, which does not hold in practice because humans are biased toward their own experience. For example, as shown in Fig. 1, an annotator who is more familiar with monkeys than with prairie dogs is more likely to employ “monkey” as a complementary label for meerkats. We therefore reason that the transition probabilities will differ across classes. In this paper, we propose a framework that contributes three main innovations to learning with biased complementary labels: (1) it estimates the transition probabilities without bias; (2) it provides a general method for modifying traditional loss functions and extends standard deep neural network classifiers to learn with biased complementary labels; (3) it theoretically guarantees that the classifier learned with complementary labels converges to the optimal one learned with true labels. Comprehensive experiments on several benchmark datasets validate the superiority of our method over current state-of-the-art methods.
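Under this model, if a complementary label depends on the input only through the true label, the two posteriors are linked by the transition matrix \(Q\) with entries \(Q_{ji}=P(\bar{Y}=i|Y=j)\): \(P(\bar{Y}=i|X=x)=\sum_{j}Q_{ji}\,P(Y=j|X=x)\). The sketch below illustrates the kind of loss modification described in contribution (2): a standard classifier's softmax output is pushed through \(Q^{\top}\) before computing cross-entropy against the observed complementary label. This is a minimal PyTorch illustration, not the paper's implementation; the function name is ours, and \(Q\) is assumed known here, whereas the paper estimates it from data.

```python
import torch
import torch.nn.functional as F

def complementary_cross_entropy(logits, comp_labels, Q):
    """Hypothetical forward-style loss modification: cross-entropy against
    the complementary label, computed on the transition-adjusted output.

    logits:      (batch, c) raw outputs of a standard classifier
    comp_labels: (batch,)   observed complementary labels
    Q:           (c, c)     transition matrix, Q[j, i] = P(Ybar=i | Y=j)
    """
    p = F.softmax(logits, dim=1)   # estimate of P(Y = j | X)
    q_bar = p @ Q                  # rows give P(Ybar = i | X) = sum_j Q[j, i] * p[j]
    eps = 1e-12                    # numerical floor before taking the log
    return F.nll_loss(torch.log(q_bar + eps), comp_labels)

# Toy usage with the uniform transition matrix implicitly assumed by earlier
# work (zero diagonal, off-diagonal entries 1/(c-1)); a biased Q would have
# unequal off-diagonal entries instead.
c = 10
Q = (torch.ones(c, c) - torch.eye(c)) / (c - 1)
logits = torch.randn(32, c, requires_grad=True)
comp_labels = torch.randint(0, c, (32,))
loss = complementary_cross_entropy(logits, comp_labels, Q)
loss.backward()
```

Because the adjustment is a fixed linear map applied after the softmax, any architecture and any differentiable loss can be plugged in unchanged, which is what allows standard deep classifiers to be extended to this setting.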

Keywords

Multi-class classification · Biased complementary labels · Transition matrix · Modified loss function

Notes

Acknowledgement

This work was supported by Australian Research Council Projects FL-170100117, DP-180103424, and LP-150100671. It was partially supported by SAP SE and a research grant from Pfizer titled “Developing Statistical Method to Jointly Model Genotype and High Dimensional Imaging Endophenotype”. We are also grateful for the computational resources provided by the Pittsburgh Supercomputing Center under grant number TG-ASC170024.

Supplementary material

Supplementary material 1: 474172_1_En_5_MOESM1_ESM.pdf (PDF, 483 KB)


Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Xiyu Yu (1)
  • Tongliang Liu (1)
  • Mingming Gong (2)(3)
  • Dacheng Tao (1)

  1. UBTECH Sydney AI Centre, SIT, FEIT, The University of Sydney, Sydney, Australia
  2. Department of Philosophy, Carnegie Mellon University, Pittsburgh, USA
  3. Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, USA
