
Machine Learning, Volume 107, Issue 4, pp 767–794

Semi-supervised AUC optimization based on positive-unlabeled learning

  • Tomoya Sakai
  • Gang Niu
  • Masashi Sugiyama

Abstract

Maximizing the area under the receiver operating characteristic curve (AUC) is a standard approach to imbalanced classification. Various supervised AUC optimization methods have been developed, and they have also been extended to semi-supervised scenarios to cope with small-sample problems. However, existing semi-supervised AUC optimization methods rely on strong distributional assumptions that are rarely satisfied in real-world problems. In this paper, we propose a novel semi-supervised AUC optimization method that does not require such restrictive assumptions. We first develop an AUC optimization method based only on positive and unlabeled (PU) data, and then extend it to semi-supervised learning by combining it with a supervised AUC optimization method. We theoretically prove that, without the restrictive distributional assumptions, unlabeled data contribute to improving the generalization performance in both the PU and semi-supervised AUC optimization methods. Finally, we demonstrate the practical usefulness of the proposed methods through experiments.
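The core idea of PU-based AUC optimization can be sketched as follows. Since the unlabeled distribution is a mixture of the positive and negative distributions with class prior θp, the positive–negative pairwise ranking risk can be recovered from positive–unlabeled and positive–positive pairs. The snippet below is an illustrative estimator under the assumptions that the class prior θp is known and a squared pairwise surrogate loss is used; it is a minimal sketch of the rewriting, not the authors' exact formulation.

```python
import numpy as np

def pairwise_loss(margins):
    # Squared surrogate for the 0-1 ranking loss: l(m) = (1 - m)^2 / 2.
    return 0.5 * (1.0 - margins) ** 2

def pn_auc_risk(scores_pos, scores_neg):
    # Standard supervised AUC risk: average pairwise loss over all
    # positive-negative score pairs (broadcasting builds the pair grid).
    return pairwise_loss(scores_pos[:, None] - scores_neg[None, :]).mean()

def pu_auc_risk(scores_pos, scores_unl, theta_p):
    # PU estimator: the unlabeled pool is a theta_p / (1 - theta_p)
    # mixture of positives and negatives, so the P-N pairwise risk can
    # be recovered from P-U pairs after subtracting the P-P contribution.
    theta_n = 1.0 - theta_p
    pu = pairwise_loss(scores_pos[:, None] - scores_unl[None, :]).mean()
    pp = pairwise_loss(scores_pos[:, None] - scores_pos[None, :]).mean()
    return (pu - theta_p * pp) / theta_n
```

When the unlabeled scores are an exact θp-mixture of the positive and negative scores, the PU estimate coincides with the supervised P-N risk, which is what makes training without negative labels possible.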

Keywords

AUC optimization · Learning from positive and unlabeled data · Semi-supervised learning

Notes

Acknowledgements

TS was supported by KAKENHI 15J09111. GN was supported by the JST CREST program and Microsoft Research Asia. MS was supported by JST CREST JPMJCR1403. We thank Han Bao for his comments.


Copyright information

© The Author(s) 2017

Authors and Affiliations

  1. Center for Advanced Intelligence Project, RIKEN, Nihonbashi, Chuo-ku, Japan
  2. Graduate School of Frontier Sciences, The University of Tokyo, Kashiwanoha, Kashiwa-shi, Japan
