
Improving ANNs Performance on Unbalanced Data with an AUC-Based Learning Algorithm

  • Cristiano L. Castro
  • Antônio P. Braga
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7553)

Abstract

This paper investigates the use of the Area Under the ROC Curve (AUC) as an alternative criterion for model selection in classification problems with unbalanced datasets. A novel algorithm, named here AUCMLP, which incorporates AUC optimization into the learning process of Multi-layer Perceptrons (MLPs), is presented. The basic principle of AUCMLP is the solution of an optimization problem that targets ranking quality as well as the separability of the class distributions with respect to the decision threshold. Preliminary results achieved on real data point out that our approach is promising and can lead to better decision surfaces, especially under more severe imbalance conditions.
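The AUC that AUCMLP optimizes can be read as the Wilcoxon-Mann-Whitney statistic: the fraction of (positive, negative) pairs that the classifier ranks correctly. The sketch below is not the authors' AUCMLP objective. It computes that statistic and maximizes a common differentiable surrogate of it (a sigmoid over pairwise score differences, with an assumed sharpness parameter `beta`) by gradient ascent. To stay short, it uses a linear scorer in place of an MLP and omits the threshold-separability term mentioned above; the toy data and all function names are hypothetical.

```python
import math
import random


def auc_wmw(pos_scores, neg_scores):
    """Empirical AUC as the Wilcoxon-Mann-Whitney statistic: the fraction of
    (positive, negative) pairs ranked correctly, with ties counted as 0.5."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    correct = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos_scores) * len(neg_scores))


def sigmoid(z):
    # Numerically stable logistic function (avoids overflow for large |z|).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def smooth_auc(w, pos, neg, beta=5.0):
    """Differentiable surrogate: the step 1[f(x+) > f(x-)] in the WMW statistic
    is replaced by sigmoid(beta * (f(x+) - f(x-))) for a linear scorer f(x) = w.x."""
    total = 0.0
    for xp in pos:
        for xn in neg:
            total += sigmoid(beta * (dot(w, xp) - dot(w, xn)))
    return total / (len(pos) * len(neg))


def train_linear_scorer(pos, neg, lr=0.5, epochs=100, beta=5.0):
    """Gradient ascent on smooth_auc with respect to the weight vector w."""
    dim = len(pos[0])
    w = [0.0] * dim
    m = len(pos) * len(neg)
    for _ in range(epochs):
        grad = [0.0] * dim
        for xp in pos:
            for xn in neg:
                s = sigmoid(beta * (dot(w, xp) - dot(w, xn)))
                # d/dw sigmoid(beta * w.(xp - xn)) = beta * s * (1 - s) * (xp - xn)
                coef = beta * s * (1.0 - s) / m
                for k in range(dim):
                    grad[k] += coef * (xp[k] - xn[k])
        w = [wk + lr * gk for wk, gk in zip(w, grad)]
    return w


# Toy unbalanced data (hypothetical): 5 positives vs. 50 negatives in 2-D.
rng = random.Random(0)
pos = [[rng.gauss(1.0, 0.5), rng.gauss(1.0, 0.5)] for _ in range(5)]
neg = [[rng.gauss(-1.0, 0.5), rng.gauss(-1.0, 0.5)] for _ in range(50)]

w = train_linear_scorer(pos, neg)
scores_pos = [dot(w, x) for x in pos]
scores_neg = [dot(w, x) for x in neg]
print("weights:", w)
print("AUC (WMW):", auc_wmw(scores_pos, scores_neg))
```

Because the surrogate only compares cross-class pairs, every minority example enters as many terms as there are majority examples, which is one reason AUC-style objectives are considered for unbalanced data; the price is O(n₊ · n₋) pair evaluations per epoch.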

Keywords

unbalanced datasets, classification, Area Under the ROC Curve, parameter estimation criteria



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Cristiano L. Castro (1)
  • Antônio P. Braga (1)
  1. Department of Computer Science, Federal University of Lavras, Lavras, Brazil
