Training Classifiers for Unbalanced Distribution and Cost-Sensitive Domains with ROC Analysis

  • Xiaolong Zhang
  • Chuan Jiang
  • Ming-jian Luo
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4303)


ROC (Receiver Operating Characteristic) analysis has been used as a tool for analyzing and evaluating two-class classifiers, even when the training data have an unbalanced class distribution and are cost-sensitive. However, ROC analysis has not been effectively extended to the evaluation of multi-class classifiers. In this paper, we propose an effective way to handle multi-class learning with ROC analysis. An EMAUC algorithm is implemented to transform a multi-class training set into several two-class training sets, and classification is carried out with these two-class training sets. Empirical results demonstrate that classifiers trained with the proposed algorithm achieve competitive performance in unbalanced-distribution and cost-sensitive domains.
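
The abstract does not spell out how EMAUC builds its two-class training sets, so the following is only a minimal sketch of the general idea, assuming a plain one-vs-rest decomposition and a rank-based AUC averaged over classes. The function names (`one_vs_rest_splits`, `auc`, `macro_auc`) and the toy scores are hypothetical illustrations, not the authors' algorithm or data.

```python
def one_vs_rest_splits(labels):
    """Turn one multi-class label list into one binary label list per class.

    Assumption: one-vs-rest decomposition; the paper's EMAUC may differ.
    """
    classes = sorted(set(labels))
    return {c: [1 if y == c else 0 for y in labels] for c in classes}


def auc(scores, binary_labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, binary_labels) if y == 1]
    neg = [s for s, y in zip(scores, binary_labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def macro_auc(score_table, labels):
    """Unweighted mean of the per-class one-vs-rest AUC values.

    score_table maps each class to the scores a classifier assigned
    to every example for that class.
    """
    splits = one_vs_rest_splits(labels)
    per_class = {c: auc(score_table[c], splits[c]) for c in splits}
    return sum(per_class.values()) / len(per_class), per_class


# Toy example: four examples, three classes, made-up scores.
labels = ["a", "a", "b", "c"]
score_table = {
    "a": [0.9, 0.4, 0.5, 0.1],
    "b": [0.05, 0.3, 0.4, 0.2],
    "c": [0.05, 0.3, 0.1, 0.7],
}
mean_auc, per_class_auc = macro_auc(score_table, labels)
print(per_class_auc, mean_auc)
```

Because each per-class AUC depends only on how positives rank against negatives, it is unaffected by the class proportions, which is one reason ROC-based measures are attractive for unbalanced data. An unweighted mean treats every class equally; weighting by class prevalence or by misclassification cost would be a different design choice.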


Keywords: Classification, ROC, Cost-Sensitive Learning, Error Correcting Output Coding





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Xiaolong Zhang (1)
  • Chuan Jiang (1)
  • Ming-jian Luo (1)
  1. School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, China
