Training Classifiers for Unbalanced Distribution and Cost-Sensitive Domains with ROC Analysis

Zhang, Xiaolong; Jiang, Chuan; Luo, Ming-jian

doi:10.1007/11961239_8

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4303)

Included in the following conference series:

Pacific Rim Knowledge Acquisition Workshop


Abstract

ROC (Receiver Operating Characteristic) analysis has been used as a tool for analyzing and evaluating two-class classifiers, even when the training data exhibit an unbalanced class distribution and cost sensitivity. However, ROC analysis has not been effectively extended to evaluate multi-class classifiers. In this paper, we propose an effective way to handle multi-class learning with ROC analysis. An EMAUC algorithm is implemented to transform a multi-class training set into several two-class training sets, and classification is carried out with these two-class training sets. Empirical results demonstrate that classifiers trained with the proposed algorithm perform competitively in unbalanced-distribution and cost-sensitive domains.
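
The abstract does not say how EMAUC builds its two-class training sets. As an illustration only, the sketch below uses a generic one-vs-rest decomposition together with a rank-based AUC; it should not be read as the authors' algorithm. The function names `one_vs_rest_splits` and `auc` are hypothetical and chosen for this example.

```python
def one_vs_rest_splits(samples):
    """Turn [(features, label), ...] into {label: [(features, 0 or 1), ...]}.

    Assumption: one-vs-rest decomposition. The paper's EMAUC algorithm may
    construct its two-class training sets differently.
    """
    labels = sorted({label for _, label in samples})
    return {
        positive: [(x, 1 if label == positive else 0) for x, label in samples]
        for positive in labels
    }


def auc(scores, targets):
    """Two-class AUC as the Mann-Whitney statistic; ties count as one half."""
    pos = [s for s, t in zip(scores, targets) if t == 1]
    neg = [s for s, t in zip(scores, targets) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    # Count the (positive, negative) pairs that the scores rank correctly.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data, invented for this example: three classes, one feature each.
data = [((0.1,), "a"), ((0.4,), "b"), ((0.35,), "a"), ((0.8,), "c")]
splits = one_vs_rest_splits(data)

# Scores that a hypothetical two-class model might assign to four samples.
example_auc = auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

Each entry of `splits` is an ordinary two-class training set, so any two-class learner and any two-class ROC tool can be applied to it unchanged. The price is that each binary problem is more unbalanced than the original multi-class problem.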

Author information

Authors and Affiliations

School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, 430081, China
Xiaolong Zhang, Chuan Jiang & Ming-jian Luo

Editor information

Editors and Affiliations

School of Computer Science & Engineering, The University of New South Wales, Sydney, Australia
Achim Hoffmann
School of Computing, University of Tasmania, Sandy Bay, 7005, Tasmania, Australia
Byeong-ho Kang
Computing Department, Division of Information and Communication Sciences, Macquarie University, 2109, Sydney, NSW, Australia
Debbie Richards
Shimane University, 89-1 Enya-cho Izumo, 6938501, Shimane, Japan
Shusaku Tsumoto

About this paper

Cite this paper

Zhang, X., Jiang, C., Luo, Mj. (2006). Training Classifiers for Unbalanced Distribution and Cost-Sensitive Domains with ROC Analysis. In: Hoffmann, A., Kang, Bh., Richards, D., Tsumoto, S. (eds) Advances in Knowledge Acquisition and Management. PKAW 2006. Lecture Notes in Computer Science, vol 4303. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11961239_8

DOI: https://doi.org/10.1007/11961239_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68955-3
Online ISBN: 978-3-540-68957-7
eBook Packages: Computer Science, Computer Science (R0)
