Abstract
Evaluating the performance of a supervised classification method, that is, its predictive ability on independent data, is a central task in machine learning. It is also almost unthinkable to publish research on a newly proposed classifier without comparing it with already existing ones. This paper reviews the most important aspects of the classifier evaluation process, including the choice of evaluation metrics (scores) and the statistical comparison of classifiers. A critical view of the reviewed methods is presented, together with recommendations and limitations. The article provides a quick guide to the complexity of the classifier evaluation process and warns the reader against common bad habits.
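For readers wanting a concrete starting point, the following minimal sketch illustrates the two steps the abstract names: scoring classifiers with a chosen metric and comparing them statistically. It assumes scikit-learn and SciPy are available; the three classifiers, four toy datasets and the balanced-accuracy score are illustrative stand-ins, not choices made in the paper, and a real benchmark would use many more datasets and follow a significant Friedman result with a post-hoc pairwise test.

import numpy as np
from scipy.stats import friedmanchisquare
from sklearn.datasets import load_breast_cancer, load_digits, load_iris, load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# Illustrative choices (assumptions, not taken from the paper): three
# standard classifiers, four small benchmark datasets, balanced accuracy
# as the evaluation score.
datasets = [load_iris(), load_wine(), load_breast_cancer(), load_digits()]
classifiers = {
    "logistic regression": LogisticRegression(max_iter=5000),
    "naive Bayes": GaussianNB(),
    "k-NN": KNeighborsClassifier(),
}

# Step 1: one score per (classifier, dataset) pair via 10-fold cross-validation.
scores = {name: [] for name in classifiers}
for data in datasets:
    for name, clf in classifiers.items():
        cv = cross_val_score(clf, data.data, data.target,
                             cv=10, scoring="balanced_accuracy")
        scores[name].append(cv.mean())

# Step 2: Friedman test over the per-dataset scores. The null hypothesis is
# that all classifiers perform equally well; a small p-value would justify
# post-hoc pairwise comparisons.
stat, p = friedmanchisquare(*scores.values())
for name, vals in scores.items():
    print(f"{name:20s} {np.round(vals, 3)}")
print(f"Friedman chi-square = {stat:.3f}, p-value = {p:.4f}")

The non-parametric Friedman test is used here rather than repeated pairwise t-tests because the per-dataset scores cannot be assumed normally distributed and multiple comparisons inflate the Type I error, which is one of the bad habits the paper warns against.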
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Stąpor, K. (2018). Evaluating and Comparing Classifiers: Review, Some Recommendations and Limitations. In: Kurzynski, M., Wozniak, M., Burduk, R. (eds) Proceedings of the 10th International Conference on Computer Recognition Systems CORES 2017. CORES 2017. Advances in Intelligent Systems and Computing, vol 578. Springer, Cham. https://doi.org/10.1007/978-3-319-59162-9_2
DOI: https://doi.org/10.1007/978-3-319-59162-9_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59161-2
Online ISBN: 978-3-319-59162-9
eBook Packages: Engineering, Engineering (R0)