Abstract
Multi-class learning requires a classifier to discriminate among a large set of L classes in order to define a classification rule able to identify the correct class for new observations. The resulting classification rule is not always robust, particularly when the classes are imbalanced or the sample size is small. This paper presents a new approach for evaluating the reliability of a classification rule. It uses a standard classifier and assesses the reliability of the resulting rule by re-training the classifier on resampled versions of the original data. User-defined misclassification costs are assigned to the resulting confusion matrices and then used as inputs to a Beta regression model, which provides a cost-sensitive weighted classification index. This index is used jointly with a second index that measures the dissimilarity in distribution between observed and predicted classes. Both indices take values in [0, 1], so each pair of values can be plotted as a point in the [0, 1]² space. Visual inspection of the points for each classifier allows us to evaluate its reliability on the basis of the relationship between the values of the two indices obtained on the original data and on its resampled versions.
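The resampling scheme described above can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not give the formulas for either index, so the sketch makes several assumptions: the cost-sensitive index is replaced by one minus the normalized total misclassification cost (the Beta regression step is omitted), the dissimilarity index is taken to be the total variation distance between observed and predicted class shares, the classifier is a toy one-dimensional nearest-centroid rule, and each re-trained classifier is evaluated on its own bootstrap resample. All function names and the synthetic data are hypothetical.

```python
import random
from collections import Counter

def confusion_matrix(y_true, y_pred, labels):
    """Rows are observed classes, columns are predicted classes."""
    idx = {c: i for i, c in enumerate(labels)}
    m = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        m[idx[t]][idx[p]] += 1
    return m

def cost_index(conf, cost):
    """Assumed stand-in index: 1 - normalized total cost, in [0, 1]."""
    n = sum(map(sum, conf))
    max_cost = max(max(row) for row in cost)  # assumed to be positive
    k = len(conf)
    total = sum(conf[i][j] * cost[i][j] for i in range(k) for j in range(k))
    return 1.0 - total / (n * max_cost)

def distribution_dissimilarity(y_true, y_pred, labels):
    """Assumed stand-in index: total variation distance, in [0, 1]."""
    n = len(y_true)
    obs, pred = Counter(y_true), Counter(y_pred)
    return 0.5 * sum(abs(obs[c] - pred[c]) / n for c in labels)

def fit_centroids(X, y):
    """Toy nearest-centroid classifier on a single numeric feature."""
    sums, counts = {}, {}
    for x, c in zip(X, y):
        sums[c] = sums.get(c, 0.0) + x
        counts[c] = counts.get(c, 0) + 1
    return {c: sums[c] / counts[c] for c in counts}

def predict(centroids, X):
    return [min(centroids, key=lambda c: abs(x - centroids[c])) for x in X]

def reliability_points(X, y, cost, labels, n_resamples=20, seed=0):
    """Return (cost index, dissimilarity) pairs: the first pair is for
    the original data, the others for bootstrap resamples."""
    rng = random.Random(seed)
    n = len(X)
    points = []
    for b in range(n_resamples + 1):
        if b == 0:
            idx = list(range(n))  # original data
        else:
            idx = [rng.randrange(n) for _ in range(n)]  # bootstrap resample
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        pred = predict(fit_centroids(Xb, yb), Xb)  # re-train, then predict
        conf = confusion_matrix(yb, pred, labels)
        points.append((cost_index(conf, cost),
                       distribution_dissimilarity(yb, pred, labels)))
    return points

# Synthetic, imbalanced three-class data (illustrative only).
data_rng = random.Random(1)
labels = ["a", "b", "c"]
centers = {"a": 0.0, "b": 2.0, "c": 4.0}
sizes = {"a": 60, "b": 30, "c": 10}
X, y = [], []
for c in labels:
    for _ in range(sizes[c]):
        X.append(data_rng.gauss(centers[c], 1.0))
        y.append(c)

# User-defined misclassification costs (zero on the diagonal).
cost = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
points = reliability_points(X, y, cost, labels)
```

Each element of `points` is a pair in [0, 1]², so the pairs can be drawn on a scatter plot. Under these assumptions, a tight cloud of resample points around the original-data point would suggest a stable classification rule, and a widely scattered cloud an unstable one.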
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Frigau, L., Conversano, C., Mola, F. (2016). Assessing the Reliability of a Multi-Class Classifier. In: Wilhelm, A., Kestler, H. (eds) Analysis of Large and Complex Data. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-319-25226-1_18
DOI: https://doi.org/10.1007/978-3-319-25226-1_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25224-7
Online ISBN: 978-3-319-25226-1
eBook Packages: Mathematics and Statistics (R0)