Abstract
In this paper we revisit the problem of classifier calibration, motivated by the fact that existing calibration methods ignore the problem attributes (i.e., they are univariate). We propose a new calibration method, inspired by binning-based methods, in which the calibrated probabilities are obtained from k instances of a dataset. Bins are constructed by including the k most similar instances, considering not only the estimated probabilities but also the original attributes. The method has been tested with respect to two calibration measures and compared with other traditional calibration methods. The results show that the new method outperforms the most commonly used calibration methods.
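The idea summarised above can be illustrated with a minimal sketch for a binary problem. It is not the authors' implementation: the abstract does not specify the similarity measure, the scaling, or how the k instances are combined, so the min-max scaling, the Euclidean distance, and the averaging of the neighbours' true labels (as in classical binning) are all assumptions made here for illustration. The function name `similarity_binning_average` and its parameters are hypothetical.

```python
import math

def similarity_binning_average(val_attrs, val_probs, val_labels, attrs, prob, k=10):
    """Calibrate one estimated probability from its k most similar validation instances.

    val_attrs  : list of attribute vectors of the validation instances
    val_probs  : estimated positive-class probability of each validation instance
    val_labels : true labels (0 or 1) of the validation instances
    attrs, prob: attribute vector and estimated probability of the new instance
    """
    # Augment every instance with its estimated probability, so that similarity
    # depends on the original attributes AND on the classifier's output.
    val_rows = [list(a) + [p] for a, p in zip(val_attrs, val_probs)]
    query = list(attrs) + [prob]

    # Assumption: min-max scaling fitted on the validation set, so that no
    # single column dominates the distance.
    columns = list(zip(*val_rows))
    lows = [min(c) for c in columns]
    spans = [(max(c) - min(c)) or 1.0 for c in columns]

    def scale(row):
        return [(v - lo) / s for v, lo, s in zip(row, lows, spans)]

    # Assumption: Euclidean distance as the (dis)similarity measure.
    q = scale(query)
    dists = [math.dist(scale(r), q) for r in val_rows]

    # The "bin" is the set of the k nearest validation instances.
    nearest = sorted(range(len(val_rows)), key=dists.__getitem__)[:k]

    # Assumption: the calibrated probability is the fraction of positives in the bin.
    return sum(val_labels[i] for i in nearest) / len(nearest)
```

Called with a small validation set and `k=2`, the function returns the proportion of positive labels among the two validation instances closest to the query in the augmented attribute space. A univariate binning method would correspond to dropping the original attributes and ranking by the estimated probability alone.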
This work has been partially supported by the EU (FEDER) and the Spanish MEC/MICINN, under grant TIN 2007-68093-C02, and by the Spanish project “Agreement Technologies” (Consolider Ingenio CSD2007-00022).
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bella, A., Ferri, C., Hernández-Orallo, J., Ramírez-Quintana, M.J. (2009). Similarity-Binning Averaging: A Generalisation of Binning Calibration. In: Corchado, E., Yin, H. (eds) Intelligent Data Engineering and Automated Learning - IDEAL 2009. IDEAL 2009. Lecture Notes in Computer Science, vol 5788. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04394-9_42
DOI: https://doi.org/10.1007/978-3-642-04394-9_42
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04393-2
Online ISBN: 978-3-642-04394-9
eBook Packages: Computer Science, Computer Science (R0)