Abstract
Classifiers deployed in the field can be used and evaluated in ways that were not anticipated when the model was trained: the final evaluation metric may not have been known at training time, additional performance criteria may have been added, the evaluation metric may have changed over time, or the real-world evaluation procedure may have been impossible to simulate. Measuring model utility in such unforeseen ways can degrade measured performance. Our objective is to provide experimental support for modelers who face potential “cross-metric” performance deterioration. First, to identify model-selection metrics that lead to stronger cross-metric performance, we characterize the expected loss when the selection metric is held fixed and the evaluation metric is varied. Second, we show that the number of data points scored by the selection metric substantially affects which selection metric performs best. While addressing these issues, we also consider how calibrating the classifiers to output probabilities influences cross-metric performance. Our experiments show that if models are well calibrated, cross-entropy is the highest-performing selection metric when little data is available for model selection. With these results, modelers may be better positioned to choose selection metrics that remain robust when it is uncertain which evaluation metric will be applied.
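To make the cross-metric setting concrete, the sketch below selects between two calibrated classifiers using cross-entropy (log loss) on a deliberately small validation set, then scores the chosen model under different evaluation metrics. The dataset, candidate models, and metric choices are illustrative assumptions for this sketch, not the paper's actual experimental protocol.

    # A minimal sketch of cross-metric model selection: choose a model with
    # one (selection) metric on a small validation set, then evaluate it
    # with different (evaluation) metrics. Illustrative assumptions only.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.calibration import CalibratedClassifierCV
    from sklearn.metrics import log_loss, roc_auc_score, accuracy_score

    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    X_train, X_rest, y_train, y_rest = train_test_split(
        X, y, test_size=0.5, random_state=0)
    # Deliberately small selection set, mirroring the paper's interest in
    # how the amount of model-selection data affects cross-metric loss.
    X_select, X_eval, y_select, y_eval = train_test_split(
        X_rest, y_rest, test_size=0.9, random_state=0)

    candidates = {
        "logreg": LogisticRegression(max_iter=1000),
        "forest": RandomForestClassifier(n_estimators=100, random_state=0),
    }

    # Calibrate each candidate so its scores behave like probabilities
    # (sigmoid/Platt scaling here; the paper also discusses calibration).
    calibrated = {
        name: CalibratedClassifierCV(model, method="sigmoid", cv=3).fit(
            X_train, y_train)
        for name, model in candidates.items()
    }

    # Selection metric: cross-entropy on the small selection set.
    losses = {name: log_loss(y_select, m.predict_proba(X_select)[:, 1])
              for name, m in calibrated.items()}
    chosen = min(losses, key=losses.get)

    # Evaluation metrics differ from the selection metric ("cross-metric").
    probs = calibrated[chosen].predict_proba(X_eval)[:, 1]
    print(f"selected by log loss: {chosen}")
    print(f"  evaluation AUC:      {roc_auc_score(y_eval, probs):.3f}")
    print(f"  evaluation accuracy: {accuracy_score(y_eval, probs > 0.5):.3f}")

Varying the size of X_select in this sketch mimics the paper's second question: with very little selection data, noisy threshold-based metrics such as accuracy become unreliable, which is one intuition for why cross-entropy can be the safer selection metric.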
Cite this paper
Skalak, D.B., Niculescu-Mizil, A., Caruana, R. (2007). Classifier Loss Under Metric Uncertainty. In: Kok, J.N., Koronacki, J., Mantaras, R.L.d., Matwin, S., Mladenič, D., Skowron, A. (eds.) Machine Learning: ECML 2007. Lecture Notes in Computer Science, vol 4701. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74958-5_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74957-8
Online ISBN: 978-3-540-74958-5