Machine Learning, Volume 68, Issue 1, pp 97–106

PAV and the ROC convex hull

  • Tom Fawcett
  • Alexandru Niculescu-Mizil
Technical Note


Classifier calibration is the process of converting classifier scores into reliable probability estimates. Recently, a calibration technique based on isotonic regression has gained attention within machine learning as a flexible and effective way to calibrate classifiers. We show that, surprisingly, isotonic regression based calibration using the Pool Adjacent Violators algorithm is equivalent to the ROC convex hull method.
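The equivalence stated above can be checked numerically on a small example. The sketch below is illustrative and is not the authors' code: the function names `pav` and `rocch_probabilities` are hypothetical, labels are assumed to be binary (0 or 1), and instances are assumed to be already sorted by increasing classifier score with no tied scores. The first function pools adjacent violators; the second builds the ROC curve from the highest score down, takes its upper convex hull, and assigns each instance the fraction of positives in its hull segment.

```python
def pav(labels):
    """Isotonic fit of binary labels given in order of increasing score."""
    blocks = []  # each block is [sum of labels, number of instances]
    for y in labels:
        blocks.append([float(y), 1])
        # Pool while the previous block's mean is not below the last block's mean.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] >= blocks[-1][0] / blocks[-1][1]):
            s, n = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n
    fitted = []
    for s, n in blocks:
        fitted.extend([s / n] * n)
    return fitted


def _cross(o, a, b):
    """Cross product of vectors o->a and o->b (positive means a left turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def rocch_probabilities(labels):
    """Per-instance positive fraction along each ROC convex hull segment."""
    # Trace the ROC curve in counts (FP, TP), starting from the highest score.
    points, fp, tp = [(0, 0)], 0, 0
    for y in reversed(labels):
        if y == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp, tp))
    # Upper convex hull: drop any point that is on or below the hull.
    hull = []
    for p in points:
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    probs = []
    for (fp0, tp0), (fp1, tp1) in zip(hull, hull[1:]):
        n = (fp1 - fp0) + (tp1 - tp0)  # instances covered by this segment
        probs.extend([(tp1 - tp0) / n] * n)
    return probs[::-1]  # back to increasing-score order


labels = [0, 0, 1, 0, 1, 1]  # toy data, sorted by increasing score
print(pav(labels))
print(rocch_probabilities(labels))
```

On this toy input both calls print the same list, `[0.0, 0.0, 0.5, 0.5, 1.0, 1.0]`. Handling tied scores would require grouping tied instances into a single ROC step, which this sketch leaves out.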


Keywords: Classification, Classifier calibration, ROC, Class skew



Copyright information

© Springer Science+Business Media, LLC 2007

Authors and Affiliations

  1. Center for the Study of Language and Information, Stanford University, Stanford, USA
  2. Computer Science Department, Cornell University, Ithaca, USA
