
A New Monte Carlo-Based Error Rate Estimator

  • Ahmed Hefny
  • Amir F. Atiya
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5998)

Abstract

Estimating the classification error rate of a classifier is a key issue in machine learning. Such an estimate is needed to compare classifiers or to tune the parameters of a parameterized classifier. Several methods have been proposed to estimate the error rate, most of which rely on partitioning the data set or drawing bootstrap samples from it. Error estimators can suffer from bias (systematic deviation from the actual error rate) and/or variance (sensitivity to the particular data set). In this work, we propose an error rate estimator that fits a generative model and a posterior probability model to represent the underlying process that generates the data, and exploits these models in a Monte Carlo fashion to obtain two biased estimators whose best combination is determined iteratively. We test our estimator against state-of-the-art estimators and show that it provides a reliable estimate in terms of mean square error.
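The abstract does not specify the models, the two biased estimators, or the iterative combination scheme. As a minimal sketch of only the general idea (fit a generative model and a posterior model, then estimate the error rate by Monte Carlo sampling from them), the Python below assumes a two-class, one-dimensional problem. It uses class priors plus one Gaussian kernel density estimate per class (Silverman's rule-of-thumb bandwidth) as the generative model and derives the posterior from those same densities via Bayes' rule. All function names are hypothetical, and this is not the authors' method.

```python
import math
import random

# Hypothetical sketch: generative model = class priors + per-class Gaussian KDE;
# posterior model = Bayes' rule on the same KDEs. Not the paper's exact models.

def silverman_bandwidth(xs):
    """Silverman's rule-of-thumb bandwidth for a 1-D Gaussian KDE."""
    n = len(xs)
    mean = sum(xs) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return 1.06 * std * n ** (-1 / 5)

def kde_pdf(x, xs, h):
    """Gaussian kernel density estimate of the density at x."""
    norm = 1.0 / (len(xs) * h * math.sqrt(2 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in xs)

def fit_generative_model(data):
    """Fit (prior, training points, bandwidth) per class; labels must be 0 or 1."""
    model = {}
    for label in (0, 1):
        xs = [x for x, y in data if y == label]
        model[label] = (len(xs) / len(data), xs, silverman_bandwidth(xs))
    return model

def posterior(model, x, label):
    """P(label | x) via Bayes' rule on the fitted KDE model."""
    joint = {c: prior * kde_pdf(x, xs, h) for c, (prior, xs, h) in model.items()}
    total = sum(joint.values())
    return joint[label] / total if total > 0 else 0.5

def sample(model, rng):
    """Draw one (x, label) pair from the fitted generative model."""
    label = 0 if rng.random() < model[0][0] else 1
    _, xs, h = model[label]
    # Sampling a Gaussian KDE: pick a training point, then add N(0, h^2) noise.
    return rng.choice(xs) + rng.gauss(0.0, h), label

def monte_carlo_error(model, classifier, n_samples=5000, seed=0):
    """Return (counting estimate, posterior-based estimate) of the error rate."""
    rng = random.Random(seed)
    miscount = 0
    post_err = 0.0
    for _ in range(n_samples):
        x, y = sample(model, rng)
        pred = classifier(x)
        miscount += pred != y                        # 0/1 loss on the synthetic label
        post_err += 1.0 - posterior(model, x, pred)  # expected loss given x
    return miscount / n_samples, post_err / n_samples

if __name__ == "__main__":
    rng = random.Random(1)
    data = [(rng.gauss(-1.0, 1.0), 0) for _ in range(50)]
    data += [(rng.gauss(1.0, 1.0), 1) for _ in range(50)]
    model = fit_generative_model(data)

    def threshold_rule(x):
        """Hypothetical classifier under evaluation."""
        return int(x > 0.0)

    print(monte_carlo_error(model, threshold_rule))
```

Because the posterior-based average replaces each 0/1 count with its conditional expectation 1 − P(ŷ | x), it has no greater Monte Carlo variance than the counting estimate. Both estimates still inherit any bias of the fitted models, which is why a combination step of some kind is needed.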

Keywords

Error Rate, Mean Square Error, Gaussian Process, Bootstrap Sample, Posterior Probability Model

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Ahmed Hefny, Faculty of Engineering, Computer Engineering Department, Cairo University
  • Amir F. Atiya, Faculty of Engineering, Computer Engineering Department, Cairo University
