Evaluating Reliability of Single Classifications of Neural Networks

  • Darko Pevec
  • Erik Štrumbelj
  • Igor Kononenko
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6593)


Current machine learning algorithms perform well in many problem domains. In risk-sensitive decision making, for example in medicine and finance, common evaluation methods give only overall assessments of models and provide no information about single predictions, so they fail to gain the trust of experts. We continue previous work on approaches for evaluating the reliability of single classifications, focusing on methods that are model-independent. These methods have been shown to be successful in their narrow fields of application, so we constructed a testing methodology to evaluate them in straightforward, general-use test cases. For the evaluation, we had to derive a statistical reference function, which enables comparison between the reliability estimators and the model's own predictions. We compare five different approaches and evaluate them on a simple neural network in several artificial and real-world domains. The results indicate that the reliability estimators CNK and LCV can be used to improve the model's predictions.
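The estimators CNK and LCV are named here but not defined. As an illustration only, the sketch below assumes a hypothetical classification adaptation of a CNK-style estimator: it compares the model's predicted class distribution for one instance with the class distribution among that instance's k nearest training neighbours. The function names, the Euclidean distance, and the use of total variation distance as the score are our assumptions, not the authors' definition.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cnk_reliability(
    x: Sequence[float],
    predicted_probs: Dict[str, float],
    train_X: Sequence[Sequence[float]],
    train_y: Sequence[str],
    k: int = 5,
) -> float:
    """Hypothetical CNK-style score for one classification.

    Compares the model's predicted class distribution for x with the class
    distribution among x's k nearest training instances, using total
    variation distance: 0.0 means full agreement, 1.0 means none.
    Lower values are taken here as a sign of a more reliable prediction.
    """
    if not 0 < k <= len(train_X):
        raise ValueError("k must be between 1 and the number of training instances")
    # Indices of the k training instances closest to x.
    nearest = sorted(range(len(train_X)), key=lambda i: euclidean(x, train_X[i]))[:k]
    # Empirical class distribution among those neighbours.
    counts = Counter(train_y[i] for i in nearest)
    classes = set(predicted_probs) | set(counts)
    return 0.5 * sum(
        abs(predicted_probs.get(c, 0.0) - counts.get(c, 0) / k) for c in classes
    )


if __name__ == "__main__":
    # Toy data: two well-separated clusters labelled "a" and "b".
    train_X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]]
    train_y = ["a", "a", "a", "b", "b", "b"]
    query = [0.05, 0.05]
    # Prediction that agrees with the neighbourhood: low score (about 0.1).
    print(cnk_reliability(query, {"a": 0.9, "b": 0.1}, train_X, train_y, k=3))
    # Prediction that contradicts the neighbourhood: high score (about 0.8).
    print(cnk_reliability(query, {"a": 0.2, "b": 0.8}, train_X, train_y, k=3))
```

Because the score only needs the model's output distribution and the training data, it treats the classifier as a black box, which matches the model-independent setting described above. Total variation is just one convenient way to compare the two distributions; the estimator evaluated in the paper may use a different comparison.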


Keywords: Reliability estimation · Classification · Prediction accuracy · Prediction error


Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Darko Pevec (1)
  • Erik Štrumbelj (1)
  • Igor Kononenko (1)

  1. Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia