Correcting Bias in Statistical Tests for Network Classifier Evaluation

  • Tao Wang
  • Jennifer Neville
  • Brian Gallagher
  • Tina Eliassi-Rad
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6913)


It is difficult to directly apply conventional significance tests to compare the performance of network classification models because network data instances are not independent and identically distributed. Recent work [6] has shown that paired t-tests applied to overlapping network samples result in unacceptably high levels (e.g., up to 50%) of Type I error (i.e., the tests lead to the incorrect conclusion that models are different when they are not). Thus, we need new strategies to accurately evaluate network classifiers. In this paper, we analyze the sources of bias (e.g., dependencies among network data instances) theoretically and propose analytical corrections to standard significance tests that reduce the Type I error rate to more acceptable levels, while maintaining reasonable levels of statistical power to detect true performance differences. We validate the effectiveness of the proposed corrections empirically on both synthetic and real networks.
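The Type I error inflation described above can be illustrated with a small simulation (a sketch under assumed parameters, not the paper's own experiment). When samples overlap, per-fold performance differences share a common random component; a naive paired t-test estimates variance only from the fold-to-fold scatter, underestimates the variance of the mean difference, and rejects the true null far more often than the nominal 5%:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def rejection_rate(shared_sd, K=10, trials=2000, alpha=0.05):
    """Fraction of trials in which a paired t-test (nominal level alpha)
    rejects a TRUE null hypothesis of zero mean performance difference.

    shared_sd controls a random component common to all K folds,
    a stand-in for the dependence induced by overlapping network samples.
    (Illustrative assumption; the paper models the dependence analytically.)
    """
    rejects = 0
    for _ in range(trials):
        shared = rng.normal(0.0, shared_sd)           # overlap-induced component, same for every fold
        d = shared + rng.normal(0.0, 1.0, size=K)     # per-fold differences under H0 (true mean 0)
        t, p = stats.ttest_1samp(d, popmean=0.0)      # naive paired t-test across folds
        if p < alpha:
            rejects += 1
    return rejects / trials

print(rejection_rate(shared_sd=0.0))  # independent folds: close to the nominal 0.05
print(rejection_rate(shared_sd=1.0))  # correlated folds: greatly inflated
```

With the shared component's variance equal to the per-fold noise variance, the rejection rate climbs toward the ~50% level quoted in the abstract; corrected tests (e.g., in the spirit of the variance correction of Nadeau and Bengio [5]) widen the variance estimate to account for this dependence.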


Keywords: Social network analysis · Network classification


  1. Bengio, Y., Grandvalet, Y.: No unbiased estimator of the variance of k-fold cross-validation. Journal of Machine Learning Research 5, 1089–1105 (2004)
  2. Dietterich, T.: Approximate statistical tests for comparing supervised classification learning algorithms. Neural Computation 10, 1895–1923 (1998)
  3. Franklin, J.N.: Matrix Theory. Dover Publications, Mineola (1993)
  4. Macskassy, S., Provost, F.: Classification in networked data: A toolkit and a univariate case study. Journal of Machine Learning Research 8, 935–983 (2007)
  5. Nadeau, C., Bengio, Y.: Inference for the generalization error. Machine Learning Journal 52(3), 239–281 (2003)
  6. Neville, J., Gallagher, B., Eliassi-Rad, T., Wang, T.: Correcting evaluation bias of relational classifiers with network cross validation. Knowledge and Information Systems, 1–25 (2011)
  7. Owen, A.B.: Variance of the number of false discoveries. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 67, 411–426 (2005)
  8. Sen, P., Namata, G., Bilgic, M., Getoor, L., Gallagher, B., Eliassi-Rad, T.: Collective classification in network data. AI Magazine 29(3), 93–106 (2008)

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Tao Wang (1)
  • Jennifer Neville (2)
  • Brian Gallagher (3)
  • Tina Eliassi-Rad (4)
  1. Department of Computer Science, Purdue University, West Lafayette, USA
  2. Department of Computer Science and Statistics, Purdue University, West Lafayette, USA
  3. Lawrence Livermore National Laboratory, Livermore, USA
  4. Department of Computer Science, Rutgers University, Piscataway, USA
