HAUCA Curves for the Evaluation of Biomarker Pilot Studies with Small Sample Sizes and Large Numbers of Features

  • Frank Klawonn
  • Junxi Wang
  • Ina Koch
  • Jörg Eberhard
  • Mohamed Omar
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9897)


Biomarker studies often try to identify a combination of measured attributes that supports the diagnosis of a specific disease. The measurements are commonly obtained from high-throughput technologies such as next-generation sequencing, which yield an abundance of biomarker candidates compared to the often very small sample size. Here we use an example with more than 50,000 biomarker candidates that we want to evaluate based on a sample of only 24 patients. This seems an impossible task, and with so many candidates some purely chance correlations are bound to appear. Although specific biomarkers cannot be identified in such small pilot studies with purely statistical methods, one can still determine whether more biomarkers show a high correlation with the disease under consideration than one would expect in a setting where correlations are purely random. We propose a method based on area under the ROC curve (AUC) values that indicates by how much the correlations of the biomarkers with the disease of interest exceed pure random effects. We also provide sample size estimates for follow-up studies that aim to identify concrete biomarkers and build classifiers for the disease, and we describe how the method can be extended to performance measures other than AUC.
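The core idea of the abstract, comparing the number of high-AUC features with the number expected by chance, can be illustrated with a small simulation. The sketch below is not the authors' HAUCA implementation, which the abstract does not specify. It is one plausible reading, and the function names (`auc`, `count_high_auc`, `null_counts`), the 12/12 class split, the AUC threshold of 0.9, and the synthetic data are all assumptions made for illustration.

```python
import random


def auc(values, labels):
    """AUC of one feature via the Mann-Whitney statistic; ties count one half."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def count_high_auc(features, labels, threshold):
    """Count features whose AUC reaches the threshold in either direction."""
    aucs = (auc(f, labels) for f in features)
    return sum(max(a, 1.0 - a) >= threshold for a in aucs)


def null_counts(n_features, labels, threshold, n_rounds, rng):
    """Counts of high-AUC features when every feature is pure noise."""
    counts = []
    for _ in range(n_rounds):
        noise = [[rng.gauss(0, 1) for _ in labels] for _ in range(n_features)]
        counts.append(count_high_auc(noise, labels, threshold))
    return counts


if __name__ == "__main__":
    rng = random.Random(1)
    labels = [1] * 12 + [0] * 12      # assumed split of the 24 patients
    n_features, n_informative = 1000, 20  # scaled down from 50,000 for speed

    # Synthetic data: the first n_informative features are shifted in class 1.
    data = []
    for j in range(n_features):
        shift = 2.0 if j < n_informative else 0.0
        data.append([rng.gauss(shift * y, 1) for y in labels])

    observed = count_high_auc(data, labels, 0.9)
    null = null_counts(n_features, labels, 0.9, 5, rng)
    print("observed features with AUC >= 0.9:", observed)
    print("mean count under pure noise:", sum(null) / len(null))
```

Repeating the comparison over a range of thresholds would give a curve of observed versus chance-level counts, which is the kind of summary the abstract describes in general terms.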


Keywords (machine-generated, not supplied by the authors): Feature Selection; Linear Discriminant Analysis; Confidence Band; Chronic Liver Disease Patient; Multiclass Classification Problem



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Frank Klawonn (1, 2), corresponding author
  • Junxi Wang (1)
  • Ina Koch (3)
  • Jörg Eberhard (4)
  • Mohamed Omar (5)

  1. Biostatistics, Helmholtz Centre for Infection Research, Braunschweig, Germany
  2. Department of Computer Science, Ostfalia University of Applied Sciences, Wolfenbüttel, Germany
  3. Institute for Molecular Bioinformatics, Johann Wolfgang Goethe-University, Frankfurt, Germany
  4. Department of Prosthetic Dentistry and Biomedical Materials Science, Hannover Medical School, Hannover, Germany
  5. Trauma Department, Hannover Medical School, Hannover, Germany
