Table 2. Accuracy of the optimized random forest (RF) model and cross-validation results across three datasets from the DIABIMMUNE research group

From: MegaR: an interactive R package for rapid sample classification and phenotype prediction using metagenome profiles and machine learning

| Dataset | Data type | Optimal model parameter | Model accuracy | 95% CI | Cross-validation accuracy |
|---|---|---|---|---|---|
| Three country cohort | 16S | 80%, 100T, 20P | 0.9028 | 0.8562–0.9382 | 0.8685 |
| Three country cohort | WGS | 70%, 100T, 10P | 0.8864 | 0.8312–0.9285 | 0.8803 |
| T1D cohort | 16S | 80%, 5T, 5P | 0.9615 | 0.8686–0.9928 | 0.9069 |
| T1D cohort | WGS | 90%, 100T, 10P | 0.9481 | 0.6774–0.9987 | 0.9036 |
| Antibiotics cohort | 16S | 70%, 0T, 0P | 0.8772 | 0.8312–0.9285 | 0.8643 |
| Antibiotics cohort | WGS | 80%, 10T, 10P | 0.7916 | 0.6502–0.8951 | 0.7205 |
  1. Bold numbers represent the highest values for the given dataset. 16S rRNA and WGS data were tested for each of the three datasets. Optimal model parameters are the values that gave the highest accuracy for the dataset.
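
The bold highlighting mentioned in the footnote is not visible in this plain-text rendering of Table 2. The following minimal sketch recomputes which entries are highest, assuming that "highest" is meant per accuracy column within each dataset. The function name `highest_per_dataset` and the row layout are illustrative and not part of MegaR; the accuracy values are transcribed from the table.

```python
# Rows transcribed from Table 2:
# (dataset, data type, model accuracy, cross-validation accuracy).
ROWS = [
    ("Three country cohort", "16S", 0.9028, 0.8685),
    ("Three country cohort", "WGS", 0.8864, 0.8803),
    ("T1D cohort", "16S", 0.9615, 0.9069),
    ("T1D cohort", "WGS", 0.9481, 0.9036),
    ("Antibiotics cohort", "16S", 0.8772, 0.8643),
    ("Antibiotics cohort", "WGS", 0.7916, 0.7205),
]


def highest_per_dataset(rows):
    """Hypothetical helper: for each dataset, find the data type with the
    highest model accuracy and the one with the highest CV accuracy.

    Returns {dataset: {"model": (data_type, value), "cv": (data_type, value)}}.
    """
    best = {}
    for dataset, data_type, model_acc, cv_acc in rows:
        # The first row seen for a dataset initialises both columns.
        entry = best.setdefault(
            dataset,
            {"model": (data_type, model_acc), "cv": (data_type, cv_acc)},
        )
        # Later rows replace an entry only if they are strictly higher.
        if model_acc > entry["model"][1]:
            entry["model"] = (data_type, model_acc)
        if cv_acc > entry["cv"][1]:
            entry["cv"] = (data_type, cv_acc)
    return best


if __name__ == "__main__":
    for dataset, entry in highest_per_dataset(ROWS).items():
        print(dataset, entry)
```

Under that assumption, the 16S rows hold the highest value in every case except the cross-validation accuracy of the three country cohort, where the WGS value (0.8803) is higher than the 16S value (0.8685).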