Oral Biology pp 347-364

Exploring Genome-Wide Expression Profiles Using Machine Learning Techniques

  • Moritz Kebschull
  • Panos N. Papapanou
Part of the Methods in Molecular Biology book series (MIMB, volume 1537)


Contemporary high-throughput -omics methods produce high-dimensional data, but the resulting wealth of information is difficult to assess with traditional statistical procedures. Machine learning methods facilitate the detection of additional patterns, going beyond the mere identification of lists of features that differ between groups.

Here, we demonstrate the utility of (1) supervised classification algorithms for class validation and (2) unsupervised clustering for class discovery. We use data from our previous work describing the transcriptional profiles of gingival tissue samples from patients with chronic or aggressive periodontitis to (1) test whether the two diagnostic entities also differ at the molecular level and (2) search for a novel, alternative classification of periodontitis based on the tissue transcriptomes.
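The class-validation idea above can be illustrated with a minimal Python/NumPy sketch. The logic is: if a classifier trained on expression profiles predicts the clinical diagnosis no better than it predicts randomly permuted labels, the data offer little support for the two diagnoses being molecularly distinct. This is not the chapter's R-based protocol (the acknowledgments mention the CMA package). The function names, the diagonal linear discriminant classifier, the Welch t-statistic gene filter, the fold and gene counts, and the synthetic data are all illustrative assumptions.

```python
import numpy as np


def top_genes_by_t(X, y, k):
    """Indices of the k genes with the largest absolute Welch t-statistic."""
    a, b = X[y == 0], X[y == 1]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    t = (a.mean(axis=0) - b.mean(axis=0)) / (se + 1e-12)
    return np.argsort(-np.abs(t))[:k]


def dlda_predict(X_train, y_train, X_test):
    """Diagonal linear discriminant analysis with equal class priors (labels 0/1)."""
    means = np.stack([X_train[y_train == c].mean(axis=0) for c in (0, 1)])
    # Pooled within-class variance per gene (two classes -> n - 2 degrees of freedom).
    resid = X_train - means[y_train]
    var = (resid ** 2).sum(axis=0) / (len(X_train) - 2) + 1e-12
    # Variance-scaled squared distance of each test sample to each class centroid.
    dist = (((X_test[:, None, :] - means[None]) ** 2) / var).sum(axis=2)
    return dist.argmin(axis=1)


def cv_error(X, y, n_genes=20, n_folds=5, seed=None):
    """Stratified k-fold misclassification rate; gene selection is redone in every fold."""
    rng = np.random.default_rng(seed)
    folds = np.empty(len(y), dtype=int)
    for c in (0, 1):
        idx = rng.permutation(np.flatnonzero(y == c))
        folds[idx] = np.arange(len(idx)) % n_folds
    errors = 0
    for f in range(n_folds):
        train, test = folds != f, folds == f
        genes = top_genes_by_t(X[train], y[train], n_genes)
        pred = dlda_predict(X[train][:, genes], y[train], X[test][:, genes])
        errors += int(np.sum(pred != y[test]))
    return errors / len(y)


# Hypothetical data: 60 samples x 500 genes, two diagnostic labels,
# with 10 genes shifted in class 1 (real input would be normalized expression values).
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 30)
X = rng.normal(size=(60, 500))
X[y == 1, :10] += 1.5

observed = cv_error(X, y, seed=2)
# Permutation null: the identical procedure applied to randomly shuffled labels.
null = np.array([cv_error(X, rng.permutation(y), seed=s) for s in range(20)])
p_value = (1 + np.sum(null <= observed)) / (1 + len(null))
print(f"CV error {observed:.2f}; permuted-label mean {null.mean():.2f}; p = {p_value:.3f}")
```

Gene selection is deliberately repeated inside each training fold: ranking genes on all samples before cross-validation lets the test samples influence the selected features and biases the error estimate downward.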

Using machine learning, we provide evidence for diagnostic imprecision in the currently accepted classification of periodontitis and demonstrate that a novel, alternative classification based on differences in gingival tissue transcriptomes is feasible. The outlined procedures allow unbiased interrogation of high-dimensional datasets for characteristic underlying classes and are applicable to a broad range of -omics data.
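Class discovery of this kind can be sketched as model-based clustering followed by a comparison with the clinical labels. The Python/NumPy code below is an illustrative stand-in, not the chapter's R workflow (the acknowledgments mention the flexmix package): it fits Gaussian mixtures whose components share one diagonal covariance matrix by EM, picks the number of clusters by the Bayesian information criterion (BIC), and scores agreement between partitions with the adjusted Rand index of Hubert and Arabie. The function names, farthest-point initialization, restart count, and synthetic data are assumptions; the five columns stand in for a reduced representation of the transcriptome (for example leading principal components).

```python
import numpy as np


def _e_step(X, weights, means, var):
    """Responsibilities and total log-likelihood for a mixture with shared diagonal variance."""
    log_p = np.log(weights) - 0.5 * (
        (((X[:, None, :] - means) ** 2) / var).sum(axis=2) + np.log(2 * np.pi * var).sum()
    )
    log_norm = np.logaddexp.reduce(log_p, axis=1)
    return np.exp(log_p - log_norm[:, None]), log_norm.sum()


def fit_mixture(X, k, n_iter=100, n_init=5, seed=0):
    """EM fit; returns (log-likelihood, hard cluster labels) of the best restart."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    best_ll, best_labels = -np.inf, None
    for _ in range(n_init):
        # Farthest-point initialization: random first mean, then repeatedly the
        # sample farthest from all means chosen so far.
        centers = [X[rng.integers(n)]]
        while len(centers) < k:
            dist = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
            centers.append(X[dist.argmax()])
        means, var, weights = np.array(centers), X.var(axis=0) + 1e-6, np.full(k, 1.0 / k)
        for _ in range(n_iter):
            resp, _ = _e_step(X, weights, means, var)
            nk = resp.sum(axis=0) + 1e-12
            weights = nk / n
            means = (resp.T @ X) / nk[:, None]
            # One diagonal covariance shared by all components.
            var = np.einsum("nk,nkd->d", resp, (X[:, None, :] - means) ** 2) / n + 1e-6
        resp, ll = _e_step(X, weights, means, var)
        if ll > best_ll:
            best_ll, best_labels = ll, resp.argmax(axis=1)
    return best_ll, best_labels


def bic(ll, k, n, d):
    """BIC (lower is better): k*d means + d shared variances + (k - 1) weights."""
    return -2 * ll + (k * d + d + k - 1) * np.log(n)


def adjusted_rand_index(a, b):
    """Adjusted Rand index (Hubert and Arabie) between two labelings."""
    a = np.unique(a, return_inverse=True)[1]
    b = np.unique(b, return_inverse=True)[1]
    table = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(table, (a, b), 1)

    def pairs(x):
        return x * (x - 1) / 2

    sum_ij = pairs(table).sum()
    sum_a, sum_b = pairs(table.sum(axis=1)).sum(), pairs(table.sum(axis=0)).sum()
    expected = sum_a * sum_b / pairs(len(a))
    return (sum_ij - expected) / ((sum_a + sum_b) / 2 - expected)


# Hypothetical data: 90 samples x 5 summary features drawn from 3 latent groups.
rng = np.random.default_rng(3)
true_groups = np.repeat([0, 1, 2], 30)
centers = np.array([[0, 0, 0, 0, 0], [6, 0, 0, 0, 0], [0, 6, 0, 0, 0]], dtype=float)
X = centers[true_groups] + rng.normal(size=(90, 5))
# Hypothetical two-class clinical diagnosis that lumps latent groups 1 and 2 together.
clinical = (true_groups > 0).astype(int)

fits = {k: fit_mixture(X, k) for k in range(1, 6)}
scores = {k: bic(ll, k, *X.shape) for k, (ll, _) in fits.items()}
best_k = min(scores, key=scores.get)
labels = fits[best_k][1]
print(best_k, adjusted_rand_index(labels, true_groups), adjusted_rand_index(labels, clinical))
```

Sharing one covariance across components keeps a component from collapsing onto a handful of samples, a common failure of unconstrained mixtures when clusters are small; whether that constraint suits real transcriptome data is a modelling decision to check rather than a given.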

Key words

Periodontal disease, Aggressive periodontitis, Chronic periodontitis, Gene expression, Transcriptome, Gingiva, Classification, Machine learning



This work was supported by grants from the German Society for Periodontology (DG PARO) and the German Society for Oral and Maxillo-Facial Sciences (DGZMK) to M.K., and by grants from NIH/NIDCR (DE015649 and DE024735) and by an unrestricted gift from Colgate-Palmolive Inc. to P.N.P. The authors thank Prof. Anne-Laure Boulesteix (Munich, Germany) and Prof. Bettina Grün (Linz, Austria) for their support with the CMA and flexmix packages, respectively.


Copyright information

© Springer Science+Business Media LLC 2017

Authors and Affiliations

  1. Department of Periodontology, Operative and Preventive Dentistry, Faculty of Medicine, University of Bonn, Bonn, Germany
  2. Division of Periodontics, Section of Oral, Diagnostic and Rehabilitation Sciences, Columbia University College of Dental Medicine, New York, USA
