Exploring Genome-Wide Expression Profiles Using Machine Learning Techniques

Kebschull, Moritz; Papapanou, Panos N.

doi:10.1007/978-1-4939-6685-1_20

Part of the book series: Methods in Molecular Biology (MIMB, volume 1537)

Abstract

Although contemporary high-throughput –omics methods produce high-dimensional data, the resulting wealth of information is difficult to assess using traditional statistical procedures. Machine learning methods facilitate the detection of additional patterns, beyond the mere identification of lists of features that differ between groups.

Here, we demonstrate the utility of (1) supervised classification algorithms in class validation, and (2) unsupervised clustering in class discovery. We use data from our previous work that described the transcriptional profiles of gingival tissue samples obtained from subjects suffering from chronic or aggressive periodontitis (1) to test whether the two diagnostic entities were also characterized by differences on the molecular level, and (2) to search for a novel, alternative classification of periodontitis based on the tissue transcriptomes.
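The idea of class validation can be illustrated with a minimal sketch. It is not the authors' workflow: the nearest-centroid classifier, the synthetic "expression" data, and all function names below are assumptions chosen for illustration. The sketch estimates, by leave-one-out cross-validation, how well a predefined class label can be predicted from the profiles, and compares that with the accuracy obtained after permuting the labels.

```python
import random


def centroid(rows):
    """Feature-wise mean of a list of equal-length expression vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def loocv_accuracy(samples, labels):
    """Leave-one-out cross-validated accuracy of a nearest-centroid classifier."""
    correct = 0
    for i, (x, y) in enumerate(zip(samples, labels)):
        # Train on every sample except the held-out one.
        train = [(s, l) for j, (s, l) in enumerate(zip(samples, labels)) if j != i]
        centroids = {
            c: centroid([s for s, l in train if l == c])
            for c in set(l for _, l in train)
        }
        # Predict the class whose centroid is closest to the held-out sample.
        predicted = min(centroids, key=lambda c: sq_dist(x, centroids[c]))
        correct += predicted == y
    return correct / len(samples)


def simulate(n_per_class, n_genes, shift, rng):
    """Synthetic profiles (hypothetical data): class 'B' is shifted in the first 5 genes."""
    samples, labels = [], []
    for label, delta in (("A", 0.0), ("B", shift)):
        for _ in range(n_per_class):
            samples.append(
                [rng.gauss(delta if g < 5 else 0.0, 1.0) for g in range(n_genes)]
            )
            labels.append(label)
    return samples, labels


rng = random.Random(0)
samples, labels = simulate(n_per_class=20, n_genes=50, shift=3.0, rng=rng)
acc_true = loocv_accuracy(samples, labels)

# Permuting the labels breaks any link between profile and class,
# which gives a reference point for the accuracy obtained with the real labels.
shuffled = labels[:]
rng.shuffle(shuffled)
acc_perm = loocv_accuracy(samples, shuffled)
print(f"accuracy with true labels: {acc_true:.2f}, with permuted labels: {acc_perm:.2f}")
```

If the predefined classes differ at the molecular level, cross-validated accuracy should clearly exceed the permuted-label reference. Accuracy close to that reference would instead suggest that the labels are poorly supported by the data.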

Using machine learning technology, we provide evidence for diagnostic imprecision in the currently accepted classification of periodontitis, and demonstrate that a novel, alternative classification based on differences in gingival tissue transcriptomes is feasible. The outlined procedures allow for the unbiased interrogation of high-dimensional datasets for characteristic underlying classes, and are applicable to a broad range of –omics data.
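Class discovery can be sketched in the same spirit. The example below is an illustrative assumption rather than the procedure used in the chapter: it runs a plain k-means clustering with random restarts on synthetic profiles and then quantifies agreement between the discovered clusters and a known grouping with the adjusted Rand index. All names and data are hypothetical.

```python
import random
from collections import Counter
from math import comb


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(rows):
    """Feature-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def kmeans_once(samples, k, rng, n_iter=100):
    """One k-means run from randomly chosen starting samples."""
    centers = [list(s) for s in rng.sample(samples, k)]
    assign = None
    for _ in range(n_iter):
        new_assign = [
            min(range(k), key=lambda c: sq_dist(x, centers[c])) for x in samples
        ]
        if new_assign == assign:  # converged
            break
        assign = new_assign
        for c in range(k):
            members = [x for x, a in zip(samples, assign) if a == c]
            if members:  # keep the previous center if a cluster is empty
                centers[c] = mean_vector(members)
    inertia = sum(sq_dist(x, centers[a]) for x, a in zip(samples, assign))
    return assign, inertia


def kmeans(samples, k, rng, n_restarts=5):
    """Best of several runs, judged by within-cluster sum of squares."""
    runs = [kmeans_once(samples, k, rng) for _ in range(n_restarts)]
    return min(runs, key=lambda r: r[1])[0]


def adjusted_rand_index(a, b):
    """Chance-corrected agreement between two partitions of the same samples."""
    n = len(a)
    sum_ij = sum(comb(v, 2) for v in Counter(zip(a, b)).values())
    sum_a = sum(comb(v, 2) for v in Counter(a).values())
    sum_b = sum(comb(v, 2) for v in Counter(b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # both partitions are trivial
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


# Hypothetical data: two latent groups that differ in the first 10 of 30 genes.
rng = random.Random(1)
latent, samples = [], []
for group, delta in ((0, 0.0), (1, 6.0)):
    for _ in range(20):
        samples.append([rng.gauss(delta if g < 10 else 0.0, 1.0) for g in range(30)])
        latent.append(group)

clusters = kmeans(samples, k=2, rng=rng)
ari = adjusted_rand_index(latent, clusters)
print(f"adjusted Rand index between clusters and latent groups: {ari:.2f}")
```

An adjusted Rand index near 1 means the clusters reproduce the reference grouping, whereas a value near 0 means agreement is no better than chance. In a real analysis the number of clusters would also have to be chosen and its stability assessed, which this sketch does not attempt.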

Acknowledgments

This work was supported by grants from the German Society for Periodontology (DG PARO) and the German Society for Oral and Maxillo-Facial Sciences (DGZMK) to M.K., and by grants from NIH/NIDCR (DE015649 and DE024735) and by an unrestricted gift from Colgate-Palmolive Inc. to P.N.P. The authors thank Prof. Anne-Laure Boulesteix (Munich, Germany) and Prof. Bettina Grün (Linz, Austria) for their support with the CMA and flexmix packages, respectively.

Author information

Authors and Affiliations

Department of Periodontology, Operative and Preventive Dentistry, Faculty of Medicine, University of Bonn, Welschnonnenstr. 17, Bonn, D-53111, Germany
Moritz Kebschull
Division of Periodontics, Section of Oral, Diagnostic and Rehabilitation Sciences, Columbia University College of Dental Medicine, New York, NY, USA
Moritz Kebschull & Panos N. Papapanou

Corresponding author

Correspondence to Moritz Kebschull.

Editor information

Editors and Affiliations

University of Otago Faculty of Dentistry, Dunedin, New Zealand
Gregory J. Seymour
Department of Oral Sciences, University of Otago Faculty of Dentistry, Dunedin, New Zealand
Mary P. Cullinan
Sir John Walsh Research Institute, University of Otago Faculty of Dentistry, Dunedin, New Zealand
Nicholas C.K. Heng

About this protocol

Cite this protocol

Kebschull, M., Papapanou, P.N. (2017). Exploring Genome-Wide Expression Profiles Using Machine Learning Techniques. In: Seymour, G., Cullinan, M., Heng, N. (eds) Oral Biology. Methods in Molecular Biology, vol 1537. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-6685-1_20

DOI: https://doi.org/10.1007/978-1-4939-6685-1_20
Published: 07 December 2016
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-6683-7
Online ISBN: 978-1-4939-6685-1
eBook Packages: Springer Protocols
