Model-Based Clustering of DNA Methylation Array Data

Part of the book series: Translational Bioinformatics ((TRBIO,volume 7))

Abstract

Clustering refers to the “grouping” of observations into a discrete set of classes, such that observations in the same class are more similar to one another than to observations in other classes. In the context of DNA methylation data, clustering can be used to discover novel molecular subtypes or to identify biological pathways composed of co-methylated CpG dinucleotides, depending on whether the samples or the CpGs themselves are being clustered. In this chapter, we focus on the problem of clustering samples/subjects on the basis of their methylation profiles. We begin by discussing the motivation for clustering DNA methylation data, the nature of DNA methylation data generated from the Illumina BeadArrays, and three promising model-based clustering methods. In addition to providing a methodological overview of each of the three methods, we demonstrate their application using a publicly available data set deposited in the Gene Expression Omnibus (GEO) database. We also discuss issues such as feature selection and the comparison of clustering partitions.
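As a rough illustration of what comparing two clustering partitions of the same samples can involve, the sketch below computes the adjusted Rand index, one common agreement measure. It is not taken from the chapter: the function name `adjusted_rand_index` and the example labels are hypothetical, and the chapter may compare partitions differently.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two partitions of the same samples.

    Hypothetical helper for illustration only; labels may be any hashable
    cluster identifiers, given in the same sample order for both partitions.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same samples")
    n = len(labels_a)
    if n < 2:
        raise ValueError("at least two samples are required")

    # Pairs of samples placed together in both partitions (contingency cells).
    joint = Counter(zip(labels_a, labels_b))
    together_both = sum(comb(count, 2) for count in joint.values())

    # Pairs of samples placed together within each partition separately.
    together_a = sum(comb(count, 2) for count in Counter(labels_a).values())
    together_b = sum(comb(count, 2) for count in Counter(labels_b).values())

    # Agreement expected by chance and the maximum attainable agreement.
    expected = together_a * together_b / comb(n, 2)
    maximum = (together_a + together_b) / 2

    if maximum == expected:
        # Degenerate case: both partitions are a single cluster or all singletons.
        return 1.0
    return (together_both - expected) / (maximum - expected)


# Hypothetical cluster assignments for six samples from two methods.
method_1 = ["A", "A", "A", "B", "B", "B"]
method_2 = [1, 1, 2, 2, 2, 2]
print(adjusted_rand_index(method_1, method_2))
```

A value of 1 indicates that the two partitions agree up to relabeling of the clusters, values near 0 indicate agreement no better than chance, and negative values indicate less agreement than expected by chance.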

Notes

  1. Details regarding the specification of the computing resources used for estimating computational times can be found at http://www.acf.ku.edu/wiki/.

Acknowledgements

We would like to offer our deepest gratitude to Dr. Joseph Usset and Samuel Turpin for their feedback, suggestions, and comments on this chapter.

Author information

Corresponding author

Correspondence to Devin C. Koestler.

Copyright information

© 2015 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Koestler, D.C., Houseman, E.A. (2015). Model-Based Clustering of DNA Methylation Array Data. In: Teschendorff, A. (eds) Computational and Statistical Epigenomics. Translational Bioinformatics, vol 7. Springer, Dordrecht. https://doi.org/10.1007/978-94-017-9927-0_5
