Statistical Methods for Methylation Data

  • Graham W. Horgan
  • Sok-Peng Chua
Part of the Methods in Molecular Biology book series (MIMB, volume 1589)


Methylation data are continuous variables, with most values in a sample lying in a narrow range. In a research project they can be either the outcome or a variable that potentially explains some of the variation in other outcomes. A range of statistical methods is appropriate, depending on the experimental questions. Before the formal analysis is carried out, it is important that the data are checked and cleaned. Where batch effects may be present, they should be accounted for in the analysis. Where many methylation sites are investigated in a study, attention should be given to multiple comparisons and false discovery rates, and multivariate methods such as principal component analysis may be useful.


Batch effects, Linear model, Regression, Statistical power, Principal component analysis



This work was supported by the Scottish Government’s Rural and Environment Science and Analytical Services Division.



Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  1. Biomathematics and Statistics, University of Aberdeen, Aberdeen, UK
