Abstract
DNA methylation, one of the most important epigenetic mechanisms, is critical for determining cell fate and is therefore closely tied to understanding disease processes such as cancer. This chapter discusses the multiple testing issue in detecting differential methylation in next-generation sequencing studies. Detection requires comparing DNA methylation levels at millions of genomic loci across different genomic samples and can be viewed as a large-scale multiple testing problem. Because read counts at individual CpG sites are low, discreteness in the test statistics is nonignorable and raises many intriguing statistical issues concerning proper control of the false discovery rate (FDR). As a result of this discreteness, popular FDR control procedures are often underpowered in methylation sequencing data analysis. We discuss FDR control methods that accommodate such discreteness.
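The effect of low read counts can be illustrated with a minimal sketch. It assumes one sample per group, so that the methylated and unmethylated read counts at a CpG site form a 2x2 table tested with Fisher's exact test. The function names are illustrative and do not come from the chapter, and the Benjamini-Hochberg step-up shown is the standard baseline procedure, not one of the discreteness-adapted methods.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    Assumed layout: rows are the two samples, columns are methylated and
    unmethylated read counts at one CpG site.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Hypergeometric probability of every table sharing the observed margins.
    probs = {k: comb(row1, k) * comb(row2, col1 - k) / denom
             for k in range(lo, hi + 1)}
    p_obs = probs[a]
    # Sum the probabilities of tables no more likely than the observed one.
    total = sum(p for p in probs.values() if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def attainable_p_values(n1, n2, m):
    """All p-values the test can return for coverages n1, n2 and m methylated reads."""
    lo, hi = max(0, m - n2), min(n1, m)
    return sorted({round(fisher_exact_two_sided(k, n1 - k, m - k, n2 - m + k), 12)
                   for k in range(lo, hi + 1)})


def benjamini_hochberg(p_values, alpha=0.05):
    """Indices rejected by the standard Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            cutoff = rank  # largest rank whose p-value meets its threshold
    return sorted(order[:cutoff])
```

With three reads per sample and three methylated reads in total, `attainable_p_values(3, 3, 3)` contains only two values, the smaller of which is 0.1. Such a site can never be rejected at the 0.05 level, even before any multiplicity adjustment, which is one way to see why procedures designed for continuous p-values lose power on low-coverage data.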
Notes
- 1.
Of the aBH and aBHH methods we include only aBHH, and of the HSU and AHSU methods only AHSU, because these two were shown in their original references to outperform their non-adaptive counterparts. In addition, because of its high computational cost, we do not include the grouping algorithm in our simulation study.
- 2.
When multiple samples are available in each group, other approaches, such as those based on the t-test or the beta-binomial model, may also be used, as discussed in Sect. 3. Our focus is not on how to model the replicates but on using the FET to demonstrate the performance of various methods in the context of multiple testing of discrete hypotheses.
- 3.
References
Akalin, A., Kormaksson, M., Li, S., Garrett-Bakelman, F. E., Figueroa, M. E., Melnick, A., et al. (2012). methylKit: A comprehensive R package for the analysis of genome-wide DNA methylation profiles. Genome Biology, 13(10), R87.
Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. Series B (Methodological), 57, 289–300.
Benjamini, Y., & Liu, W. (1999). A step-down multiple hypotheses testing procedure that controls the false discovery rate under independence. Journal of Statistical Planning and Inference, 82, 163–170.
Benjamini, Y., & Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. Annals of Statistics, 29, 1165–1188.
Bock, C., Tomazou, E. M., Brinkman, A. B., Müller, F., Simmer, F., Gu, H., Jäger, N., et al. (2010). Quantitative comparison of genome-wide DNA methylation mapping technologies. Nature Biotechnology, 28(10), 1106–1114.
Boyes, J., & Bird, A. (1991). DNA methylation inhibits transcription indirectly via a methyl-CpG binding protein. Cell, 64(6), 1123–1134.
Chen, X., & Doerge, R. W. (2015). A weighted FDR procedure under discrete and heterogeneous null distributions. Preprint. arXiv:1502.00973.
Chen, X., & Doerge, R. W. (2018). fdrDiscreteNull: False Discovery Rate Procedures Under Discrete and Heterogeneous Null Distributions. R package version 1.3.
Chen, X., Doerge, R. W., & Heyse, J. F. (2018). Multiple testing with discrete data: Proportion of true null hypotheses and two adaptive FDR procedures. Biometrical Journal, 60(4), 761–779.
Cokus, S. J., Feng, S., Zhang, X., Chen, Z., Merriman, B., Haudenschild, C. D., et al. (2008). Shotgun bisulphite sequencing of the Arabidopsis genome reveals DNA methylation patterning. Nature, 452, 215–219.
Dai, X., Lin, N., Li, D., & Wang, T. (2019). A non-randomized procedure for large-scale heterogeneous multiple discrete testing based on randomized tests. Biometrics, 75(2), 638–649.
Döhler, S., Durand, G., & Roquain, E. (2018). New FDR bounds for discrete and heterogeneous tests. Electronic Journal of Statistics, 12(1), 1867–1900.
Durand, G., & Junge, F. (2019). DiscreteFDR: Multiple Testing Procedures with Adaptation for Discrete Tests. R package version 1.2.
Feng, H., Conneely, K. N., & Wu, H. (2014). A Bayesian hierarchical model to detect differentially methylated loci from single nucleotide resolution sequencing data. Nucleic Acids Research, 42(8), e69.
Genovese, C., & Wasserman, L. (2002). Operating characteristics and extensions of the false discovery rate procedure. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64(3), 499–517.
Geyer, C. J., & Meeden, G. D. (2005). Fuzzy and randomized confidence intervals and p-values. Statistical Science, 20, 358–366.
Gilbert, P. B. (2005). A modified false discovery rate multiple-comparisons procedure for discrete data, applied to human immunodeficiency virus genetics. Journal of the Royal Statistical Society: Series C (Applied Statistics), 54(1), 143–158.
Habiger, J. D. (2015). Multiple test functions and adjusted p-values for test statistics with discrete distributions. Journal of Statistical Planning and Inference, 167, 1–13.
Habiger, J. D., & Pena, E. A. (2011). Randomised P-values and nonparametric procedures in multiple testing. Journal of Nonparametric Statistics, 23(3), 583–604.
Hansen, K. D., Langmead, B., & Irizarry, R. A. (2012). BSmooth: From whole genome bisulfite sequencing reads to differentially methylated regions. Genome Biology, 13(10), R83.
Harris, R. A., Wang, T., Coarfa, C., Nagarajan, R. P., Hong, C., Downey, S. L., et al. (2010). Comparison of sequencing-based methods to profile DNA methylation and identification of monoallelic epigenetic modifications. Nature Biotechnology, 28(10), 1097–1105.
Heyse, J. F. (2011). A false discovery rate procedure for categorical data. In Recent advances in biostatistics: False discovery rates, survival analysis, and related topics (pp. 43–58). Singapore: World Scientific.
Jin, B., Li, Y., & Robertson, K. D. (2011). DNA methylation: Superior or subordinate in the epigenetic hierarchy? Genes & Cancer, 2(6), 607–617.
Jones, P. A. (2012). Functions of DNA methylation: Islands, start sites, gene bodies and beyond. Nature Reviews Genetics, 13(7), 484–492.
Jühling, F., Kretzmer, H., Bernhart, S. H., Otto, C., Stadler, P. F., & Hoffmann, S. (2016). Metilene: Fast and sensitive calling of differentially methylated regions from bisulfite sequencing data. Genome Research, 26(2), 256–262.
Khulan, B., Thompson, R. F., Ye, K., Fazzari, M. J., Suzuki, M., Stasiek, E., et al. (2006). Comparative isoschizomer profiling of cytosine methylation: The HELP assay. Genome Research, 16(8), 1046–1055.
Kulinskaya, E., & Lewin, A. (2009). On fuzzy familywise error rate and false discovery rate procedures for discrete distributions. Biometrika, 96(1), 201–211.
Laird, P. W. (2010). Principles and challenges of genome-wide DNA methylation analysis. Nature Reviews Genetics, 11(3), 191–203.
Laurent, L., Wong, E., Li, G., Huynh, T., Tsirigos, A., Ong, C. T., et al. (2010). Dynamic changes in the human methylome during differentiation. Genome Research, 20, 320–331.
Lehmann, E. L., & Romano, J. P. (2006). Testing statistical hypotheses. Berlin: Springer.
Levenson, J. M., & Sweatt, J. D. (2005). Epigenetic mechanisms in memory formation. Nature Reviews Neuroscience, 6(2), 108–118.
Liang, K. (2016). False discovery rate estimation for large-scale homogeneous discrete p-values. Biometrics, 72(2), 639–648.
Liao, J., Lin, Y., Selvanayagam, Z. E., & Shih, W. J. (2004). A mixture model for estimating the local false discovery rate in DNA microarray analysis. Bioinformatics, 20(16), 2694–2701.
Lister, R., Pelizzola, M., Dowen, R. H., Hawkins, R. D., Hon, G., Tonti-Filippini, J., et al. (2009). Human DNA methylomes at base resolution show widespread epigenomic differences. Nature, 462, 315–322.
Maunakea, A. K., Nagarajan, R. P., Bilenky, M., Ballinger, T. J., D’souza, C., Fouse, S. D., et al. (2010). Conserved role of intragenic DNA methylation in regulating alternative promoters. Nature, 466(7303), 253–257.
Meissner, A., Mikkelsen, T. S., Gu, H., Wernig, M., Hanna, J., Sivachenko, A., et al. (2008). Genome-scale DNA methylation maps of pluripotent and differentiated cells. Nature, 454(7205), 766–770.
Park, Y., Figueroa, M. E., Rozek, L. S., & Sartor, M. A. (2014). MethylSig: A whole genome DNA methylation analysis pipeline. Bioinformatics, 30(17), 2414–2422.
Pounds, S., & Cheng, C. (2006). Robust estimation of the false discovery rate. Bioinformatics, 22(16), 1979–1987.
Rakyan, V. K., Down, T. A., Balding, D. J., & Beck, S. (2011). Epigenome-wide association studies for common human diseases. Nature Reviews Genetics, 12(8), 529–541.
Robinson, M. D., Kahraman, A., Law, C. W., Lindsay, H., Nowicka, M., Weber, L. M., & Zhou, X. (2014). Statistical methods for detecting differentially methylated loci and regions. Frontiers in Genetics, 5, 324.
Sarkar, S. K. (2002). Some results on false discovery rate in stepwise multiple testing procedures. Annals of Statistics, 30, 239–257.
Shafi, A., Mitrea, C., Nguyen, T., & Draghici, S. (2017). A survey of the approaches for identifying differential methylation using bisulfite sequencing data. Briefings in Bioinformatics, 19, 737–753.
Storey, J. D. (2002). A direct approach to false discovery rates. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64(3), 479–498.
Storey, J. D. (2003). The positive false discovery rate: A Bayesian interpretation and the q-value. The Annals of Statistics, 31, 2013–2035.
Storey, J. D., Taylor, J. E., & Siegmund, D. (2004). Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: A unified approach. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 66(1), 187–205.
Sun, S., & Yu, X. (2016). HMM-Fisher: Identifying differential methylation using a hidden Markov model and Fisher’s exact test. Statistical Applications in Genetics and Molecular Biology, 15(1), 55–67.
Sun, W., & Cai, T. T. (2007). Oracle and adaptive compound decision rules for false discovery rate control. Journal of the American Statistical Association, 102(479), 901–912.
Suzuki, M. M., & Bird, A. (2008). DNA methylation landscapes: Provocative insights from epigenomics. Nature Reviews Genetics, 9(6), 465–476.
Tang, Y., Ghosal, S., & Roy, A. (2007). Nonparametric Bayesian estimation of positive false discovery rates. Biometrics, 63(4), 1126–1134.
Tarone, R. (1990). A modified Bonferroni method for discrete data. Biometrics, 46, 515–522.
Tocher, K. (1950). Extension of the Neyman-Pearson theory of tests to discontinuous variates. Biometrika, 37, 130–144.
Watt, F., & Molloy, P. L. (1988). Cytosine methylation prevents binding to DNA of a HeLa cell transcription factor required for optimal expression of the adenovirus major late promoter. Genes and Development, 2(9), 1136–1143.
Weber, M., Davies, J. J., Wittig, D., Oakeley, E. J., Haase, M., Lam, W. L., et al. (2005). Chromosome-wide and promoter-specific analyses identify sites of differential DNA methylation in normal and transformed human cells. Nature Genetics, 37(8), 853–862.
Westfall, P. H., & Wolfinger, R. D. (1997). Multiple tests with discrete distributions. The American Statistician, 51(1), 3–8.
Wu, H., Xu, T., Feng, H., Chen, L., Li, B., Yao, B., et al. (2015). Detection of differentially methylated regions from whole-genome bisulfite sequencing data without replicates. Nucleic Acids Research, 43(21), e141.
Yu, X., & Sun, S. (2016). HMM-DM: Identifying differentially methylated regions using a hidden Markov model. Statistical Applications in Genetics and Molecular Biology, 15(1), 69–81.
Zhang, Y., Liu, H., Lv, J., Xiao, X., Zhu, J., Liu, X., et al. (2011). QDMR: A quantitative method for identification of differentially methylated regions by entropy. Nucleic Acids Research, 39(9), e58.
Ziller, M. J., Hansen, K. D., Meissner, A., & Aryee, M. J. (2014). Coverage recommendations for methylation analysis by whole-genome bisulfite sequencing. Nature Methods, 12(3), 230–232.
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Hao, G., Lin, N. (2020). Discrete Multiple Testing in Detecting Differential Methylation Using Sequencing Data. In: Zhao, Y., Chen, DG. (eds) Statistical Modeling in Biomedical Research. Emerging Topics in Statistics and Biostatistics. Springer, Cham. https://doi.org/10.1007/978-3-030-33416-1_4
DOI: https://doi.org/10.1007/978-3-030-33416-1_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-33415-4
Online ISBN: 978-3-030-33416-1
eBook Packages: Mathematics and Statistics (R0)