Discrete Multiple Testing in Detecting Differential Methylation Using Sequencing Data

Statistical Modeling in Biomedical Research

Part of the book series: Emerging Topics in Statistics and Biostatistics (ETSB)

Abstract

DNA methylation, one of the most important epigenetic mechanisms, is critical in determining cell fate and is therefore closely tied to understanding disease processes such as cancer. We discuss the multiple testing problem that arises in detecting differential methylation in next-generation sequencing studies. Detection requires comparing DNA methylation levels at millions of genomic loci across different genomic samples and can be viewed as a large-scale multiple testing problem. Because read counts at individual CpG sites are low, the discreteness of the test statistics is nonignorable and raises many intriguing statistical issues concerning proper control of the false discovery rate (FDR). Popular FDR control procedures are often underpowered in methylation sequencing data analysis because of this discreteness. We discuss FDR control methods that accommodate it.
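
To make the discreteness concrete, the following minimal sketch enumerates every p-value that a two-sided Fisher's exact test can return at a single CpG site once the read totals are fixed. The choice of Fisher's exact test for this illustration, the function names `fisher_two_sided` and `attainable_pvalues`, and the toy coverage of four reads per sample are assumptions made here, not details taken from the chapter.

```python
from math import comb


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Rows are the two samples; columns are methylated / unmethylated read counts.
    The p-value sums the hypergeometric probabilities of all tables (with the
    same margins) that are no more likely than the observed one.
    """
    n1, n2, m = a + b, c + d, a + c            # row totals, methylated total
    lo, hi = max(0, m - n2), min(m, n1)        # support of the top-left cell
    total = comb(n1 + n2, m)
    probs = {x: comb(n1, x) * comb(n2, m - x) / total for x in range(lo, hi + 1)}
    p_obs = probs[a]
    # Small relative tolerance guards against floating-point ties.
    p = sum(q for q in probs.values() if q <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def attainable_pvalues(n1, n2, m):
    """All distinct p-values the test can produce for fixed margins.

    n1, n2: read coverage in sample 1 and sample 2; m: total methylated reads.
    """
    lo, hi = max(0, m - n2), min(m, n1)
    values = {
        round(fisher_two_sided(x, n1 - x, m - x, n2 - (m - x)), 12)
        for x in range(lo, hi + 1)
    }
    return sorted(values)


if __name__ == "__main__":
    # Hypothetical low-coverage site: 4 reads per sample, 4 methylated overall.
    support = attainable_pvalues(4, 4, 4)
    print("attainable p-values:", support)
    print("smallest attainable p-value:", support[0])
```

With four reads per sample and four methylated reads in total, only three p-values are attainable and the smallest is 2/70 (about 0.029), so a site like this can never fall below a stringent per-test threshold, whatever the true methylation difference.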

Notes

  1. We include only the aBHH method out of the aBH and aBHH methods, and only the AHSU method out of the HSU and AHSU methods, because these two were shown in their original references to perform better than their non-adaptive counterparts. In addition, because of its high computational cost, we do not include the grouping algorithm in our simulation study.

  2. When multiple samples are available in each group, other approaches, such as those based on the t-test or the beta-binomial model, may also be used, as discussed in Sect. 3. Our focus is not on how to model the replicates but on using Fisher's exact test (FET) to demonstrate the performance of various methods in the context of multiple testing of discrete hypotheses.

  3. As mentioned in Sect. 3.1.1, Gilbert [17] recognized this issue and proposed a grid search method. However, the grid search method is too computationally costly to be included in the simulation study.
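
As a point of reference for the multiple testing of discrete hypotheses mentioned in the notes, the sketch below applies the standard Benjamini-Hochberg step-up procedure [2] to two toy sets of per-site p-values. The function name `benjamini_hochberg` and both p-value lists are hypothetical. The value 2/70 is the smallest two-sided FET p-value attainable at a site with four reads in each of two samples and four methylated reads in total, which is an assumed coverage chosen for illustration.

```python
def benjamini_hochberg(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a list of booleans, True where the hypothesis is rejected.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0  # largest rank whose ordered p-value is below its BH threshold
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / m:
            k = rank
    rejected = [False] * m
    for i in order[:k]:
        rejected[i] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical discrete case: three sites sit at the smallest attainable
    # p-value for their coverage (2/70); the remaining 97 sites have p = 1.
    discrete = [2 / 70] * 3 + [1.0] * 97
    # Hypothetical continuous-like case with three small p-values.
    continuous = [0.0001, 0.0004, 0.0012] + [1.0] * 97

    print("rejections (discrete):  ", sum(benjamini_hochberg(discrete)))
    print("rejections (continuous):", sum(benjamini_hochberg(continuous)))
```

In this toy setting the three most extreme discrete sites cannot be rejected at level 0.05 because their p-values cannot go below 2/70, whereas the three small continuous-like p-values are all rejected. This is one simple way to see the loss of power that discreteness can cause.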

References

  1. Akalin, A., Kormaksson, M., Li, S., Garrett-Bakelman, F. E., Figueroa, M. E., Melnick, A., et al. (2012). methylKit: A comprehensive R package for the analysis of genome-wide DNA methylation profiles. Genome Biology, 13(10), R87.

  2. Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. Series B (Methodological), 57, 289–300.

  3. Benjamini, Y., & Liu, W. (1999). A step-down multiple hypotheses testing procedure that controls the false discovery rate under independence. Journal of Statistical Planning and Inference, 82, 163–170.

  4. Benjamini, Y., & Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. Annals of Statistics, 29, 1165–1188.

  5. Bock, C., Tomazou, E. M., Brinkman, A. B., Müller, F., Simmer, F., Gu, H., Jäger, N., et al. (2010). Quantitative comparison of genome-wide DNA methylation mapping technologies. Nature Biotechnology, 28(10), 1106–1114.

  6. Boyes, J., & Bird, A. (1991). DNA methylation inhibits transcription indirectly via a methyl-CpG binding protein. Cell, 64(6), 1123–1134.

  7. Chen, X., & Doerge, R. W. (2015). A weighted FDR procedure under discrete and heterogeneous null distributions. Preprint. arXiv:1502.00973.

  8. Chen, X., & Doerge, R. W. (2018). fdrDiscreteNull: False Discovery Rate Procedures Under Discrete and Heterogeneous Null Distributions. R package version 1.3.

  9. Chen, X., Doerge, R. W., & Heyse, J. F. (2018). Multiple testing with discrete data: Proportion of true null hypotheses and two adaptive FDR procedures. Biometrical Journal, 60(4), 761–779.

  10. Cokus, S. J., Feng, S., Zhang, X., Chen, Z., Merriman, B., Haudenschild, C. D., et al. (2008). Shotgun bisulphite sequencing of the Arabidopsis genome reveals DNA methylation patterning. Nature, 452, 215–219.

  11. Dai, X., Lin, N., Li, D., & Wang, T. (2019). A non-randomized procedure for large-scale heterogeneous multiple discrete testing based on randomized tests. Biometrics, 75(2), 638–649.

  12. Döhler, S., Durand, G., & Roquain, E. (2018). New FDR bounds for discrete and heterogeneous tests. Electronic Journal of Statistics, 12(1), 1867–1900.

  13. Durand, G., & Junge, F. (2019). DiscreteFDR: Multiple Testing Procedures with Adaptation for Discrete Tests. R package version 1.2.

  14. Feng, H., Conneely, K. N., & Wu, H. (2014). A Bayesian hierarchical model to detect differentially methylated loci from single nucleotide resolution sequencing data. Nucleic Acids Research, 42(8), e69.

  15. Genovese, C., & Wasserman, L. (2002). Operating characteristics and extensions of the false discovery rate procedure. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64(3), 499–517.

  16. Geyer, C. J., & Meeden, G. D. (2005). Fuzzy and randomized confidence intervals and p-values. Statistical Science, 20, 358–366.

  17. Gilbert, P. B. (2005). A modified false discovery rate multiple-comparisons procedure for discrete data, applied to human immunodeficiency virus genetics. Journal of the Royal Statistical Society: Series C (Applied Statistics), 54(1), 143–158.

  18. Habiger, J. D. (2015). Multiple test functions and adjusted p-values for test statistics with discrete distributions. Journal of Statistical Planning and Inference, 167, 1–13.

  19. Habiger, J. D., & Pena, E. A. (2011). Randomised P-values and nonparametric procedures in multiple testing. Journal of Nonparametric Statistics, 23(3), 583–604.

  20. Hansen, K. D., Langmead, B., & Irizarry, R. A. (2012). BSmooth: From whole genome bisulfite sequencing reads to differentially methylated regions. Genome Biology, 13(10), R83.

  21. Harris, R. A., Wang, T., Coarfa, C., Nagarajan, R. P., Hong, C., Downey, S. L., et al. (2010). Comparison of sequencing-based methods to profile DNA methylation and identification of monoallelic epigenetic modifications. Nature Biotechnology, 28(10), 1097–1105.

  22. Heyse, J. F. (2011). A false discovery rate procedure for categorical data. In Recent advances in biostatistics: False discovery rates, survival analysis, and related topics (pp. 43–58). Singapore: World Scientific.

  23. Jin, B., Li, Y., & Robertson, K. D. (2011). DNA methylation: Superior or subordinate in the epigenetic hierarchy? Genes & Cancer, 2(6), 607–617.

  24. Jones, P. A. (2012). Functions of DNA methylation: Islands, start sites, gene bodies and beyond. Nature Reviews Genetics, 13(7), 484–492.

  25. Jühling, F., Kretzmer, H., Bernhart, S. H., Otto, C., Stadler, P. F., & Hoffmann, S. (2016). Metilene: Fast and sensitive calling of differentially methylated regions from bisulfite sequencing data. Genome Research, 26(2), 256–262.

  26. Khulan, B., Thompson, R. F., Ye, K., Fazzari, M. J., Suzuki, M., Stasiek, E., et al. (2006). Comparative isoschizomer profiling of cytosine methylation: The HELP assay. Genome Research, 16(8), 1046–1055.

  27. Kulinskaya, E., & Lewin, A. (2009). On fuzzy familywise error rate and false discovery rate procedures for discrete distributions. Biometrika, 96(1), 201–211.

  28. Laird, P. W. (2010). Principles and challenges of genome-wide DNA methylation analysis. Nature Reviews Genetics, 11(3), 191–203.

  29. Laurent, L., Wong, E., Li, G., Huynh, T., Tsirigos, A., Ong, C. T., et al. (2010). Dynamic changes in the human methylome during differentiation. Genome Research, 20, 320–331.

  30. Lehmann, E. L., & Romano, J. P. (2006). Testing statistical hypotheses. Berlin: Springer.

  31. Levenson, J. M., & Sweatt, J. D. (2005). Epigenetic mechanisms in memory formation. Nature Reviews Neuroscience, 6(2), 108–118.

  32. Liang, K. (2016). False discovery rate estimation for large-scale homogeneous discrete p-values. Biometrics, 72(2), 639–648.

  33. Liao, J., Lin, Y., Selvanayagam, Z. E., & Shih, W. J. (2004). A mixture model for estimating the local false discovery rate in DNA microarray analysis. Bioinformatics, 20(16), 2694–2701.

  34. Lister, R., Pelizzola, M., Dowen, R. H., Hawkins, R. D., Hon, G., Tonti-Filippini, J., et al. (2009). Human DNA methylomes at base resolution show widespread epigenomic differences. Nature, 462, 315–322.

  35. Maunakea, A. K., Nagarajan, R. P., Bilenky, M., Ballinger, T. J., D’souza, C., Fouse, S. D., et al. (2010). Conserved role of intragenic DNA methylation in regulating alternative promoters. Nature, 466(7303), 253–257.

  36. Meissner, A., Mikkelsen, T. S., Gu, H., Wernig, M., Hanna, J., Sivachenko, A., et al. (2008). Genome-scale DNA methylation maps of pluripotent and differentiated cells. Nature, 454(7205), 766–770.

  37. Park, Y., Figueroa, M. E., Rozek, L. S., & Sartor, M. A. (2014). MethylSig: A whole genome DNA methylation analysis pipeline. Bioinformatics, 30(17), 2414–2422.

  38. Pounds, S., & Cheng, C. (2006). Robust estimation of the false discovery rate. Bioinformatics, 22(16), 1979–1987.

  39. Rakyan, V. K., Down, T. A., Balding, D. J., & Beck, S. (2011). Epigenome-wide association studies for common human diseases. Nature Reviews Genetics, 12(8), 529–541.

  40. Robinson, M. D., Kahraman, A., Law, C. W., Lindsay, H., Nowicka, M., Weber, L. M., & Zhou, X. (2014). Statistical methods for detecting differentially methylated loci and regions. Frontiers in Genetics, 5, 324.

  41. Sarkar, S. K. (2002). Some results on false discovery rate in stepwise multiple testing procedures. Annals of Statistics, 30, 239–257.

  42. Shafi, A., Mitrea, C., Nguyen, T., & Draghici, S. (2017). A survey of the approaches for identifying differential methylation using bisulfite sequencing data. Briefings in Bioinformatics, 19, 737–753.

  43. Storey, J. D. (2002). A direct approach to false discovery rates. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64(3), 479–498.

  44. Storey, J. D. (2003). The positive false discovery rate: A Bayesian interpretation and the q-value. The Annals of Statistics, 31, 2013–2035.

  45. Storey, J. D., Taylor, J. E., & Siegmund, D. (2004). Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: A unified approach. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 66(1), 187–205.

  46. Sun, S., & Yu, X. (2016). HMM-Fisher: Identifying differential methylation using a hidden Markov model and Fisher’s exact test. Statistical Applications in Genetics and Molecular Biology, 15(1), 55–67.

  47. Sun, W., & Cai, T. T. (2007). Oracle and adaptive compound decision rules for false discovery rate control. Journal of the American Statistical Association, 102(479), 901–912.

  48. Suzuki, M. M., & Bird, A. (2008). DNA methylation landscapes: Provocative insights from epigenomics. Nature Reviews Genetics, 9(6), 465–476.

  49. Tang, Y., Ghosal, S., & Roy, A. (2007). Nonparametric Bayesian estimation of positive false discovery rates. Biometrics, 63(4), 1126–1134.

  50. Tarone, R. (1990). A modified Bonferroni method for discrete data. Biometrics, 46, 515–522.

  51. Tocher, K. (1950). Extension of the Neyman-Pearson theory of tests to discontinuous variates. Biometrika, 37, 130–144.

  52. Watt, F., & Molloy, P. L. (1988). Cytosine methylation prevents binding to DNA of a HeLa cell transcription factor required for optimal expression of the adenovirus major late promoter. Genes and Development, 2(9), 1136–1143.

  53. Weber, M., Davies, J. J., Wittig, D., Oakeley, E. J., Haase, M., Lam, W. L., et al. (2005). Chromosome-wide and promoter-specific analyses identify sites of differential DNA methylation in normal and transformed human cells. Nature Genetics, 37(8), 853–862.

  54. Westfall, P. H., & Wolfinger, R. D. (1997). Multiple tests with discrete distributions. The American Statistician, 51(1), 3–8.

  55. Wu, H., Xu, T., Feng, H., Chen, L., Li, B., Yao, B., et al. (2015). Detection of differentially methylated regions from whole-genome bisulfite sequencing data without replicates. Nucleic Acids Research, 43(21), e141.

  56. Yu, X., & Sun, S. (2016). HMM-DM: Identifying differentially methylated regions using a hidden Markov model. Statistical Applications in Genetics and Molecular Biology, 15(1), 69–81.

  57. Zhang, Y., Liu, H., Lv, J., Xiao, X., Zhu, J., Liu, X., et al. (2011). QDMR: A quantitative method for identification of differentially methylated regions by entropy. Nucleic Acids Research, 39(9), e58.

  58. Ziller, M. J., Hansen, K. D., Meissner, A., & Aryee, M. J. (2014). Coverage recommendations for methylation analysis by whole-genome bisulfite sequencing. Nature Methods, 12(3), 230–232.

Author information

Corresponding author

Correspondence to Nan Lin.

Copyright information

© 2020 Springer Nature Switzerland AG

Cite this chapter

Hao, G., Lin, N. (2020). Discrete Multiple Testing in Detecting Differential Methylation Using Sequencing Data. In: Zhao, Y., Chen, DG. (eds) Statistical Modeling in Biomedical Research. Emerging Topics in Statistics and Biostatistics. Springer, Cham. https://doi.org/10.1007/978-3-030-33416-1_4
