Abstract
DNA methylation has been extensively linked to alterations in gene expression and plays a key role in the manifestation of multiple diseases, especially cancer. Hence, the sequence determinants of methylation and the relationship between methylation and expression are of great interest from a molecular biology perspective. Several models have been proposed for predicting methylation status. These models, however, have two main limitations: (a) they are limited to specific CpG loci; and (b) they are not easily interpretable. We address these limitations using deep learning with attention. We produce a general model that predicts DNA methylation for a given sample at any CpG position based solely on the sample’s gene expression profile and the sequence surrounding the CpG. Depending on gene-CpG proximity, our model attains a Spearman correlation of up to 0.84 for thousands of CpG sites on two separate test sets of CpG positions and subjects (cancer and healthy samples). Importantly, our approach, especially the use of attention, offers a novel framework for extracting valuable insights from gene expression data when combined with sequence information. We demonstrate this by linking several motifs and genes to methylation activity, including Nodal and Hand1. The code and trained weights are available at: https://github.com/YakhiniGroup/Methylation.
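The abstract names three ingredients without giving details: a sequence window around the CpG, attention over a gene expression profile, and evaluation by Spearman correlation. The following is a minimal, illustrative sketch of those three pieces in plain Python. It is not the authors' implementation (see the repository above for that); the function names, the one-hot encoding, and the scalar softmax attention are all assumptions chosen for illustration.

```python
import math

def one_hot(seq):
    """One-hot encode a DNA string (assumed encoding); unknown bases such as N become all zeros."""
    table = {"A": 0, "C": 1, "G": 2, "T": 3}
    rows = []
    for base in seq.upper():
        row = [0.0] * 4
        if base in table:
            row[table[base]] = 1.0
        rows.append(row)
    return rows

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(expression, scores):
    """Hypothetical attention step: weight each gene's expression by a softmax over per-gene scores."""
    weights = softmax(scores)
    context = sum(w * x for w, x in zip(weights, expression))
    return weights, context

def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of the rank vectors.
    Undefined (division by zero) if either input is constant."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)
```

In a sketch like this, the attention weights returned by `attend` are what make the model inspectable: genes with large weights for a given CpG are candidates for follow-up, which is the kind of interpretability the abstract refers to.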
Acknowledgments
We would like to thank the Yakhini Group, and specifically Leon Anavy and Oz Solomon, for valuable discussions and suggestions. We also thank Anthony Mathelier and colleagues from the Kristensen Group for important comments.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Levy-Jurgenson, A., Tekpli, X., Kristensen, V.N., Yakhini, Z. (2019). Predicting Methylation from Sequence and Gene Expression Using Deep Learning with Attention. In: Holmes, I., Martín-Vide, C., Vega-Rodríguez, M. (eds) Algorithms for Computational Biology. AlCoB 2019. Lecture Notes in Computer Science(), vol 11488. Springer, Cham. https://doi.org/10.1007/978-3-030-18174-1_13
DOI: https://doi.org/10.1007/978-3-030-18174-1_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-18173-4
Online ISBN: 978-3-030-18174-1
eBook Packages: Computer Science (R0)