Prediction of CpG Islands as an Intrinsic Clustering Property Found in Many Eukaryotic DNA Sequences and Its Relation to DNA Methylation
The promoter regions of around 70% of all genes in the human genome overlap a CpG island (CGI). CGIs have known functions in transcription initiation and distinctive compositional features, such as a high G+C content and high CpG ratios, when compared to bulk DNA. We have shown before that CGIs manifest as clusters of CpGs in mammalian genomes and can therefore be detected using clustering methods. These techniques have several advantages over sliding-window approaches, which apply compositional properties as thresholds. In this protocol we show how to determine local (CpG islands) and global (distance distribution) clustering properties of CG dinucleotides and how to generalize this analysis to any k-mer or combination of k-mers. In addition, we illustrate how to intersect the output of a CpG island prediction algorithm with our methylation database to detect differentially methylated CGIs. The analysis is given as a step-by-step protocol, and all necessary programs are provided in a virtual machine; alternatively, the software can be downloaded and easily installed.
Key words: CpG islands, Clustering, DNA words, DNA methylation, Virtual machine
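The two clustering views named in the abstract, the global distance distribution and local clusters, can be illustrated with a minimal Python sketch. This is not the software distributed with the protocol: the function names (`kmer_positions`, `neighbor_distances`, `distance_clusters`), the fixed distance threshold `max_dist`, the `min_count` filter, and the toy sequence are all assumptions made for illustration. Real CGI prediction would also need a principled choice of threshold and a statistical filter, both of which are omitted here.

```python
def kmer_positions(seq, kmer):
    """Return 0-based start positions of every occurrence of kmer (overlaps allowed)."""
    seq, kmer = seq.upper(), kmer.upper()
    positions = []
    i = seq.find(kmer)
    while i != -1:
        positions.append(i)
        i = seq.find(kmer, i + 1)  # advance by one so overlapping hits are kept
    return positions


def neighbor_distances(positions):
    """Distances between consecutive occurrences (the 'global' view)."""
    return [b - a for a, b in zip(positions, positions[1:])]


def distance_clusters(positions, max_dist, min_count=2):
    """Group consecutive occurrences whose gap is <= max_dist (the 'local' view).

    Returns (first_start, last_start, count) tuples. max_dist and min_count
    are illustrative parameters, not values taken from the protocol.
    """
    clusters = []
    if not positions:
        return clusters
    start = prev = positions[0]
    count = 1
    for p in positions[1:]:
        if p - prev <= max_dist:
            count += 1
        else:
            if count >= min_count:
                clusters.append((start, prev, count))
            start, count = p, 1
        prev = p
    if count >= min_count:
        clusters.append((start, prev, count))
    return clusters


# Hypothetical toy sequence: two CG-dense stretches separated by CG-free runs.
toy = "AA" + "CGCG" + "T" + "CG" + "A" * 10 + "CG" + "A" + "CG" + "T" * 10 + "CG"

cg = kmer_positions(toy, "CG")
print(cg)
print(neighbor_distances(cg))
print(distance_clusters(cg, max_dist=3))
```

Passing a different word to `kmer_positions` (for example `"TATA"`) applies the same analysis to any k-mer. Combinations of k-mers could be handled by merging and sorting their position lists before clustering.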