Identification of DNA CpG Islands Using Inter-dinucleotide Distances
In this study we set out to explore the potential of inter-genomic-symbol distances for finding CpG islands in DNA sequences. We examine the distributions of the inter-CpG and inter-SS distances under the assumption of independent nucleotides, which serves as the reference. We compare the empirical results from the complete human genome, CpG islands and non-CpG islands with the corresponding reference results.
We propose a model to discriminate CpG islands based on statistical properties of the inter-dinucleotide distance distributions in DNA sequences. The results of this exploratory study suggest that the inter-SS distance discriminates CpG islands well.
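To make the notion of an inter-dinucleotide distance concrete, the following minimal Python sketch computes the distances between consecutive CG occurrences in a sequence, their empirical distribution, and a reference distribution under independent nucleotides. The function names, the toy sequence and the two-state recursion are illustrative assumptions, not the construction used in the study. The reference sketch covers only CG, since a dinucleotide that can overlap itself (such as SS) needs a different state diagram.

```python
from collections import Counter


def inter_dinucleotide_distances(seq, dinucleotide="CG"):
    """Distances between start positions of consecutive occurrences."""
    seq = seq.upper()
    positions = [i for i in range(len(seq) - 1) if seq[i:i + 2] == dinucleotide]
    return [b - a for a, b in zip(positions, positions[1:])]


def empirical_distribution(distances):
    """Relative frequency of each observed distance."""
    counts = Counter(distances)
    total = len(distances)
    return {k: counts[k] / total for k in sorted(counts)}


def reference_cg_distribution(p_c, p_g, k_max):
    """P(distance = k), k = 1..k_max, for CG under independent nucleotides.

    Assumption: a two-state chain (last symbol is C / is not C) that is
    absorbed as soon as the next CG is completed.
    """
    not_c, c = 1.0, 0.0  # right after a CG the last symbol is G
    probs = {}
    for k in range(1, k_max + 1):
        probs[k] = c * p_g  # a G following a C completes the next CG
        new_c = (not_c + c) * p_c
        new_not_c = not_c * (1 - p_c) + c * (1 - p_c - p_g)
        not_c, c = new_not_c, new_c
    return probs


if __name__ == "__main__":
    toy = "ACGTTCGCGA"  # hypothetical toy sequence
    d = inter_dinucleotide_distances(toy)
    print(d, empirical_distribution(d))
    print(reference_cg_distribution(0.25, 0.25, 5))
```

Comparing the empirical distribution of a genomic region with the reference distribution, computed from that region's nucleotide frequencies, is the kind of contrast the abstract describes.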
Keywords: State Diagram, Distance Distribution, Reference Distribution, Symbol Distance, Absorb Markov Chain