
Recognition of Coding Regions in Genome Alignment

  • T. V. Astakhova
  • S. V. Petrova
  • I. I. Tsitovich
  • M. A. Roytberg

Abstract

Gene recognition is an old and important problem. Statistical and homology-based methods work relatively well for finding long exons or full genes, but they are unable to recognize relatively short coding fragments. Genome alignments and the study of synonymous and non-synonymous substitutions offer a chance to overcome this drawback. Our aim is to propose a criterion that distinguishes short coding fragments of a genome alignment from non-coding ones and to create an algorithm that locates aligned coding regions. We have developed a method to locate aligned exons in a given alignment. First, we scan the alignment with a window of fixed size (∼40 bp) and assign a score to each window position. The score reflects whether the number KS of synonymous substitutions, the number KN of non-synonymous substitutions, and the number D of deleted symbols resemble those observed in coding regions. Second, we mark the ‘qualified exon-like regions’ (QELRs), i.e., runs of consecutive high-scoring windows. Presumably, each QELR contains one exon. Third, we locate an exon within every QELR. All three steps are performed twice, independently for the direct and the reverse complement strands. Finally, we compare the predictions for the two strands to exclude ‘exon shadows’, i.e., predictions made on the complementary strand instead of the real exons. Tests have shown that ∼93% of the marked QELRs intersect real exons and ∼93% of the aligned annotated exons intersect the marked QELRs. The total length of the marked QELRs is ∼1.30 times the total length of the annotated exons. About 85% of the total length of the predicted exons belongs to annotated exons. The runtime of the algorithm is proportional to the length of the genome alignment.
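
The first two steps of the abstract (window scanning and marking of QELRs) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the scoring function, so `toy_score` is a placeholder, and the per-codon counting of substitutions, the 39 bp window (a multiple of 3 close to the ∼40 bp mentioned above), the threshold, and all function names are assumptions made for illustration only.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

_COMPLEMENT = str.maketrans("ACGT-", "TGCA-")


def reverse_complement(seq):
    """Reverse complement of an aligned sequence; gap symbols are kept."""
    return seq.translate(_COMPLEMENT)[::-1]


def window_counts(a, b, start, size):
    """Count (KS, KN, D) for one window of two aligned rows a and b.

    Simplification (assumption): a differing codon pair counts as one
    synonymous or one non-synonymous substitution; any codon pair that
    contains a gap contributes its gap symbols to D instead.
    """
    ks = kn = d = 0
    for i in range(start, start + size - 2, 3):
        ca, cb = a[i:i + 3], b[i:i + 3]
        gaps = ca.count("-") + cb.count("-")
        if gaps:
            d += gaps
        elif ca != cb:
            if CODON_TABLE[ca] == CODON_TABLE[cb]:
                ks += 1
            else:
                kn += 1
    return ks, kn, d


def toy_score(ks, kn, d):
    """Placeholder score (hypothetical): reward KS, penalize KN and D."""
    return ks - kn - d


def scan(a, b, size=39, step=3):
    """Score every window position; returns a list of (start, score)."""
    return [(s, toy_score(*window_counts(a, b, s, size)))
            for s in range(0, len(a) - size + 1, step)]


def mark_qelrs(scores, threshold, size):
    """Merge runs of consecutive high-scoring windows into (start, end)."""
    regions, current = [], None
    for start, score in scores:
        if score >= threshold:
            if current is None:
                current = [start, start + size]
            else:
                current[1] = start + size
        elif current is not None:
            regions.append(tuple(current))
            current = None
    if current is not None:
        regions.append(tuple(current))
    return regions
```

To cover the second strand, the same scan would be repeated on `reverse_complement(a)` and `reverse_complement(b)`. The sketch makes a single pass over the window positions with a fixed amount of work per window, which is consistent with the linear runtime stated in the abstract. Locating the exon inside each QELR and filtering ‘exon shadows’ are not sketched here, because the abstract does not describe how they are done.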

Key words

coding region; gene recognition; genome alignment; synonymous and non-synonymous substitution


Copyright information

© Springer Science+Business Media, Inc. 2006

Authors and Affiliations

  • T. V. Astakhova (1)
  • S. V. Petrova (1)
  • I. I. Tsitovich (2)
  • M. A. Roytberg (1, corresponding author)

  1. Institute of Mathematical Problems in Biology, Russian Academy of Sciences, Pushchino, Russia
  2. Institute of Information Transmission Problems, Russian Academy of Sciences, Moscow, Russia
