Abstract
Gene recognition is an old and important problem. Statistical and homology-based methods work relatively well for finding long exons or full genes, but they cannot recognize relatively short coding fragments. Genome alignments and the study of synonymous and non-synonymous substitutions offer a way to overcome this drawback. Our aim is to propose a criterion for distinguishing short coding fragments from non-coding fragments of a genome alignment and to create an algorithm that locates aligned coding regions. We have developed a method to locate aligned exons in a given alignment. First, we scan the alignment with a window of a fixed size (∼ 40 bp) and assign a score to each window position. The score reflects whether the numbers KS of synonymous substitutions, KN of non-synonymous substitutions, and D of deleted symbols resemble those of coding regions. Second, we mark the ‘qualified exon-like’ regions (QELRs), i.e., sequences of consecutive high-scoring windows. Presumably, each QELR contains one exon. Third, we locate an exon within every QELR. All the steps are performed twice, independently for the direct and the reverse complement strands. Finally, we compare the predictions for the two strands to exclude ‘exon shadows’, i.e., predictions made on the complementary strand instead of the real exons. Tests have shown that ∼ 93 % of the marked QELRs intersect real exons and ∼ 93 % of the aligned annotated exons intersect the marked QELRs. The total length of the marked QELRs is ∼ 1.30 times the total length of the annotated exons. About 85 % of the total length of the predicted exons belongs to annotated exons. The runtime of the algorithm is proportional to the length of the genome alignment.
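The first two steps (window scoring and QELR marking) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the scoring function, so `toy_score` is a purely hypothetical stand-in, and the function names, the 39-column window, the 3-column step, and the threshold are all assumptions made for illustration. The sketch also simplifies by classifying whole aligned codons in a single reading frame as synonymous or non-synonymous, and it assumes upper-case A, C, G, T with `-` for gaps.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def window_counts(seq_a, seq_b):
    """Return (KS, KN, D) for one window of a pairwise alignment.

    Simplification (assumption): codons are read in frame 0 and each
    differing codon pair counts as one synonymous or one non-synonymous
    change; gap characters are counted as deleted symbols.
    """
    ks = kn = d = 0
    for i in range(0, len(seq_a) - 2, 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        gaps = codon_a.count("-") + codon_b.count("-")
        if gaps:
            d += gaps
        elif codon_a != codon_b:
            if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
                ks += 1
            else:
                kn += 1
    return ks, kn, d


def toy_score(ks, kn, d):
    """Hypothetical score: rewards synonymous changes, penalizes the rest."""
    return ks - kn - d


def scan(seq_a, seq_b, window=39, step=3):
    """Score every window position (window and step sizes are assumptions)."""
    return [
        toy_score(*window_counts(seq_a[i:i + window], seq_b[i:i + window]))
        for i in range(0, len(seq_a) - window + 1, step)
    ]


def mark_qelrs(scores, threshold):
    """Return (start, end) window-index pairs, end exclusive, for maximal
    runs of consecutive windows whose score is at least `threshold`."""
    regions, start = [], None
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i
        elif start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(scores)))
    return regions
```

Each window is processed in time proportional to its fixed size, so a full scan takes time linear in the alignment length, in line with the runtime stated above. Locating the exon inside each QELR and comparing the two strands are not covered by this sketch.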
© 2006 Springer Science+Business Media, Inc.
About this chapter
Cite this chapter
Astakhova, T.V., Petrova, S.V., Tsitovich, I.I., Roytberg, M.A. (2006). Recognition of Coding Regions in Genome Alignment. In: Kolchanov, N., Hofestaedt, R., Milanesi, L. (eds) Bioinformatics of Genome Regulation and Structure II. Springer, Boston, MA. https://doi.org/10.1007/0-387-29455-4_1
DOI: https://doi.org/10.1007/0-387-29455-4_1
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-29450-6
Online ISBN: 978-0-387-29455-1
eBook Packages: Biomedical and Life Sciences, Biomedical and Life Sciences (R0)