Abstract

Gene recognition is an old and important problem. Statistical and homology-based methods work relatively well for finding long exons or full genes but are unable to recognize relatively short coding fragments. Genome alignments, together with the study of synonymous and non-synonymous substitutions, offer a way to overcome this drawback. Our aim is to propose a criterion that distinguishes short coding fragments from non-coding fragments of a genome alignment and to create an algorithm that locates aligned coding regions. We have developed a method to locate aligned exons in a given alignment. First, we scan the alignment with a window of a fixed size (∼ 40 bp) and assign a score to each window position. The score reflects whether the numbers KS of synonymous substitutions, KN of non-synonymous substitutions, and D of deleted symbols resemble those observed in coding regions. Second, we mark the ‘qualified exon-like’ regions (QELRs), i.e., sequences of consecutive high-scoring windows. Presumably, each QELR contains one exon. Third, we locate an exon within every QELR. All the steps have to be performed twice, independently for the direct and reverse complement chains. Finally, we compare the predictions for the two chains to exclude possible predictions of ‘exon shadows’ on the complementary chain instead of real exons. Tests have shown that ∼ 93 % of the marked QELRs intersect real exons and ∼ 93 % of the aligned annotated exons intersect the marked QELRs. The total length of marked QELRs is ∼ 1.30 times the total length of annotated exons. About 85 % of the total length of predicted exons belongs to annotated exons. The runtime of the algorithm is proportional to the length of the genome alignment.
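
The first two steps (window scoring and QELR marking) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the scoring function, so `toy_score` is a purely hypothetical stand-in, and the names `count_window`, `scan`, and `mark_qelrs` are invented for illustration. The sketch further assumes a pairwise alignment given as two equal-length strings over `ACGT` and `-`, a single reading frame, and a simplified codon-level count in which each differing codon pair counts as one substitution.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def count_window(row_a, row_b):
    """Return (KS, KN, D) for two aligned rows, read codon by codon.

    Simplification: a differing codon pair counts as ONE substitution,
    synonymous if both codons encode the same amino acid.
    """
    ks = kn = d = 0
    for i in range(0, len(row_a) - 2, 3):
        ca, cb = row_a[i:i + 3], row_b[i:i + 3]
        gaps = ca.count("-") + cb.count("-")
        if gaps:
            d += gaps            # deleted symbols in this codon column
        elif ca != cb:
            if CODON_TABLE[ca] == CODON_TABLE[cb]:
                ks += 1          # synonymous change
            else:
                kn += 1          # non-synonymous change
    return ks, kn, d


def toy_score(ks, kn, d):
    """Hypothetical score (an assumption, not the published criterion):
    reward synonymous changes, penalise non-synonymous changes and gaps."""
    return ks - kn - d


def scan(row_a, row_b, window=42, step=3):
    """Score every window position in one reading frame."""
    return [
        (start, toy_score(*count_window(row_a[start:start + window],
                                        row_b[start:start + window])))
        for start in range(0, len(row_a) - window + 1, step)
    ]


def mark_qelrs(scored, threshold, window=42):
    """Merge runs of consecutive windows with score >= threshold
    into half-open (start, end) regions."""
    regions, run_start, run_end = [], None, None
    for start, score in scored:
        if score >= threshold:
            if run_start is None:
                run_start = start
            run_end = start + window
        elif run_start is not None:
            regions.append((run_start, run_end))
            run_start = None
    if run_start is not None:
        regions.append((run_start, run_end))
    return regions
```

Under these assumptions, `mark_qelrs(scan(row_a, row_b), threshold)` yields candidate regions for one frame of one chain. The window of 42 columns is chosen only because it is a multiple of three close to the ∼ 40 bp mentioned above. Each window is scored once at a fixed step, which is consistent with a runtime linear in the alignment length. The remaining steps (locating the exon inside each QELR and comparing the two chains) are not sketched here.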

Corresponding author

Correspondence to M. A. Roytberg.

Copyright information

© 2006 Springer Science+Business Media, Inc.

About this chapter

Cite this chapter

Astakhova, T.V., Petrova, S.V., Tsitovich, I.I., Roytberg, M.A. (2006). Recognition of Coding Regions in Genome Alignment. In: Kolchanov, N., Hofestaedt, R., Milanesi, L. (eds) Bioinformatics of Genome Regulation and Structure II. Springer, Boston, MA. https://doi.org/10.1007/0-387-29455-4_1
