Recognition of Coding Regions in Genome Alignment

Astakhova, T. V.; Petrova, S. V.; Tsitovich, I. I.; Roytberg, M. A.

doi:10.1007/0-387-29455-4_1

Abstract

Gene recognition is an old and important problem. Statistical and homology-based methods work relatively well when the goal is to find long exons or complete genes, but they fail to recognize relatively short coding fragments. Genome alignments, together with the analysis of synonymous and non-synonymous substitutions, offer a way to overcome this drawback. Our aim is to propose a criterion that distinguishes short coding from short non-coding fragments of a genome alignment, and to build an algorithm that locates aligned coding regions. We have developed a method to locate aligned exons in a given alignment. First, we scan the alignment with a window of fixed size (∼40 bp) and assign a score to each window position. The score reflects whether the number K_S of synonymous substitutions, the number K_N of non-synonymous substitutions, and the number D of deleted symbols resemble the values typical of coding regions. Second, we mark the 'qualified exon-like regions' (QELRs), i.e., runs of consecutive high-scoring windows. Presumably, each QELR contains one exon. Third, we locate an exon within every QELR. All steps are performed twice, independently for the direct and the reverse-complement chains. Finally, we compare the predictions for the two chains to exclude 'exon shadows', i.e., spurious predictions on the complementary chain made in place of real exons. Tests have shown that ∼93% of the marked QELRs intersect real exons and ∼93% of the aligned annotated exons intersect the marked QELRs. The total length of the marked QELRs is ∼1.30 times the total length of the annotated exons. About 85% of the total length of the predicted exons belongs to annotated exons. The runtime of the algorithm is proportional to the length of the genome alignment.
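
The abstract does not state the scoring formula, so the following Python sketch only illustrates the shape of the first two steps (window scoring and QELR marking) and the two-chain scan, under loudly stated assumptions. K_S and K_N are counted per aligned codon (a differing codon pair counts once, however many positions differ); the window score is a hypothetical linear combination K_S − w_n·K_N − w_d·D with made-up weights; each window is scored in the best of its three reading frames; and the aligned rows are upper-case ACGT with '-' for gaps. The names `counts`, `window_score`, `mark_qelrs`, `scan_both_strands`, and the default threshold are illustrative, not taken from the chapter. Locating the exon inside a QELR and the exon-shadow comparison are not sketched.

```python
"""Hypothetical sketch of window scoring and QELR marking; not the chapter's implementation."""
from typing import List, Optional, Tuple

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code (NCBI table 1): codon -> one-letter amino acid, '*' = stop.
CODON = {a + b + c: AMINO[16 * i + 4 * j + k]
         for i, a in enumerate(BASES)
         for j, b in enumerate(BASES)
         for k, c in enumerate(BASES)}

COMPLEMENT = str.maketrans("ACGT-", "TGCA-")


def revcomp(row: str) -> str:
    """Reverse complement of one aligned row; gap symbols are kept as gaps."""
    return row.translate(COMPLEMENT)[::-1]


def counts(top: str, bot: str, start: int, end: int) -> Tuple[int, int, int]:
    """(K_S, K_N, D) over codons lying fully inside [start, end).

    Assumption: a differing codon pair is classified once, as synonymous or
    non-synonymous, however many of its positions differ.
    """
    ks = kn = d = 0
    for p in range(start, end - 2, 3):
        c1, c2 = top[p:p + 3], bot[p:p + 3]
        if "-" in c1 or "-" in c2:
            d += c1.count("-") + c2.count("-")  # deleted symbols
        elif c1 != c2 and c1 in CODON and c2 in CODON:
            if CODON[c1] == CODON[c2]:
                ks += 1  # same amino acid: synonymous
            else:
                kn += 1  # different amino acid (or stop): non-synonymous
    return ks, kn, d


def window_score(top: str, bot: str, pos: int, width: int,
                 w_n: float = 1.0, w_d: float = 0.5) -> float:
    """Hypothetical score K_S - w_n*K_N - w_d*D, maximised over the 3 frames."""
    best = float("-inf")
    for frame in range(3):
        ks, kn, d = counts(top, bot, pos + frame, pos + width)
        best = max(best, ks - w_n * kn - w_d * d)
    return best


def mark_qelrs(top: str, bot: str, width: int = 40,
               threshold: float = 2.0) -> List[Tuple[int, int]]:
    """Merge runs of consecutive windows scoring >= threshold into [start, end)."""
    regions: List[Tuple[int, int]] = []
    run_start: Optional[int] = None
    for pos in range(len(top) - width + 1):
        if window_score(top, bot, pos, width) >= threshold:
            if run_start is None:
                run_start = pos
        elif run_start is not None:
            # The run's last high-scoring window started at pos - 1.
            regions.append((run_start, pos - 1 + width))
            run_start = None
    if run_start is not None:
        regions.append((run_start, len(top)))
    return regions


def scan_both_strands(top: str, bot: str, width: int = 40, threshold: float = 2.0):
    """Scan the direct and reverse-complement chains independently; reverse-chain
    intervals are mapped back to direct-chain coordinates."""
    n = len(top)
    direct = mark_qelrs(top, bot, width, threshold)
    reverse = [(n - e, n - s) for s, e in
               mark_qelrs(revcomp(top), revcomp(bot), width, threshold)]
    return direct, reverse


print(mark_qelrs("CTG" * 20, "CTA" * 20))  # [(0, 60)]
```

With a fixed window width, each of the L − width + 1 window positions costs a bounded amount of work, so the scan stays linear in the alignment length L, consistent with the runtime stated above. On the periodic toy pair `"CTG" * 20` / `"CTA" * 20`, `scan_both_strands` flags the full length on both chains, which shows how a coding-like signal can also surface on the complementary chain and why a final comparison between the two chains is needed.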

Author information

Authors and Affiliations

Institute of Mathematical Problems in Biology, Russian Academy of Sciences, ul. Institutskaia, 4, Pushchino, Russia, 142290
T. V. Astakhova, S. V. Petrova & M. A. Roytberg
Institute of Information Transmission Problems, Russian Academy of Sciences, Bol’shoi Karetnyi per., 19, Moscow, Russia, 127994
I. I. Tsitovich

Corresponding author

Correspondence to M. A. Roytberg.

Editor information

Editors and Affiliations

Institute of Cytology & Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
Nikolay Kolchanov
Bielefeld University, Bielefeld, Germany
Ralf Hofestaedt
CNR-ITB Institute of Biomedical Technologies, Segrate (Milano), Italy
Luciano Milanesi

About this chapter

Cite this chapter

Astakhova, T.V., Petrova, S.V., Tsitovich, I.I., Roytberg, M.A. (2006). Recognition of Coding Regions in Genome Alignment. In: Kolchanov, N., Hofestaedt, R., Milanesi, L. (eds) Bioinformatics of Genome Regulation and Structure II. Springer, Boston, MA. https://doi.org/10.1007/0-387-29455-4_1

DOI: https://doi.org/10.1007/0-387-29455-4_1
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-29450-6
Online ISBN: 978-0-387-29455-1
eBook Packages: Biomedical and Life Sciences (R0)
