Computational Identification of Short Initial Exons

  • Sayanthan Logeswaran
  • Eliathamby Ambikairajah
  • Julien Epps
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4146)


Despite the existence of many gene prediction programs and their increasing accuracy over the last few years, the accurate identification of short exons remains a challenging problem. In this paper we concentrate on short initial exons and present a method to improve the detection of these short coding regions. The proposed algorithm is based on the Weight Array Method (WAM) and CpG islands. The algorithm was evaluated on a total of 158 sequences containing short initial exons and achieved an accuracy of up to 73%. Compared with GENSCAN, the proposed WAM-CpG island algorithm shows an improvement of up to 22%. Further, the WAM-CpG island approach can be employed to complement existing gene prediction packages and produce substantial improvements in the correct detection of short initial exons.
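
The abstract does not say how the two ingredients are combined, so the following is only a minimal sketch of each in isolation, not the authors' algorithm. It assumes the usual reading of a weight array model as a position-specific first-order Markov model over aligned signal sites, and a commonly used CpG island heuristic (length of at least 200 bp, GC content above 50%, observed/expected CpG ratio above 0.6). All function names, thresholds, the pseudocount, and the toy training sites are assumptions chosen for illustration.

```python
import math

ALPHABET = "ACGT"  # assumption: sequences are uppercase and contain only A, C, G, T


def gc_content(seq):
    """Fraction of G and C bases in seq."""
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (#CG * length) / (#C * #G)."""
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    cpg = sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "CG")
    return cpg * len(seq) / (c * g)


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_ratio=0.6):
    """Heuristic CpG island test; the thresholds are illustrative defaults."""
    return (len(seq) >= min_len
            and gc_content(seq) > min_gc
            and cpg_obs_exp(seq) > min_ratio)


def train_wam(aligned_sites, pseudocount=1.0):
    """Estimate a position-specific first-order Markov model.

    aligned_sites: equal-length strings aligned on a signal (e.g. a donor site).
    Returns (first, trans), where first[b] = P(x_1 = b) and
    trans[i][a][b] = P(x_{i+2} = b | x_{i+1} = a) in 1-based positions.
    """
    length = len(aligned_sites[0])
    first = {b: pseudocount for b in ALPHABET}
    trans = [{a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
             for _ in range(length - 1)]
    for site in aligned_sites:
        first[site[0]] += 1
        for i in range(1, length):
            trans[i - 1][site[i - 1]][site[i]] += 1
    total = sum(first.values())
    first = {b: n / total for b, n in first.items()}
    for pos in trans:
        for a in ALPHABET:
            row_total = sum(pos[a].values())
            pos[a] = {b: n / row_total for b, n in pos[a].items()}
    return first, trans


def wam_log_score(model, site):
    """Log-probability of a candidate site under the trained model."""
    first, trans = model
    score = math.log(first[site[0]])
    for i in range(1, len(site)):
        score += math.log(trans[i - 1][site[i - 1]][site[i]])
    return score


# Toy usage with made-up training sites (not data from the paper).
model = train_wam(["GTAAGT", "GTAAGT", "GTGAGT"])
candidate_score = wam_log_score(model, "GTAAGT")
background_score = wam_log_score(model, "CCCCCC")
```

In practice a site score is usually reported as a log-odds ratio against a model trained on non-site sequences; that second model is omitted here to keep the sketch short.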


Keywords: Donor Site, Gene Prediction, Computational Identification, Translation Initiation Site, Internal Exon



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Sayanthan Logeswaran (1)
  • Eliathamby Ambikairajah (1)
  • Julien Epps (1, 2)
  1. School of Electrical Engineering and Telecommunications, The University of New South Wales, Sydney, Australia
  2. National ICT Australia (NICTA), Eveleigh, Australia
