Translation Initiation Sites Prediction with Mixture Gaussian Models

  • Guoliang Li
  • Tze-Yun Leong
  • Louxin Zhang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3240)


Translation initiation sites (TIS) are important signals in cDNA sequences, and many research efforts have attempted to predict them. In this paper, we propose using mixture Gaussian models to predict TIS in cDNA sequences. We use several new global measures to generate numerical features from cDNA sequences, such as the length of the open reading frame downstream from an ATG and the numbers of other ATGs upstream and downstream from the current ATG. With these global features, the proposed method predicts TIS with a sensitivity of 98% and a specificity of 92%. This sensitivity is considerably higher than that of other methods. We attribute the improvement in sensitivity to the nature of the global features and the mixture Gaussian models.
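The global features named in the abstract can be made concrete with a short sketch. The Python below is an illustrative assumption, not the authors' implementation. It counts ATGs in all three reading frames and measures the downstream open reading frame in codons, up to but not including the first in-frame stop codon. It also uses the standard definitions sensitivity = TP/(TP+FN) and specificity = TN/(TN+FP); the abstract does not state the paper's exact definitions. The names `global_features`, `orf_codons` and `sensitivity_specificity` are hypothetical.

```python
# Hypothetical sketch of the "global" TIS features described in the abstract.
# Assumptions (not from the paper): ATGs are counted in all reading frames,
# and ORF length is measured in codons, from the ATG up to but excluding the
# first in-frame stop codon (or to the end of the sequence if none occurs).

STOP_CODONS = {"TAA", "TAG", "TGA"}


def atg_positions(seq):
    """Return the 0-based start index of every ATG in seq, in any frame."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 2) if seq[i:i + 3] == "ATG"]


def orf_codons(seq, start):
    """Count in-frame codons from the ATG at `start` until a stop codon."""
    seq = seq.upper()
    count = 0
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            break
        count += 1
    return count


def global_features(seq, start):
    """Feature dictionary for the candidate ATG starting at index `start`."""
    if seq[start:start + 3].upper() != "ATG":
        raise ValueError("no ATG at position %d" % start)
    positions = atg_positions(seq)
    return {
        "orf_codons": orf_codons(seq, start),
        "upstream_atgs": sum(1 for p in positions if p < start),
        "downstream_atgs": sum(1 for p in positions if p > start),
    }


def sensitivity_specificity(y_true, y_pred):
    """Standard definitions: Se = TP/(TP+FN), Sp = TN/(TN+FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    seq = "CCATGAAATGGTAAGG"
    for start in atg_positions(seq):
        print(start, global_features(seq, start))
```

For the toy sequence `CCATGAAATGGTAAGG`, the first ATG (index 2) gives an ORF of 3 codons before the in-frame `TAA`, no upstream ATGs and one downstream ATG. In the proposed method, feature vectors of this kind are what the mixture Gaussian models are fitted to; that modelling step is not shown here.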


Keywords (added by machine, not by the authors): Support Vector Machine, Feature Vector, Mixture Gaussian Model, Global Feature, Translation Initiation Site





Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Guoliang Li (1)
  • Tze-Yun Leong (1)
  • Louxin Zhang (2)
  1. Medical Computing Laboratory, School of Computing, National University of Singapore, Singapore
  2. Department of Mathematics, National University of Singapore, Singapore
