Protein Molecular Function Prediction Based on the Phylogenetic Tree

Part of the Communications in Computer and Information Science book series (CCIS, volume 304)


We employ a novel method to construct a phylogenetic tree from a distance matrix computed over protein sequences, and present a statistical model that infers specific molecular functions for unannotated protein sequences within the tree. The method produces specific and consistent molecular function predictions across the P-falciparum family, achieving 91.2% precision and 76.9% recall and outperforming the related methods GOtcha and BLAST. In future work, we intend to improve the method by adopting a more appropriate approach to sequence feature extraction or a better statistical inference model.
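The abstract does not specify how the tree is built or which statistical model is used, so the following is only a minimal sketch of the general pipeline under stated assumptions: a p-distance over pre-aligned toy sequences stands in for the distance matrix, UPGMA (average-linkage clustering) stands in for the tree construction, and a majority vote within the smallest annotated clade stands in for the inference model. All sequences, names, and function labels are hypothetical.

```python
from itertools import combinations

def p_distance(a, b):
    """Fraction of differing positions between two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def distance_matrix(seqs):
    """Map each unordered pair of sequence names to its p-distance."""
    return {frozenset((m, n)): p_distance(seqs[m], seqs[n])
            for m, n in combinations(seqs, 2)}

def upgma(names, dist):
    """Build a rooted tree (nested 2-tuples of names) by average-linkage clustering.
    Assumption: UPGMA is a stand-in, not the paper's tree construction method."""
    clusters = {name: [name] for name in names}  # subtree -> leaves it contains

    def avg(c1, c2):
        pairs = [(a, b) for a in clusters[c1] for b in clusters[c2]]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: avg(*p))
        merged = clusters.pop(c1) + clusters.pop(c2)
        clusters[(c1, c2)] = merged
    return next(iter(clusters))

def leaves(tree):
    """List the leaf names under a subtree."""
    return [tree] if isinstance(tree, str) else leaves(tree[0]) + leaves(tree[1])

def predict(tree, query, annotations):
    """Majority label in the smallest clade that contains `query` and at least
    one annotated leaf. Assumption: a stand-in for the statistical model."""
    if isinstance(tree, str) or query not in leaves(tree):
        return None
    for child in tree:
        hit = predict(child, query, annotations)
        if hit is not None:
            return hit
    labels = [annotations[l] for l in leaves(tree) if l != query and l in annotations]
    return max(set(labels), key=labels.count) if labels else None

def precision_recall(predicted, truth):
    """Precision over predictions made; recall over all proteins with a known function."""
    made = sum(1 for v in predicted.values() if v is not None)
    tp = sum(1 for k, v in predicted.items() if v is not None and truth.get(k) == v)
    return (tp / made if made else 0.0, tp / len(truth) if truth else 0.0)

# Hypothetical toy data: four annotated sequences and one unannotated query "Q".
seqs = {"A": "MKVLAA", "B": "MKVLAT", "C": "MRTLGG", "D": "MRTLGA", "Q": "MKVLAS"}
annotations = {"A": "function_X", "B": "function_X",
               "C": "function_Y", "D": "function_Y"}

tree = upgma(list(seqs), distance_matrix(seqs))
prediction = predict(tree, "Q", annotations)
```

In this toy case the query clusters with A and B, so it inherits their label. Precision and recall of the kind reported above would then be computed by comparing such predictions against held-out annotations with `precision_recall`.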


distance matrix, phylogenetic tree, GO term





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Lu Jian (1, 2)
  1. School of Information Science and Technology, University of Science and Technology of China, Hefei, China
  2. Intelligent Computing Laboratory, Institute of Intelligent Machines, Chinese Academy of Sciences, Hefei, China
