A Preliminary Study on the Prediction of Human Protein Functions

  • Guido Bologna
  • Anne-Lise Veuthey
  • Marco Pagni
  • Lydie Lane
  • Amos Bairoch
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6686)


In the human proteome, about 5,000 proteins lack experimentally validated functional information. In this work we tackle the problem of human protein function prediction with three distinct supervised learning schemes: one-versus-all classification, tournament learning, and multi-label learning. The target values of the supervised learning models are the nodes of a subset of the Gene Ontology, which is widely used as a benchmark for functional prediction. On an independent dataset that includes very difficult cases, average recall over the first 50 ranked predictions was reasonable; average precision, however, was quite low.
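
As an illustration of the evaluation described above, the following minimal sketch computes precision and recall over the first k ranked predictions for one protein and averages them over a set of proteins. It assumes that each model outputs a ranked list of Gene Ontology term identifiers without duplicates and that the true annotations are available as a set; the function names, the data layout, and the example identifiers are hypothetical, and the paper's exact definition of average precision may differ.

```python
from typing import Dict, Sequence, Set, Tuple


def precision_recall_at_k(
    ranked_terms: Sequence[str], true_terms: Set[str], k: int = 50
) -> Tuple[float, float]:
    """Precision and recall over the first k ranked predictions.

    Assumes ranked_terms is ordered from most to least confident
    and contains no duplicate terms.
    """
    top_k = ranked_terms[:k]
    hits = sum(1 for term in top_k if term in true_terms)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(true_terms) if true_terms else 0.0
    return precision, recall


def mean_precision_recall_at_k(
    ranked_by_protein: Dict[str, Sequence[str]],
    truth_by_protein: Dict[str, Set[str]],
    k: int = 50,
) -> Tuple[float, float]:
    """Average the per-protein precision and recall over all proteins."""
    pairs = [
        precision_recall_at_k(ranked_by_protein[p], truth_by_protein[p], k)
        for p in ranked_by_protein
    ]
    if not pairs:
        return 0.0, 0.0
    n = len(pairs)
    return sum(p for p, _ in pairs) / n, sum(r for _, r in pairs) / n


if __name__ == "__main__":
    # Hypothetical example: four ranked predictions, three true annotations.
    ranked = ["GO:0005515", "GO:0003677", "GO:0016301", "GO:0005524"]
    truth = {"GO:0003677", "GO:0005524", "GO:0004672"}
    print(precision_recall_at_k(ranked, truth, k=4))
```

With a cutoff as large as 50 and only a handful of true annotations per protein, recall can be high while precision stays low, which is consistent with the pattern reported in the abstract.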


Gene Ontology, Average Precision, Average Recall, Swiss Institute, Predict Protein Function
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Guido Bologna (1)
  • Anne-Lise Veuthey (2)
  • Marco Pagni (3)
  • Lydie Lane (1, 4)
  • Amos Bairoch (1, 4)

  1. CALIPHO Group, Swiss Institute of Bioinformatics, Geneva 4, Switzerland
  2. Swiss-Prot Group, Swiss Institute of Bioinformatics, Geneva 4, Switzerland
  3. Vital-IT Group, Swiss Institute of Bioinformatics, Quartier Sorge, Genopode, Switzerland
  4. Department of Structural Biology and Bioinformatics, University of Geneva, Geneva 4, Switzerland
