Active Learning for Protein Function Prediction in Protein-Protein Interaction Networks

  • Wei Xiong
  • Luyu Xie
  • Jihong Guan
  • Shuigeng Zhou
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7986)


High-throughput technologies have produced vast amounts of protein-protein interaction (PPI) data, and a number of approaches based on PPI networks have been proposed for protein function prediction. However, these approaches do not work well when annotated proteins are scarce in the network. To address this issue, we propose an active-learning-based approach that uses graph-based centrality metrics to select suitable candidates for labeling. We first cluster a PPI network with the spectral clustering algorithm and select candidates for labeling within each cluster, and then apply a collective classification algorithm to predict protein function from these annotated proteins. Experiments on two real datasets demonstrate that the active-learning-based approach achieves better prediction performance by choosing more informative proteins for labeling. The experimental results also show that betweenness centrality is more effective than degree centrality and closeness centrality in most cases.
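The candidate-selection step described in the abstract can be illustrated with a short sketch. This is not the authors' implementation: the function names `betweenness` and `select_candidates`, the `per_cluster` parameter, and the toy network are hypothetical, the clusters are assumed to be given already (for example by a spectral clustering step that is not shown), and centrality is assumed to be computed on the whole network rather than inside each cluster. Betweenness is computed with Brandes' algorithm for unweighted, undirected graphs.

```python
from collections import deque


def betweenness(adj):
    """Unnormalized betweenness centrality (Brandes' algorithm).

    adj maps each node to an iterable of its neighbours in an
    unweighted, undirected graph.
    """
    score = {v: 0.0 for v in adj}
    for s in adj:
        order = []                      # nodes in non-decreasing distance from s
        preds = {v: [] for v in adj}    # predecessors on shortest paths from s
        sigma = {v: 0 for v in adj}     # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                    # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}   # dependency of s on each node
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: c / 2 for v, c in score.items()}


def select_candidates(adj, clusters, per_cluster=1):
    """Pick the most central proteins of each cluster as labeling candidates.

    Assumption: clusters is a list of node collections produced elsewhere.
    Ties are broken by node name so the result is deterministic.
    """
    centrality = betweenness(adj)
    chosen = []
    for members in clusters:
        ranked = sorted(members, key=lambda v: (-centrality[v], v))
        chosen.extend(ranked[:per_cluster])
    return chosen


# Toy network (hypothetical): two triangles joined by the edge c-d.
ppi = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
    "d": ["c", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
clusters = [["a", "b", "c"], ["d", "e", "f"]]
print(select_candidates(ppi, clusters))
```

In this toy network the two bridge proteins `c` and `d` lie on every shortest path between the triangles, so they are the ones chosen for labeling. Swapping `betweenness` for a degree or closeness function would give the other two selection strategies the abstract compares.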


Keywords: Protein function prediction; Active learning; Collective classification; Protein-protein interaction network



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Wei Xiong (1)
  • Luyu Xie (1)
  • Jihong Guan (2)
  • Shuigeng Zhou (1)

  1. School of Computer Science, and Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai, China
  2. Department of Computer Science & Technology, Tongji University, Shanghai, China
