Conditional Random Fields for Protein Function Prediction

  • Thies Gehrmann
  • Marco Loog
  • Marcel J. T. Reinders
  • Dick de Ridder
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7986)


Markov random fields (MRFs) have been shown to be good predictors of functional annotation when applied to protein-protein interaction data. Many other sources of data could also inform this prediction task, but they are typically not integrated. In this study, we extend an MRF-based method so that additional data can be used.

A conditional random field (CRF) model is proposed as an alternative to an MRF model in order to remove the requirement of modeling relationships between the sources of data. We observe that a substantial performance improvement is possible using additional data, such as genetic interaction networks. The improvement gained from each source of evidence is not the same for each protein function, indicating that each source supplies different information. We demonstrate that CRFs can be used to efficiently integrate various sources of data to predict functional annotations.
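As an illustration of the idea, the following minimal sketch scores joint labellings of a toy network conditioned on two evidence networks and computes one protein's marginal by brute-force enumeration. It is not the authors' model: the function names (`score`, `marginal`), the agreement feature, the toy edges, and the weights and bias are all assumptions chosen for illustration. In an actual CRF the weights would be learned by maximising conditional likelihood rather than set by hand. The point it shows is that the networks only appear as conditioning inputs, so no distribution over them, or over their relationships to each other, has to be specified.

```python
from itertools import product
from math import exp


def score(labels, networks, weights, bias):
    """Unnormalised log-score of one joint labelling (dict: protein -> 0/1).

    Each evidence network contributes its own weight for every edge whose
    endpoints share a label; the networks themselves are never modelled.
    """
    total = bias * sum(labels.values())
    for name, edges in networks.items():
        agree = sum(1 for i, j in edges if labels[i] == labels[j])
        total += weights[name] * agree
    return total


def marginal(target, observed, n, networks, weights, bias):
    """P(label[target] = 1 | observed labels, networks), by enumeration."""
    free = [i for i in range(n) if i not in observed]
    mass = {0: 0.0, 1: 0.0}
    for values in product((0, 1), repeat=len(free)):
        labels = dict(observed)
        labels.update(zip(free, values))
        mass[labels[target]] += exp(score(labels, networks, weights, bias))
    return mass[1] / (mass[0] + mass[1])


# Hypothetical toy data: 4 proteins, protein 2 is unannotated.
observed = {0: 1, 1: 1, 3: 0}
ppi_only = {"ppi": [(0, 1), (1, 2)]}
both = {"ppi": [(0, 1), (1, 2)], "genetic": [(2, 3)]}
weights = {"ppi": 1.0, "genetic": 0.5}  # hand-set, not learned

p_ppi_only = marginal(2, observed, 4, ppi_only, weights, bias=-0.2)
p_both = marginal(2, observed, 4, both, weights, bias=-0.2)
print(p_ppi_only, p_both)
```

In this toy setting the genetic interaction edge links protein 2 to a protein without the function, so adding that network lowers the predicted probability relative to using protein-protein interactions alone. Enumeration is only feasible for a handful of unlabelled proteins; realistic networks need approximate inference.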


Keywords: Gene Ontology, True Positive, Functional Annotation, Markov Random Fields, Conditional Random Field



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Thies Gehrmann (1)
  • Marco Loog (2)
  • Marcel J. T. Reinders (1, 3, 4)
  • Dick de Ridder (1, 3, 4)

  1. Delft Bioinformatics Lab, Delft University of Technology, Delft, The Netherlands
  2. Pattern Recognition Lab, Delft University of Technology, Delft, The Netherlands
  3. Netherlands Bioinformatics Centre, Nijmegen, The Netherlands
  4. Kluyver Centre for Genomics of Industrial Fermentation, Delft, The Netherlands
