Conditional Random Fields for Protein Function Prediction
Markov Random Fields (MRFs) built on protein-protein interaction data have been shown to be good predictors of functional annotation. Many other sources of data can also be used for this prediction task, but they are typically not integrated. In this study, we extend an MRF-based method to allow the use of additional data.
A conditional random field (CRF) model is proposed as an alternative to the MRF model in order to remove the requirement of modeling the relationships between the data sources. We observe that a substantial performance improvement is possible using additional data, such as genetic interaction networks. The improvement gained from each source of evidence differs from one protein function to another, indicating that each source supplies different information. We demonstrate that CRFs can be used to efficiently integrate various sources of data to predict functional annotations.
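To make the idea concrete, here is a minimal sketch of a pairwise binary CRF for a single functional term. It is not the model from this study. It assumes each protein has a label y_i in {0, 1}, a node potential computed from a hypothetical per-protein feature vector x_i drawn from additional data sources (for example, a score derived from genetic interactions), and an agreement potential on protein-protein interaction edges. Under these assumptions the log-odds of y_i = 1, given all other labels, is `bias + w·x_i + coupling * (n1 - n0)`, where n1 and n0 count neighbours labelled 1 and 0. Annotated proteins are clamped, and the marginals of the unannotated ones are estimated by Gibbs sampling. All function and parameter names are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gibbs_crf_marginals(edges, features, weights, bias, coupling, known,
                        n_sweeps=2000, burn_in=500, seed=0):
    """Estimate P(y_v = 1 | x, known labels) for every unannotated protein.

    Hypothetical illustration, not the method of the study:
      edges    -- undirected PPI edges as (protein, protein) pairs
      features -- protein -> list of floats, one per additional data source
                  (must contain every protein that appears in `edges`)
      weights  -- one weight per feature (all zeros = PPI-only model)
      coupling -- reward for two interacting proteins sharing a label
      known    -- protein -> 0/1 for annotated proteins (held fixed)
    """
    rng = random.Random(seed)
    nodes = sorted(features)
    nbrs = {v: [] for v in nodes}
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)

    # Annotated proteins are clamped; unannotated ones start at random.
    y = {v: known[v] if v in known else rng.randint(0, 1) for v in nodes}
    free = [v for v in nodes if v not in known]
    counts = {v: 0 for v in free}

    for sweep in range(n_sweeps):
        for v in free:
            # Node potential: evidence from the additional data sources.
            node_term = bias + sum(w * x for w, x in zip(weights, features[v]))
            # Edge potential coupling * [y_v == y_u] contributes
            # coupling * (n1 - n0) to the log-odds of y_v = 1.
            n1 = sum(y[u] for u in nbrs[v])
            n0 = len(nbrs[v]) - n1
            y[v] = 1 if rng.random() < sigmoid(node_term + coupling * (n1 - n0)) else 0
        if sweep >= burn_in:
            for v in free:
                counts[v] += y[v]

    kept = n_sweeps - burn_in
    return {v: counts[v] / kept for v in free}


if __name__ == "__main__":
    # Toy network: P1 is annotated with the term, P4 is not.
    probs = gibbs_crf_marginals(
        edges=[("P1", "P2"), ("P3", "P4")],
        features={"P1": [0.0], "P2": [1.0], "P3": [-1.0], "P4": [0.0]},
        weights=[1.0], bias=0.0, coupling=1.0,
        known={"P1": 1, "P4": 0},
    )
    print(probs)
```

Setting every weight to zero leaves only the interaction-network term, which gives a PPI-only model in the spirit of the MRF baseline. Each extra data source then enters as one more feature with its own weight, so the sketch never has to model how the data sources relate to one another.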
Keywords: Gene Ontology; True Positive; Functional Annotation; Markov Random Fields; Conditional Random Field