Abstract
Computational techniques have been successful at predicting protein function from relational data (functional or physical interactions). These techniques have been used to generate hypotheses and to direct experimental validation. With few exceptions, the task is modeled as a multi-label classification problem in which the labels (functions) are treated independently or semi-independently. However, databases such as the Gene Ontology provide information about the similarities between functions. We explore using the Metric Labeling combinatorial optimization problem to exploit heuristically computed distances between functions and so make more accurate predictions of protein function, both in networks derived from physical interactions and in networks derived from a combination of other data types. To do this, we give a new technique (based on convex optimization) for converting heuristic semimetric distances into a metric with minimum least-squared distortion (LSD). The Metric Labeling approach is shown to outperform five existing techniques for inferring function from networks. These results suggest that Metric Labeling is useful for protein function prediction, and that LSD minimization can help solve the problem of converting heuristic distances to a metric.
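To make the two ingredients named in the abstract concrete, the following is a minimal sketch in Python, not the authors' implementation. It assumes the standard form of the Metric Labeling objective (a per-node assignment cost plus an edge-weighted distance between the labels at the two endpoints) and solves a toy instance by exhaustive search, which is practical only for very small inputs. It also includes a check that reports where a heuristic distance breaks the triangle inequality; this only detects violations and does not perform the LSD minimization described above. All label names, proteins, costs, and weights are hypothetical.

```python
from itertools import product


def triangle_violations(d, labels, tol=1e-12):
    """List ordered triples (a, b, c) with d[a][c] > d[a][b] + d[b][c]."""
    return [(a, b, c) for a, b, c in product(labels, repeat=3)
            if d[a][c] > d[a][b] + d[b][c] + tol]


def labeling_cost(assign, node_cost, edges, d):
    """Assumed Metric Labeling objective: assignment cost + separation cost."""
    unary = sum(node_cost[v][assign[v]] for v in assign)
    pairwise = sum(w * d[assign[u]][assign[v]] for u, v, w in edges)
    return unary + pairwise


def brute_force_labeling(nodes, labels, node_cost, edges, d):
    """Try every labeling; exponential in len(nodes), for illustration only."""
    best_cost, best_assign = None, None
    for combo in product(labels, repeat=len(nodes)):
        assign = dict(zip(nodes, combo))
        cost = labeling_cost(assign, node_cost, edges, d)
        if best_cost is None or cost < best_cost:
            best_cost, best_assign = cost, assign
    return best_cost, best_assign


def symmetric(pairs, labels):
    """Build a symmetric distance table with zeros on the diagonal."""
    d = {a: {b: 0.0 for b in labels} for a in labels}
    for (a, b), value in pairs.items():
        d[a][b] = d[b][a] = value
    return d


# Hypothetical functions (labels) and a heuristic semimetric between them.
labels = ["A", "B", "C"]
d_raw = symmetric({("A", "B"): 1.0, ("B", "C"): 1.0, ("A", "C"): 3.0}, labels)
# A hand-adjusted table that satisfies the triangle inequality.
d_metric = symmetric({("A", "B"): 1.0, ("B", "C"): 1.0, ("A", "C"): 2.0}, labels)

# Hypothetical three-protein path p1 - p2 - p3 with unit edge weights.
nodes = ["p1", "p2", "p3"]
edges = [("p1", "p2", 1.0), ("p2", "p3", 1.0)]
node_cost = {
    "p1": {"A": 0.0, "B": 5.0, "C": 5.0},  # p1 is known to have function A
    "p2": {"A": 1.0, "B": 0.5, "C": 1.0},  # p2 is weakly biased towards B
    "p3": {"A": 5.0, "B": 5.0, "C": 0.0},  # p3 is known to have function C
}

violations = triangle_violations(d_raw, labels)
best_cost, best_assign = brute_force_labeling(nodes, labels, node_cost, edges, d_metric)
print(violations)
print(best_cost, best_assign)
```

In this toy instance the raw distances violate the triangle inequality through the pair A, C, which is the kind of defect a semimetric-to-metric conversion has to repair before the labeling is solved. The hand-adjusted table stands in for that step here purely for illustration.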
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sefer, E., Kingsford, C. (2011). Metric Labeling and Semi-metric Embedding for Protein Annotation Prediction. In: Bafna, V., Sahinalp, S.C. (eds) Research in Computational Molecular Biology. RECOMB 2011. Lecture Notes in Computer Science, vol 6577. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20036-6_37
DOI: https://doi.org/10.1007/978-3-642-20036-6_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20035-9
Online ISBN: 978-3-642-20036-6
eBook Packages: Computer Science (R0)