Metric Labeling and Semi-metric Embedding for Protein Annotation Prediction

  • Conference paper
Research in Computational Molecular Biology (RECOMB 2011)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 6577)

Abstract

Computational techniques have been successful at predicting protein function from relational data (functional or physical interactions). These techniques have been used to generate hypotheses and to direct experimental validation. With few exceptions, the task is modeled as a multi-label classification problem in which the labels (functions) are treated independently or semi-independently. However, databases such as the Gene Ontology provide information about the similarities between functions. We explore the use of the Metric Labeling combinatorial optimization problem, which lets heuristically computed distances between functions inform the labeling, to predict protein function more accurately in networks derived both from physical interactions and from a combination of other data types. To do this, we give a new technique, based on convex optimization, for converting heuristic semimetric distances into a metric with minimum least-squared distortion (LSD). The Metric Labeling approach is shown to outperform five existing techniques for inferring function from networks. These results suggest that Metric Labeling is useful for protein function prediction and that LSD minimization can help solve the problem of converting heuristic distances to a metric.
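The two ingredients named in the abstract can be illustrated with a minimal sketch. It assumes the standard Metric Labeling objective (per-node assignment costs plus edge-weighted distances between the labels of adjacent nodes) and uses Dykstra's cyclic projection onto the triangle-inequality constraints as a stand-in for the convex program; this is not necessarily the solver or the exact formulation used in the paper. All function names, the toy distance matrix, and the toy network are hypothetical.

```python
from itertools import combinations


def triangle_violations(d, tol=1e-9):
    """List triples (i, j, k) with d[i][j] > d[i][k] + d[k][j]."""
    n = len(d)
    return [(i, j, k)
            for i, j in combinations(range(n), 2)
            for k in range(n)
            if k not in (i, j) and d[i][j] > d[i][k] + d[k][j] + tol]


def least_squares_metric(d, sweeps=200):
    """Approximate the metric closest to d in squared error.

    Assumption: Dykstra's alternating projections, not the paper's solver.
    """
    n = len(d)

    def key(a, b):
        return (a, b) if a < b else (b, a)

    # One variable per unordered pair of labels.
    x = {(i, j): float(d[i][j]) for i, j in combinations(range(n), 2)}
    # One constraint per (long side, intermediate point):
    # x[long] <= x[side1] + x[side2].
    constraints = [(key(i, j), key(i, k), key(k, j))
                   for i, j in combinations(range(n), 2)
                   for k in range(n) if k not in (i, j)]
    z = [0.0] * len(constraints)  # Dykstra correction term per constraint
    for _ in range(sweeps):
        for c, (e, f, g) in enumerate(constraints):
            # Undo the correction applied for this constraint last sweep.
            x[e] += z[c]
            x[f] -= z[c]
            x[g] -= z[c]
            # Project onto the half-space x[e] - x[f] - x[g] <= 0.
            theta = max(x[e] - x[f] - x[g], 0.0) / 3.0
            x[e] -= theta
            x[f] += theta
            x[g] += theta
            z[c] = theta
    m = [[0.0] * n for _ in range(n)]
    for (i, j), v in x.items():
        m[i][j] = m[j][i] = v
    return m


def labeling_cost(labels, assign_cost, edges, d):
    """Metric Labeling objective: assignment costs plus weighted label distances."""
    node_term = sum(assign_cost[v][labels[v]] for v in labels)
    edge_term = sum(w * d[labels[u]][labels[v]] for u, v, w in edges)
    return node_term + edge_term


# Hypothetical semimetric over three functions: 3 > 1 + 1 breaks the triangle inequality.
semi = [[0, 1, 3],
        [1, 0, 1],
        [3, 1, 0]]
metric = least_squares_metric(semi)

# Hypothetical two-protein network with one interaction of weight 2.
labels = {"p1": 0, "p2": 2}
assign_cost = {"p1": [0.0, 1.0, 1.0], "p2": [1.0, 1.0, 0.0]}
cost = labeling_cost(labels, assign_cost, [("p1", "p2", 2.0)], metric)
```

In this toy case only one triangle is violated, so the repair is a single projection: the long distance shrinks by 1/3 and each of the two short distances grows by 1/3, after which the triangle inequality holds with equality. Larger label sets need many sweeps, and a quadratic-programming solver would be the more usual choice at scale.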

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sefer, E., Kingsford, C. (2011). Metric Labeling and Semi-metric Embedding for Protein Annotation Prediction. In: Bafna, V., Sahinalp, S.C. (eds) Research in Computational Molecular Biology. RECOMB 2011. Lecture Notes in Computer Science, vol 6577. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20036-6_37

  • DOI: https://doi.org/10.1007/978-3-642-20036-6_37

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-20035-9

  • Online ISBN: 978-3-642-20036-6

  • eBook Packages: Computer Science, Computer Science (R0)
