Inferring Protein–Protein Interactions from Multiple Protein Domain Combinations

  • Simon P. Kanaan
  • Chengbang Huang
  • Stefan Wuchty
  • Danny Z. Chen
  • Jesús A. Izaguirre
Part of the Methods in Molecular Biology book series (MIMB, volume 541)


The ever-accumulating knowledge about protein interactions and the domain architecture of the proteins involved in different organisms offers ways to understand the intricate interplay between interactome and proteome. Ultimately, combining these sources of information will allow the prediction of interactions among proteins for which only the domain composition is known. Based on the currently available protein–protein interaction and domain data of Saccharomyces cerevisiae and Drosophila melanogaster, we introduce a novel method, Maximum Specificity Set Cover (MSSC), to predict potential protein–protein interactions. Using known interactions and the domain architectures of the interacting proteins as training sets, the algorithm employs a set cover approach to select the domain pairs that explain the underlying protein interactions with the largest degree of specificity. While the basic version of MSSC considers only domain pairs as the driving force behind interactions, we also modified the algorithm to account for combinations of more than two domains that govern a protein–protein interaction. This approach allows us to predict previously unknown protein–protein interactions in S. cerevisiae and D. melanogaster with a degree of sensitivity and specificity that clearly exceeds that of other approaches. As a proof of concept, we also observe high levels of co-expression and decreasing GO distances between interacting proteins. Although our results are very encouraging, we observe that the quality of the predictions depends significantly on the quality of the interactions used as the training set of the algorithm. The algorithm is part of a Web portal available at
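To make the set cover idea in the abstract concrete, the following is a minimal sketch of the classic greedy set cover heuristic applied to domain pairs. It is not the MSSC algorithm itself: the abstract does not define the specificity criterion, so this sketch simply picks, at each step, the domain pair that explains the most still-unexplained training interactions. All function names, protein identifiers, and domain labels are hypothetical and chosen only for illustration.

```python
from itertools import product


def candidate_pairs(domains_a, domains_b):
    """Return all unordered domain pairs that could explain one interaction."""
    return {tuple(sorted(pair)) for pair in product(domains_a, domains_b)}


def greedy_domain_pair_cover(interactions, domains):
    """Greedy set cover (not MSSC): choose domain pairs until every training
    interaction that has at least one candidate pair is explained."""
    # Map each candidate domain pair to the interactions it could explain.
    covers = {}
    for a, b in interactions:
        for pair in candidate_pairs(domains[a], domains[b]):
            covers.setdefault(pair, set()).add((a, b))

    uncovered = {i for explained in covers.values() for i in explained}
    chosen = []
    while uncovered:
        # Iterate in sorted order so that ties are broken deterministically.
        best = max(sorted(covers), key=lambda p: len(covers[p] & uncovered))
        chosen.append(best)
        uncovered -= covers[best]
    return chosen


def predict(chosen, domains, a, b):
    """Predict an interaction if the two proteins share a chosen domain pair."""
    return bool(candidate_pairs(domains[a], domains[b]) & set(chosen))


# Hypothetical toy data: protein -> set of domain labels.
toy_domains = {
    "P1": {"SH3"},
    "P2": {"PRM"},
    "P3": {"SH3", "KIN"},
    "P4": {"PRM", "ZNF"},
    "P5": {"KIN"},
    "P6": {"ZNF"},
}
toy_training = [("P1", "P2"), ("P3", "P4"), ("P1", "P4")]
toy_cover = greedy_domain_pair_cover(toy_training, toy_domains)
```

In this sketch, a protein pair whose domain compositions share none of the chosen domain pairs is predicted not to interact. Extending the candidates from pairs to larger domain combinations, as the abstract describes for the modified algorithm, would change only how `candidate_pairs` enumerates its candidates.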

Key words

Domain combinations, set cover, protein interaction prediction



Danny Chen was supported in part by the NSF under Grant CCF-0515203. Jesús Izaguirre was supported by partial funding from NSF grants IOB-0313730, CCR-0135195, and DBI-0450067. Stefan Wuchty was supported by the Northwestern Institute of Complexity (NICO).



Copyright information

© Humana Press, a part of Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • Simon P. Kanaan (1)
  • Chengbang Huang (2)
  • Stefan Wuchty (2)
  • Danny Z. Chen (2)
  • Jesús A. Izaguirre (2)

  1. Department of Computer Science, University of Notre Dame, Notre Dame, USA
  2. Northwestern Institute of Complexity, Northwestern University, Evanston, USA
