Evidence of probabilistic behaviour in protein interaction networks
Data from high-throughput experiments of protein-protein interactions are commonly used to probe the nature of biological organization and extract functional relationships between sets of proteins. What has not been appreciated is that the underlying mechanisms involved in assembling these networks may exhibit considerable probabilistic behaviour.
We find that the probability of an interaction between two proteins is generally proportional to the product of their individual numbers of interacting partners, or degrees. This degree-weighted behaviour is manifested throughout the protein-protein interaction networks studied here, except in the high-degree, or hub, interaction regions. However, we find that the probabilities of interaction between the hubs are still high. Further evidence is provided by path length analyses, which show that these hubs are separated by very few links.
The results suggest that protein-protein interaction networks incorporate probabilistic elements that lead to scale-rich hierarchical architectures. These observations seem to be at odds with a biologically-guided organization. One interpretation of the findings is that we are witnessing the ability of proteins to indiscriminately bind rather than the protein-protein interactions that are actually utilized by the cell in biological processes. Therefore, the topological study of a degree-weighted network requires a more refined methodology to extract biological information about pathways, modules, or other inferred relationships among proteins.
Keywords: Degree Distribution; Protein Interaction Network; Average Path Length; Poissonian Degree Distribution; Cell Component Category
Experimental protein-protein interaction (PPI) data and related networks, obtained from high-throughput methodology as well as hand-curation, are being widely used to probe the nature of biological organization and extract functional relationships among sets of proteins [1, 2]. What has not been appreciated is that the guiding principles involved in assembling these networks may exhibit considerable probabilistic behaviour. Here, we show that the probability of an interaction between two proteins is generally proportional to the product of their individual numbers of interacting partners (or degrees) and discuss the consequences of this for probing PPI networks. Understanding the underlying organizational principles in assembling PPI networks holds the key for interpreting and analyzing the observed interactions.
High-throughput methodologies [3, 4, 5, 6] to determine PPI networks have been used to probe the interactomes of a range of organisms. The organization of these interaction networks has been studied using graph-theoretical techniques [7, 8, 9] to find global characteristics that can be mapped back to biological phenomena, such as evolutionarily conserved interactions, pathway or module organization, and the localization of essential proteins in the network, to name a few. Since we know that the outcomes of cellular actions are biologically "deterministic", in the sense that cells use energy, synthesize proteins, duplicate DNA, etc., the analysis of PPI networks is aimed at finding and extracting causative components. If this information is to be mined from a global dataset, it is vital to have an accurate model of the architecture of the determined PPI networks. Incorporating the underlying determining principle of PPI organization into graph-theoretical topological studies will provide a baseline from which biologically relevant insights can be extracted. For example, a guided biological framework implies that cellular processes consist of precise and unique protein-protein interactions, whereas a probabilistic model suggests an underlying principle that is more chemical than biological, describing the ability of proteins to bind.
We show here that currently available PPI data support the latter interpretation and demonstrate that the probability of an interaction between two proteins is proportional to the product of their numbers of interacting partners. The observations suggest that PPI networks are almost completely probabilistic and, therefore, that in a proteome context the protein-protein interactions underlying specific biological processes are generally not distinct. From a purely biological point of view, knowledge of any potential interactions between proteins is useful. However, by identifying common themes in large PPI networks, the underlying principles responsible for the discovered interactions may become more apparent.
Networks can be constructed directly from probabilistic procedures in which the interactions, or edges, between pairs of nodes are determined from an a priori probability distribution of edges, the simplest being the Erdös-Rényi random model [10, 11]. However, biological networks, including PPI networks, typically show power-law scaling in their degree distributions, in that the probability of any node having a given number of interactions follows a power law [12, 13, 14]. As such, the Erdös-Rényi model, which generates Poissonian degree distributions, is an unsuitable archetype for PPI networks. Networks with power-law degree distributions can be constructed using a number of techniques, including those based on preferential attachment [15, 16], duplication [17, 18, 19], and hierarchical [20, 21] approaches. Alternatively, the geometric random model generates networks whose degree distributions nearly follow a power law. While each of these models may qualitatively resemble biological networks, none has consistently and accurately reproduced the properties of individual PPI networks.
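As an illustration of why the Erdös-Rényi model is a poor archetype, the minimal Python sketch below samples a G(n, p) graph and checks its Poissonian signature: mean degree and degree variance both close to p(n − 1), with no high-degree hubs. The function name and the sizes n = 1000 and p = 0.004 are arbitrary choices for illustration, not values taken from any PPI dataset.

```python
import random
from statistics import mean, pvariance


def erdos_renyi_degrees(n, p, seed=0):
    """Sample an Erdos-Renyi G(n, p) graph and return the degree of every node.

    Each of the n(n-1)/2 possible edges is present independently with
    probability p, so every degree is Binomial(n-1, p), which is close to
    Poisson with mean p(n-1) when n is large and p is small.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    degrees = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                degrees[i] += 1
                degrees[j] += 1
    return degrees


n, p = 1000, 0.004  # illustrative values only
deg = erdos_renyi_degrees(n, p)
# Poissonian signature: mean and variance both near p(n-1), roughly 4, and no hubs.
print("mean:", mean(deg), "variance:", pvariance(deg), "max degree:", max(deg))
```

The same summary computed on a PPI degree sequence would typically show a variance far larger than the mean, reflecting the heavy-tailed degree distributions described above.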
Here, we describe insights into the topologies of PPI networks that should serve to enhance the development of future models. A degree-weighted network is one in which the probability of an interaction between two nodes is proportional to the product of their degrees, i.e., $P_{ij} \propto k_i k_j$, where $k_i$ and $k_j$ are the degrees, or numbers of interactions, of nodes i and j, respectively. A type of degree-weighted network denoted "STICKY" has been proposed as a model for PPI networks on the basis of similarities in derived global, or average, network properties, e.g., graphlet frequencies and average clustering coefficients. However, this model generates far too many nodes of zero degree and is therefore an unsuitable prototype for PPI networks. It is thus important to ascertain, both qualitatively and quantitatively, the extent of degree-weighted behaviour in biological networks. Here, we explore the nature of protein-protein connectivity more directly and demonstrate conclusively that PPI networks indeed contain degree-weighted elements.
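A degree-weighted random graph can be sketched in a few lines. The Python code below is a Chung-Lu-style construction assumed here for illustration; it is not the STICKY model itself. Each pair is linked independently with probability $\min(1, \gamma k_i k_j)$, with $\gamma = E / \sum_{i<j} k_i k_j$ (the normalization used later in the text), so the expected number of edges equals E = Σk/2 whenever no probability reaches the cap. The function names and the Pareto-distributed target degrees are hypothetical.

```python
import random


def pair_product_sum(degrees):
    """Sum over i<j of k_i * k_j, computed as ((sum k)^2 - sum k^2) / 2."""
    total = sum(degrees)
    return (total * total - sum(k * k for k in degrees)) / 2


def degree_weighted_graph(target_degrees, seed=0):
    """Link each pair (i, j) independently with P_ij = min(1, gamma * k_i * k_j).

    gamma = E / sum_{i<j} k_i k_j with E = sum(k) / 2, so the expected number
    of edges equals E whenever no pair probability is capped at 1.
    """
    rng = random.Random(seed)
    n = len(target_degrees)
    gamma = (sum(target_degrees) / 2) / pair_product_sum(target_degrees)
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < min(1.0, gamma * target_degrees[i] * target_degrees[j]):
                edges.add((i, j))
    return edges, gamma


# Hypothetical heavy-tailed target degrees (Pareto shape 2, capped at 100).
rng = random.Random(1)
targets = [min(100, int(3 * rng.paretovariate(2.0))) for _ in range(800)]
edges, gamma = degree_weighted_graph(targets)

realised = [0] * len(targets)
for i, j in edges:
    realised[i] += 1
    realised[j] += 1
print("target E:", sum(targets) / 2, "sampled E:", len(edges), "gamma:", gamma)
print("isolated nodes:", sum(1 for k in realised if k == 0))
```

Counting isolated nodes gives a quick check on the zero-degree issue raised above: with independent edges, a node of small target degree k ends up with no interactions with probability of roughly $e^{-k}$.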
Probabilistic behaviour in protein interaction networks
Table 1. Properties of PPI networks (columns: number of proteins; number of interactions; γ† (× 10⁻⁵); γ(cal)‡ (× 10⁻⁵))
1.00 ± 0.02; 1.01 ± 0.02; 1.02 ± 0.02; 1.03 ± 0.06; 1.15 ± 0.08; 1.06 ± 0.04; 1.08 ± 0.07; 1.03 ± 0.09; 0.99 ± 0.05
If we express the probability of interaction as $P(k_1, k_2) = \gamma (k_1 k_2)^{\theta}$, then the power, θ, and the proportionality constant, γ, can be determined for each network by linear regression on data with degree products below the cutoff (Table 1). We find that all powers, θ, are very close to one, which is consistent with a probability function that is linear in each degree [23, 24]. The proportionality constants γ determined from the regressions can also be calculated by normalization via $\gamma^{(\mathrm{cal})} = E / \sum_{i<j} k_i k_j$, where E is the total number of interactions in the network and the summation is over all pairs of proteins. We find that the fitted and calculated proportionality constants are in good agreement (Table 1). Therefore, not only is degree-weighted behaviour evident in the networks, but this property can also be extracted straightforwardly and modelled by $P_{ij} = \gamma k_i k_j$, where the proportionality constant γ is determined from the degrees of the proteins.
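The fitting procedure can be sketched as follows. This Python code is a minimal illustration under stated assumptions, not the authors' pipeline: protein pairs are grouped into logarithmic bins of the degree product $k_1 k_2$ (the binning of $P(k_1, k_2)$ is an assumption), the empirical probability in each bin is the fraction of pairs that interact, and θ and γ come from an ordinary least-squares fit of $\log P$ against $\log(k_1 k_2)$ over bins below the cutoff. $\gamma^{(\mathrm{cal})}$ uses the identity $\sum_{i<j} k_i k_j = \big((\sum_i k_i)^2 - \sum_i k_i^2\big)/2$. The function name, synthetic graph and cutoff of 500 are hypothetical.

```python
import math
import random
from collections import defaultdict


def fit_degree_weighted(n, edges, cutoff):
    """Estimate theta and gamma in P(k1, k2) = gamma * (k1 * k2) ** theta.

    Pairs are grouped into base-2 logarithmic bins of the degree product,
    keeping only products below `cutoff`; both choices are assumptions.
    Returns (theta, gamma_fit, gamma_cal).
    """
    edge_set = {(min(i, j), max(i, j)) for i, j in edges if i != j}
    deg = [0] * n
    for i, j in edge_set:
        deg[i] += 1
        deg[j] += 1

    pairs = defaultdict(int)       # bin -> number of protein pairs
    linked = defaultdict(int)      # bin -> number of interacting pairs
    prod_sum = defaultdict(float)  # bin -> summed degree products
    for i in range(n):
        for j in range(i + 1, n):
            kk = deg[i] * deg[j]
            if kk == 0 or kk >= cutoff:
                continue
            b = int(math.log2(kk))
            pairs[b] += 1
            prod_sum[b] += kk
            if (i, j) in edge_set:
                linked[b] += 1

    # One point per bin: log(mean degree product) vs. log(fraction of pairs linked).
    xs, ys = [], []
    for b in sorted(pairs):
        if linked[b] >= 5:  # drop sparsely populated bins (arbitrary threshold)
            xs.append(math.log(prod_sum[b] / pairs[b]))
            ys.append(math.log(linked[b] / pairs[b]))
    if len(xs) < 2:
        raise ValueError("need at least two populated bins to fit")

    # Ordinary least squares for log P = log gamma + theta * log(k1 k2).
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    theta = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    gamma_fit = math.exp(my - theta * mx)

    # Normalization: gamma_cal = E / sum_{i<j} k_i k_j.
    total = sum(deg)
    gamma_cal = len(edge_set) / ((total ** 2 - sum(k * k for k in deg)) / 2)
    return theta, gamma_fit, gamma_cal


# Synthetic degree-weighted graph as a hypothetical stand-in for PPI data.
rng = random.Random(7)
n = 800
w = [min(100, int(3 * rng.paretovariate(2.0))) for _ in range(n)]
g = (sum(w) / 2) / ((sum(w) ** 2 - sum(x * x for x in w)) / 2)
edges = [(i, j) for i in range(n) for j in range(i + 1, n)
         if rng.random() < min(1.0, g * w[i] * w[j])]
theta, gamma_fit, gamma_cal = fit_degree_weighted(n, edges, cutoff=500)
print(f"theta = {theta:.2f}, gamma_fit = {gamma_fit:.2e}, gamma_cal = {gamma_cal:.2e}")
```

Using the arithmetic mean of the products in each bin keeps the fit unbiased when P is exactly linear in $k_1 k_2$; the minimum of five interacting pairs per bin is an arbitrary noise filter.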
Having demonstrated that PPI networks exhibit degree-weighted behaviour up to a certain value of the degree product $k_1 k_2$, we turn our attention to the nonconforming regions of the networks. Of the networks analyzed here, only that of P. falciparum (Figure 1i) shows a degree-weighted tendency throughout. In terms of the number of interactions, this network is the second smallest, larger only than the high-confidence network of Caenorhabditis elegans (Figure 1h), which shows more consistent behaviour in the high-degree-product range than the other networks. However, there does not seem to be any association between the level of consistency and the size of a PPI network. The nature of the deviations from degree-weighted behaviour is similar in all networks (Figure 1) and consists of a levelling off in the values of $P(k_1, k_2)$ together with increased variability. An important observation is that the probabilities of interaction in these high-degree regions are still quite high compared with the well-behaved, lower-degree interaction regions. Thus, even though the high-degree nodes (or hub proteins) do not seem to obey degree-weighted behaviour, they still prefer to interact with each other rather than with lower-degree proteins. These findings are similar to those reported previously, in that hub proteins behave somewhat differently from the remainder of the proteins. In contrast, however, we find that interactions between hub proteins have a high probability compared with interactions between low-degree nodes. It has been commonly accepted that hubs in a network avoid each other; however, we do not find this to be so.
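One simple way to quantify the statement that hubs still prefer to interact with each other is to compare the fraction of hub-hub pairs that interact with the fraction among the remaining proteins, alongside the degree-weighted expectation for hub pairs. In this hypothetical Python sketch, "hubs" are simply the `num_hubs` highest-degree nodes, and $\gamma k_i k_j$ is capped at 1 (an assumption), because for two hubs the raw product can exceed 1; in the toy graph below it is 16/15.

```python
from itertools import combinations


def hub_connectivity(n, edges, num_hubs):
    """Interaction probabilities among hubs versus among the remaining proteins.

    Returns a tuple:
      (observed hub-hub probability,
       mean degree-weighted expectation min(1, gamma*k_i*k_j) over hub pairs,
       observed probability among the non-hub proteins).
    """
    edge_set = {(min(i, j), max(i, j)) for i, j in edges if i != j}
    deg = [0] * n
    for i, j in edge_set:
        deg[i] += 1
        deg[j] += 1
    total = sum(deg)
    # gamma = E / sum_{i<j} k_i k_j, with the pair sum from ((sum k)^2 - sum k^2) / 2
    gamma = len(edge_set) / ((total ** 2 - sum(k * k for k in deg)) / 2)
    # "Hubs" = the num_hubs highest-degree nodes (ties keep node order).
    ranked = sorted(range(n), key=lambda v: deg[v], reverse=True)
    hubs, rest = ranked[:num_hubs], ranked[num_hubs:]

    def linked_fraction(nodes):
        pairs = list(combinations(sorted(nodes), 2))  # (a, b) with a < b
        if not pairs:
            return float("nan")
        return sum(pair in edge_set for pair in pairs) / len(pairs)

    hub_pairs = list(combinations(hubs, 2))
    expected = sum(min(1.0, gamma * deg[a] * deg[b]) for a, b in hub_pairs) / len(hub_pairs)
    return linked_fraction(hubs), expected, linked_fraction(rest)


# Toy graph (hypothetical): three mutually linked hubs, each with two private leaves.
toy = [(0, 1), (0, 2), (1, 2), (0, 3), (0, 4), (1, 5), (1, 6), (2, 7), (2, 8)]
print(hub_connectivity(9, toy, num_hubs=3))  # (1.0, 1.0, 0.0)
```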
Impact of degree-weighted behaviour upon network topology
The path length maps clearly demonstrate that almost every high-degree protein has another high-degree protein located within one or two steps. This implies that any existing modules that incorporate one or more high-degree proteins are very likely to overlap or neighbour each other. As such, it is doubtful that isolated modules, or dense clusters, will contain high-degree proteins; rather, they might contain proteins of more modest degree. However, the maps also indicate that any protein is, on average, within three steps of a hub. Therefore, isolated complexes, if they exist, are likely to be only a few steps away from a high-degree protein. The observed trends in degree-weighted behaviour and shortest path lengths suggest that the PPI networks are extremely dense in their core, or interconnected hub region, and become somewhat sparser as the number of steps from the core increases. Consequently, if any concentrated clusters are identified by some graph-theoretical criterion, there are probably many other complexes satisfying, or very nearly satisfying, the same criterion. Thus, the concept of an isolated module becomes indistinct.
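The two quantities this argument rests on, the distance from each protein to its nearest hub and the distance from each hub to its nearest other hub, can be computed with breadth-first search. The Python sketch below assumes an unweighted, undirected interaction list and leaves the choice of hubs (for example, the top-degree proteins) to the caller; the function name and the path graph in the usage lines are hypothetical.

```python
from collections import deque


def hub_distances(n, edges, hubs):
    """Shortest-path distances from every protein to its nearest hub, and from
    each hub to its nearest *other* hub (None if unreachable)."""
    adj = [[] for _ in range(n)]
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)

    # Multi-source BFS: all hubs start at distance 0.
    to_hub = [None] * n
    queue = deque(hubs)
    for h in hubs:
        to_hub[h] = 0
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if to_hub[u] is None:
                to_hub[u] = to_hub[v] + 1
                queue.append(u)

    # Single-source BFS from each hub; BFS discovers nodes in order of
    # distance, so the first other hub found is the nearest one.
    hub_set = set(hubs)
    hub_to_hub = {}
    for h in hubs:
        dist = {h: 0}
        queue = deque([h])
        found = None
        while queue and found is None:
            v = queue.popleft()
            for u in adj[v]:
                if u not in dist:
                    dist[u] = dist[v] + 1
                    if u in hub_set:
                        found = dist[u]
                        break
                    queue.append(u)
        hub_to_hub[h] = found
    return to_hub, hub_to_hub


path = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
to_hub, hub_to_hub = hub_distances(6, path, hubs=[0, 3])
print(to_hub)      # [0, 1, 1, 0, 1, 2]
print(hub_to_hub)  # {0: 3, 3: 3}
```

Averaging `to_hub` over non-hub proteins gives the "steps to a hub" figure, and the values in `hub_to_hub` give the hub separation discussed above.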
Analogy between degree-weighted connectivity and randomness
Discussion and Conclusion
The degree-weighted nature of PPI networks, as well as the grouping of hubs, presents a quandary in that it implies that the assembly of these networks may be less biologically guided and more probabilistic in nature. One reason for this may be that high-throughput methods make few assumptions about a protein's location in the cell and therefore allow more interactions than might be observed in vivo. In fact, only 40–50% of the interactions identified from high-throughput yeast two-hybrid (Y2H) analyses of S. cerevisiae were between proteins occurring in the same cellular compartment [27, 28]. However, the Yeast-CORE PPI network, which is considered to be high confidence and has a high conservation of interactions between proteins of the same compartment, exhibits a high level of degree-weighted behaviour (Figure 1d). Another consideration is that the various approaches to identifying protein-protein interactions are unintentionally biased in how they sample interactions from the different functional and cell-component categories. However, all the PPI networks studied here show similar degree-weighted connectivity, even though five of them (Figures 1b (D. melanogaster), 1e (S. cerevisiae), 1g–h (C. elegans), and 1i (P. falciparum)) are derived almost entirely from Y2H screens, while the remaining four are compiled from a variety of experimental sources (see Additional file 1).
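The co-localization statistic quoted above amounts to a simple ratio. A minimal Python sketch, assuming each protein is annotated with a set of compartment labels and counting an interaction as co-localized if its two proteins share at least one label; the function name, proteins, labels and interactions are all hypothetical:

```python
def same_compartment_fraction(edges, compartments):
    """Fraction of interactions whose two proteins share at least one compartment.

    compartments maps protein -> set of compartment labels; interactions with an
    unannotated protein are skipped. Returns None if nothing is annotated.
    """
    annotated = [(a, b) for a, b in edges if compartments.get(a) and compartments.get(b)]
    if not annotated:
        return None
    shared = sum(1 for a, b in annotated if compartments[a] & compartments[b])
    return shared / len(annotated)


# Hypothetical annotations and interactions, for illustration only.
comp = {
    "P1": {"nucleus"},
    "P2": {"nucleus", "cytoplasm"},
    "P3": {"mitochondrion"},
    "P4": {"cytoplasm"},
}
ppi = [("P1", "P2"), ("P1", "P3"), ("P2", "P4"), ("P3", "P4"), ("P4", "P5")]
print(same_compartment_fraction(ppi, comp))  # 0.5 (P4-P5 skipped: P5 unannotated)
```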
It could also be that PPI networks determined from high-throughput methods contain non-specific interactions. Such variability is not unexpected, given the poor reproducibility of previously identified interactions. In that case, we might expect to see probabilistic behaviour similar to that observed here. Contrary to this, however, the high-confidence network of C. elegans, which contains interactions found in three independent repeated experiments, exhibits clear degree-weighted characteristics (Figure 1h).
Obviously, protein-protein interactions are necessary for a myriad of biological processes; however, if the event is "controlled" by other time- and location-dependent processes, the actual binding or interaction could be of secondary importance. If degree-weighted behaviour is observed in a network, i.e., if protein interactions appear probabilistic, an analysis of expected binding events will determine whether the observed binding events are guided by specific interactions or merely by the proteins' ability to bind. This would greatly enhance our ability to interpret and extract biological information from protein-protein interaction networks. The findings presented here provide a cautionary note on the biological interpretation of large PPI networks. One interpretation of the observed degree-weighted networks is that we are witnessing the ability to bind, and not necessarily the connections or interactions that are actually present in the cell. The true biological connections used in a pathway or biological process cannot be back-engineered from this type of data without taking a degree-weighted model into account; hence, the topological study of a degree-weighted network requires a more refined methodology to extract biological information about pathways, modules, or other inferred relationships among proteins. A priori knowledge of a protein's degree or connectivity is not available; however, algorithms to predict it [31, 32], as well as protein interactions themselves [32, 33, 34], are being developed. Whether applying these predictive algorithms on genomic scales yields degree-weighted networks remains to be seen, and this may even serve as a test of the validity of the resulting network topologies.
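An analysis of expected binding events could, for instance, compare the number of interactions observed within a candidate module with the number expected under the degree-weighted model. The Python sketch below is one such test, not the authors' method: it uses $P_{ij} = \min(1, \gamma k_i k_j)$ with γ from the normalization given in the text, treats protein pairs as independent Bernoulli trials, and reports a normal-approximation z-score. The function name and toy graph are hypothetical.

```python
import math
from itertools import combinations


def degree_weighted_enrichment(n, edges, proteins):
    """Observed vs. expected interactions inside a set of proteins.

    Expected counts use the degree-weighted model P_ij = min(1, gamma*k_i*k_j)
    with gamma = E / sum_{i<j} k_i k_j. Pairs are treated as independent
    Bernoulli trials, and the z-score is a normal approximation.
    """
    edge_set = {(min(i, j), max(i, j)) for i, j in edges if i != j}
    deg = [0] * n
    for i, j in edge_set:
        deg[i] += 1
        deg[j] += 1
    total = sum(deg)
    gamma = len(edge_set) / ((total ** 2 - sum(k * k for k in deg)) / 2)

    observed, expected, variance = 0, 0.0, 0.0
    for a, b in combinations(sorted(proteins), 2):
        p = min(1.0, gamma * deg[a] * deg[b])
        if (a, b) in edge_set:
            observed += 1
        expected += p
        variance += p * (1 - p)
    z = (observed - expected) / math.sqrt(variance) if variance > 0 else float("nan")
    return observed, expected, z


# Hypothetical toy graph: three linked hubs (0, 1, 2), each with two leaves.
toy = [(0, 1), (0, 2), (1, 2), (0, 3), (0, 4), (1, 5), (1, 6), (2, 7), (2, 8)]
print(degree_weighted_enrichment(9, toy, {0, 3, 4}))  # observed 2, expected ~0.6, z ~2.08
```

A large positive z flags a set that is more tightly connected than binding propensity alone would predict; the independence and normal approximations are rough for small sets.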
Further insight into the degree-weighted nature of PPI networks may be obtained from analyses of the interacting protein pairs at more elementary levels. An avenue for this dissection has been to characterize the structural and functional domains present in each protein [35, 36] and subsequently identify consistent signatures, i.e., pairs of domains that are more likely to be involved in binding [37, 38]. In this way, domain-domain interaction (DDI) networks can be derived and then compared against PPI networks to see whether they have similar topological properties, such as degree-weighted behaviour. If, for example, degree-weighted behaviour is not observed in DDI networks, then one would anticipate consistent rules governing the allowed interactions, thereby allowing for alternative, and more insightful, analyses of PPI networks.
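The text does not specify how a DDI network would be derived. One naive possibility, assumed here purely for illustration, is to let every interacting protein pair vote once for each domain pair it could contain and keep the domain pairs supported by at least `min_votes` interactions. The function names and domain annotations below are hypothetical placeholders.

```python
from collections import Counter
from itertools import product


def candidate_domain_pairs(ppi_edges, domains):
    """Naive PPI -> DDI mapping: each interacting protein pair votes once for
    every (domain_a, domain_b) combination it contains.

    domains maps protein -> set of domain identifiers.
    """
    votes = Counter()
    for a, b in ppi_edges:
        pairs = {tuple(sorted(dd)) for dd in product(domains.get(a, ()), domains.get(b, ()))}
        votes.update(pairs)  # one vote per distinct domain pair per interaction
    return votes


def ddi_network(ppi_edges, domains, min_votes=2):
    """Domain pairs supported by at least min_votes protein interactions."""
    return {pair for pair, n in candidate_domain_pairs(ppi_edges, domains).items()
            if n >= min_votes}


# Hypothetical domain annotations and interactions.
domains = {"P1": {"dA"}, "P2": {"dB"}, "P3": {"dA", "dC"}, "P4": {"dB"}}
ppi = [("P1", "P2"), ("P3", "P4"), ("P3", "P2")]
print(candidate_domain_pairs(ppi, domains))
print(ddi_network(ppi, domains, min_votes=3))  # {('dA', 'dB')}
```

The resulting domain-pair list is an ordinary edge list, so the degree-weighted regression described in the text could be applied to it unchanged.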
One use of knowing that a network is degree-weighted is to apply the probabilistic interpretation to find nodes that deviate from degree-weighted probability. Such nodes could represent a subnetwork whose organization is biologically determined by its protein-protein interactions alone. For example, clusters of low-degree proteins might imply selective complex formation, and hubs found to be isolated from other high-degree proteins may represent important bottlenecks.
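As a sketch of the second example, each hub's number of hub neighbours can be compared with the degree-weighted expectation $\sum_{h'} \min(1, \gamma k_h k_{h'})$. The Python code below assumes "hubs" means the `num_hubs` highest-degree proteins; the function name and the toy graph, in which hub 2 touches no other hub, are hypothetical.

```python
def hub_neighbour_deficit(n, edges, num_hubs):
    """For each of the num_hubs highest-degree proteins, return
    (observed number of hub neighbours, degree-weighted expectation).

    The expectation is the sum over the other hubs h' of min(1, gamma*k_h*k_h'),
    with gamma = E / sum_{i<j} k_i k_j. A hub with far fewer hub neighbours
    than expected is a candidate isolated hub.
    """
    edge_set = {(min(i, j), max(i, j)) for i, j in edges if i != j}
    deg = [0] * n
    for i, j in edge_set:
        deg[i] += 1
        deg[j] += 1
    total = sum(deg)
    gamma = len(edge_set) / ((total ** 2 - sum(k * k for k in deg)) / 2)
    hubs = sorted(range(n), key=lambda v: deg[v], reverse=True)[:num_hubs]

    report = {}
    for h in hubs:
        others = [g for g in hubs if g != h]
        observed = sum((min(h, g), max(h, g)) in edge_set for g in others)
        expected = sum(min(1.0, gamma * deg[h] * deg[g]) for g in others)
        report[h] = (observed, expected)
    return report


# Hypothetical toy graph: hubs 0 and 1 are linked; hub 2 reaches them only via leaves.
edges = [(0, 1), (0, 3), (0, 4), (0, 5), (1, 6), (1, 7), (1, 8),
         (2, 9), (2, 3), (2, 6), (2, 4)]
for hub, (obs, exp) in hub_neighbour_deficit(10, edges, num_hubs=3).items():
    print(f"hub {hub}: {obs} hub neighbours, {exp:.2f} expected")
```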
We thank the referees for valuable feedback which helped improve the paper. The authors were supported, in part, by the Military Operational Medicine research program of the U.S. Army Medical Research and Materiel Command, Ft. Detrick, Maryland. This effort was supported by the U.S. Army's Network Science initiative. The opinions and assertions contained herein are the private views of the authors and are not to be construed as official or as reflecting the views of the U.S. Army or the U.S. Department of Defense. This paper has been approved for public release with unlimited distribution.
- 4. Gavin A-C, Bosche M, Krause R, Grandi P, Marzioch M, Bauer A, Schultz J, Rick JM, Michon AM, Cruciat CM, Remor M, Hofert C, Schelder M, Brajenovic M, Ruffner H, Merino A, Klein K, Hudak M, Dickson D, Rudi T, Gnau V, Bauch A, Bastuck S, Huhse B, Leutwein C, Heurtier MA, Copley RR, Edelmann A, Querfurth E, Rybin V, Drewes G, Raida M, Bouwmeester T, Bork P, Seraphin B, Kuster B, Neubauer G, Superti-Furga G: Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature. 2002, 415 (6868): 141-147. doi:10.1038/415141a.
- 5. Ho Y, Gruhler A, Heilbut A, Bader GD, Moore L, Adams SL, Millar A, Taylor P, Bennett K, Boutilier K, Yang L, Wolting C, Donaldson I, Schandorff S, Shewnarane J, Vo M, Taggart J, Goudreault M, Muskat B, Alfarano C, Dewar D, Lin Z, Michalickova K, Willems AR, Sassi H, Nielsen PA, Rasmussen KJ, Andersen JR, Johansen LE, Hansen LH, Jespersen H, Podtelejnikov A, Nielsen E, Crawford J, Poulsen V, Sorensen BD, Matthiesen J, Hendrickson RC, Gleeson F, Pawson T, Moran MF, Durocher D, Mann M, Hogue CW, Figeys D, Tyers M: Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature. 2002, 415 (6868): 180-183. doi:10.1038/415180a.
- 6. Zhu H, Bilgin M, Bangham R, Hall D, Casamayor A, Bertone P, Lan N, Jansen R, Bidlingmaier S, Houfek T, Mitchell T, Miller P, Dean RA, Gerstein M, Snyder M: Global analysis of protein activities using proteome chips. Science. 2001, 293 (5537): 2101-2105. doi:10.1126/science.1062191.
- 8. Przulj N: Graph Theory Analysis of Protein-Protein Interactions. Knowledge Discovery in Proteomics. Edited by: Jurisica I, Wigle DA. 2005, CRC Press.
- 10. Erdös P, Rényi A: On random graphs. Publ Math. 1959, 6: 290-297.
- 11. Erdös P, Rényi A: On the evolution of random graphs. Publ Math Inst Hung Acad Sci. 1960, 5: 17-61.
- 30. Li S, Armstrong CM, Bertin N, Ge H, Milstein S, Boxem M, Vidalain PO, Han JD, Chesneau A, Hao T, Goldberg DS, Li N, Martinez M, Rual JF, Lamesch P, Xu L, Tewari M, Wong SL, Zhang LV, Berriz GF, Jacotot L, Vaglio P, Reboul J, Hirozane-Kishikawa T, Li Q, Gabel HW, Elewa A, Baumgartner B, Rose DJ, Yu H, Bosak S, Sequerra R, Fraser A, Mango SE, Saxton WM, Strome S, Van Den Heuvel S, Piano F, Vandenhaute J, Sardet C, Gerstein M, Doucette-Stamm L, Gunsalus KC, Harper JW, Cusick ME, Roth FP, Hill DE, Vidal M: A map of the interactome network of the metazoan C. elegans. Science. 2004, 303 (5657): 540-543. doi:10.1126/science.1091403.
- 34. von Mering C, Jensen LJ, Kuhn M, Chaffron S, Doerks T, Kruger B, Snel B, Bork P: STRING 7 – recent developments in the integration and prediction of protein interactions. Nucleic Acids Research. 2007, 35 (Database issue): D358-362.
- 37. Sprinzak E, Altuvia Y, Margalit H: Characterization and prediction of protein-protein interactions within and between complexes. Proceedings of the National Academy of Sciences of the United States of America. 2006, 103 (40): 14718-14723. doi:10.1073/pnas.0603352103.