Influence of metabolic network structure and function on enzyme evolution
Most studies of molecular evolution are focused on individual genes and proteins. However, understanding the design principles and evolutionary properties of molecular networks requires a system-wide perspective. In the present work we connect molecular evolution on the gene level with system properties of a cellular metabolic network. In contrast to protein interaction networks, where several previous studies investigated the molecular evolution of proteins, metabolic networks have a relatively well-defined global function. The ability to consider fluxes in a metabolic network allows us to relate the functional role of each enzyme in a network to its rate of evolution.
Our results, based on the yeast metabolic network, demonstrate that important evolutionary processes, such as the fixation of single nucleotide mutations, gene duplications, and gene deletions, are influenced by the structure and function of the network. Specifically, central and highly connected enzymes evolve more slowly than less connected enzymes. Also, enzymes carrying high metabolic fluxes under natural biological conditions experience higher evolutionary constraints. Genes encoding enzymes with high connectivity and high metabolic flux have higher chances to retain duplicates in evolution. In contrast to protein interaction networks, highly connected enzymes are no more likely to be essential compared to less connected enzymes.
The presented analysis of evolutionary constraints, gene duplication, and essentiality demonstrates that the structure and function of a metabolic network shape the evolution of its enzymes. Our results underscore the need for systems-based approaches in studies of molecular evolution.
Keywords: Metabolic Network; Additional Data File; Metabolic Flux; Protein Interaction Network; Evolutionary Constraint
Molecular networks and the genes encoding their building blocks represent two different levels of biological organization that interact in evolution. On the one hand, genetic changes such as point mutations, gene deletions, and gene duplications influence the structure and evolution of these networks. Conversely, network function may constrain the kinds of mutations that can be tolerated, and thus how genes evolve. Existing work on the structure and evolution of molecular networks has mainly focused on protein interaction networks [1, 2, 3, 4, 5, 6]. Such networks are very heterogeneous: they contain large macromolecular complexes, regulatory interactions, signaling interactions, and interactions of proteins that provide structural support for a cell. As a result, it is difficult to ascertain how network structure reflects network function. A large fraction of false positives and false negatives in protein interaction networks [7, 8] further complicates the structure to function analysis. In contrast, cellular metabolic networks are relatively well-characterized in several model organisms such as Saccharomyces cerevisiae [9, 10] and Escherichia coli . Their function - biosynthesis and energy production - is also well understood, as is the relationship of network structure to network function.
In the present study, we ask how the topology of a metabolic network and the metabolic fluxes (a metabolic flux is the rate at which a chemical reaction converts reactants into products) through reactions in the network influence the evolution of metabolic network genes through point mutations and gene duplication. Our results suggest that both network structure and function need to be understood to fully appreciate how metabolic networks constrain the evolution of their parts. The present study has become possible with the recent publication of a comprehensive compendium of metabolic reactions in the yeast Saccharomyces cerevisiae . This compendium comprises 1,175 metabolic reactions and 584 metabolites, and involves about 16% of all yeast genes.
Using the stoichiometric equations that describe chemical reactions, we calculate the connectivity of an enzyme as the number of other metabolic enzymes that produce or consume the enzyme's products or reactants (see Materials and methods and Additional data file 1). In other words, a metabolic enzyme A and a metabolic enzyme B are connected if they share the same metabolite as either a product or reactant. Highly connected enzymes in this representation are enzymes that share metabolites with many other enzymes. Including the most highly connected metabolites and cofactors such as ATP or hydrogen in a network representation would render the network structure dominated by these few nodes, and would obscure functional relationships between enzymes. We thus excluded the top 14 most highly connected metabolites: ATP, H, ADP, pyrophosphate, orthophosphate, CO2, NAD, glutamate, NADP, NADH, NADPH, AMP, NH3, and CoA . The results we report below are qualitatively insensitive to the exact number of removed metabolites.
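The connectivity definition above can be illustrated with a minimal sketch. The enzyme and metabolite names below are hypothetical toy data, not entries from the yeast reconstruction, and the sketch assumes that an enzyme sharing several metabolites with the same partner is counted once for that partner.

```python
from collections import defaultdict

# Hypothetical toy data: enzyme -> metabolites it produces or consumes.
reactions = {
    "enzA": {"glc", "g6p", "ATP", "ADP"},
    "enzB": {"g6p", "f6p"},
    "enzC": {"f6p", "fbp", "ATP", "ADP"},
    "enzD": {"pyr", "ATP", "ADP"},
}

# Highly connected "currency" metabolites to ignore.
# The study excludes 14 of them; two are enough for this toy network.
EXCLUDED = {"ATP", "ADP"}


def enzyme_connectivity(reactions, excluded=frozenset()):
    """Return {enzyme: number of other enzymes sharing at least one metabolite}."""
    users = defaultdict(set)  # metabolite -> enzymes that produce or consume it
    for enzyme, metabolites in reactions.items():
        for metabolite in metabolites - set(excluded):
            users[metabolite].add(enzyme)

    neighbours = {enzyme: set() for enzyme in reactions}
    for enzymes in users.values():
        for enzyme in enzymes:
            neighbours[enzyme] |= enzymes - {enzyme}
    return {enzyme: len(partners) for enzyme, partners in neighbours.items()}


with_currency = enzyme_connectivity(reactions)
without_currency = enzyme_connectivity(reactions, EXCLUDED)
print(with_currency)
print(without_currency)
```

In this toy network, enzD is linked to other enzymes only through ATP and ADP, so its connectivity falls to zero once those metabolites are excluded. This is the same effect that disconnects a small fraction of enzymes from the real network.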
Highly connected enzymes evolve slowly
Why do highly connected enzymes show greater evolutionary constraint (smaller Ka/Ks)? One possibility is that this correlation is primarily mediated by the corresponding gene expression level. Indeed, confirming previous observations, we found a significant negative correlation between the ratio Ka/Ks and mRNA expression levels (Spearman's rank correlation r = -0.33, P = 5.5 × 10⁻¹⁰; Pearson's correlation r = -0.30, P = 3.6 × 10⁻⁸). Information on mRNA expression of metabolic genes was obtained from the study by Holstege et al., in which the number of mRNA molecules per cell was estimated based on microarray data. We also found a relatively weak correlation between connectivity and expression levels (Spearman's rank correlation r = 0.11, P = 4.6 × 10⁻²). Nevertheless, a partial correlation analysis - controlling for mRNA expression levels - between gene connectivity and evolutionary constraint Ka/Ks shows that enzymes in highly connected parts of the network evolve slowly independent of expression levels (Spearman's partial correlation r = -0.18, P = 1.4 × 10⁻³; the P value for Spearman's partial correlation was estimated by randomization).
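A partial rank correlation with a randomization P value can be sketched as follows. The text does not specify the randomization scheme, so this sketch assumes one common choice: rank all three variables, remove the linear effect of the expression ranks from the other two, and shuffle one set of residuals. The data values and function names are hypothetical.

```python
import random


def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def residuals(y, x):
    """Residuals of a least-squares straight-line fit of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]


def rank_residuals(x, y, z):
    """Rank x, y, z and remove the rank-linear effect of z from x and y."""
    rx, ry, rz = ranks(x), ranks(y), ranks(z)
    return residuals(rx, rz), residuals(ry, rz)


def partial_spearman(x, y, z):
    ex, ey = rank_residuals(x, y, z)
    return pearson(ex, ey)


def permutation_p_value(x, y, z, n_perm=2000, seed=0):
    """Two-sided P value obtained by shuffling the residuals of x (assumed scheme)."""
    ex, ey = rank_residuals(x, y, z)
    observed = abs(pearson(ex, ey))
    rng = random.Random(seed)
    shuffled = list(ex)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(shuffled, ey)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical values for eight enzymes.
connectivity = [1, 2, 3, 4, 5, 6, 7, 8]
ka_ks = [0.30, 0.28, 0.31, 0.20, 0.22, 0.15, 0.16, 0.10]
expression = [2, 1, 4, 3, 8, 5, 7, 6]

r_partial = partial_spearman(connectivity, ka_ks, expression)
p_value = permutation_p_value(connectivity, ka_ks, expression)
print(r_partial, p_value)
```

Correlating the residuals gives the same value as the textbook partial correlation formula applied to the three pairwise Spearman coefficients, and it keeps the permutation step free of divisions by zero.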
Enzymes that carry large metabolic fluxes evolve slowly
How well a metabolic network supports cell growth can be computationally quantified through the apparatus of metabolic flux analysis . In flux balance analysis, the constraints imposed by stoichiometry and reversibility of chemical reactions are used to restrict the space of feasible metabolic fluxes. The constrained system can be subjected to an optimization procedure to obtain a flux distribution that maximizes some desirable metabolic property. Because cellular growth-rate is an important component of the fitness in a single-cell organism, biomass production is often used as the property being optimized. The predictions of flux balance analysis are often in good agreement with experimental results for E. coli [18, 19] and S. cerevisiae .
Table: Correlation between enzymatic flux magnitude and evolutionary constraint Ka/Ks
Maximum uptake rates (mmol/gDW/h) | Spearman's rank correlation (P value) with zero fluxes | Spearman's rank correlation (P value) without zero fluxes
 | -0.28 (P = 3.8 × 10⁻³) | -0.25 (P = 3.6 × 10⁻⁶)
 | -0.31 (P = 1.7 × 10⁻³) | -0.22 (P = 5.7 × 10⁻⁵)
 | -0.26 (P = 9.3 × 10⁻³) | -0.21 (P = 1.2 × 10⁻⁴)
 | -0.27 (P = 6.4 × 10⁻³) | -0.20 (P = 2.5 × 10⁻⁴)
 | -0.25 (P = 1.3 × 10⁻²) | -0.20 (P = 1.8 × 10⁻⁶)
 | -0.08 (P = 0.45) | -0.21 (P = 9.2 × 10⁻⁵)
 | -0.010 (P = 0.39) | -0.19 (P = 3.7 × 10⁻⁴)
Gene duplication correlation with connectivity and flux
Connectivity, essentiality, and metabolic robustness
Evolutionary constraints on enzymes are indirect indicators of metabolic robustness to amino acid changes, changes that a metabolic network tolerated for well over millions of years of evolution. Another type of biological robustness is that against complete gene deletions. Robustness against gene deletions can be derived from laboratory studies in which the effects of gene deletions on growth rate and other indicators of fitness are studied [23, 24]. These studies determine essential genes, that is, genes whose elimination in one or more laboratory environments is effectively lethal. Our use of available essentiality data is motivated by the observation that highly connected proteins in protein interaction networks may be more likely to be essential to a cell . We carried out analyses using data on essential genes derived from a large scale gene deletion study by Giaever et al. , and used the Saccharomyces genome database (SGD)  to collect the essentiality data.
In sum, we demonstrate that both highly connected enzymes and enzymes that carry high metabolic fluxes in the yeast metabolic network have tolerated fewer amino acid substitutions in their evolutionary history. Why are enzymes carrying larger fluxes more constrained? The likely answer comes from the observation that most mutations affecting enzymatic activity may reduce rather than increase flux. Enzymes carrying high fluxes tend to have reaction products that enter a large number of metabolic pathways. Consequently, a mutational reduction in the activity of such enzymes should be more detrimental than a reduction in the activity of enzymes with lower flux.
We also show that the genes encoding enzymes with high flux have more duplicates. Importantly, we do not argue that duplications arise more frequently for genes whose products carry high flux, but that such duplications are more likely to be preserved in evolution, because of the advantage - higher flux - they provide. While a gene's duplicates can initially be preserved through an advantageous increase in metabolic flux, after divergence they may provide other functional benefits. Divergence of metabolic genes in their expression and regulation is well established for genes in intensely studied parts of metabolism, such as tricarboxylic acid cycle enzymes.
We found that the association between predicted enzymatic flux and evolutionary rate is most pronounced for carbon sources that dominate the natural environment of yeast. This suggests that one can use the association between flux and evolutionary constraint to search for conditions that dominated the evolution of metabolic networks. Similar analyses, which use genomic data to infer the environment that has shaped an organism's evolution, have been used before to show that carbon limitation may have influenced the evolution of the E. coli metabolic network more strongly than nitrogen limitation , and to show that yeast evolution favored fermentation over respiration .
A previous study by Hahn et al. reported that, based on amino acid divergence, in the E. coli metabolic network there exists no statistically significant association between enzyme connectivity and evolutionary constraint. We emphasize that any contradiction between this earlier work and our results is only apparent. First, the earlier study was based on a much smaller set of enzymes (n = 108 as opposed to n = 350 here), and thus had less statistical power. Nevertheless, two different statistical measures in the previous study showed, as we do here, a negative association between connectivity and evolutionary constraint, albeit not at P < 0.05. Second, because of the lack of sufficient sequence information for a closely related sister species of E. coli, the previous study used only amino acid divergence Ka and not the preferable Ka/Ks to indicate evolutionary constraint. In fact, the correlation between connectivity and Ka is very similar between the present study and the previous work (Spearman's rank correlation r = -0.13, P = 1.2 × 10⁻² here versus Spearman's rank correlation r = -0.15, P = 7 × 10⁻² in the study by Hahn et al.).
It should not be surprising that the observed associations are weak in magnitude. The reason for the low magnitude is that many other factors influence the evolution of enzyme-coding genes. Two of these factors are gene expression levels (discussed in the paper) and constraints stemming from the tertiary and quaternary structure of enzymes, which may differ among enzymes (little is known about such constraints). The key point is that, in addition to all these other factors, metabolic network function and structure also have a clear influence on protein evolution.
How do our results on the yeast metabolic network relate to earlier work on protein interaction networks? There, a similar relationship between protein connectivity and evolutionary constraint has been suggested [4, 5]; however, this association exists for different reasons. Highly connected proteins in protein interaction networks may evolve slowly because a larger fraction of a highly connected protein's sequence is involved in protein interactions and may thus be evolutionarily constrained . In contrast, high protein connectivity in the metabolic network is established not through protein-protein interactions, but through consumption or production of widely used metabolites. In metabolic networks, mutations in enzyme-coding genes - changing reaction rates and concentrations - may have especially deleterious consequences for widely used metabolites. Consequently, highly connected metabolic enzymes may evolve slowly due to functional as opposed to structural constraints. Our ability to consider fluxes through enzymes in a metabolic network allows us to relate the functional role of each enzyme in a network to its rate of evolution. Such a functional analysis of a genome-scale network has no counterpart in any other genome-scale network studied thus far.
In conclusion, our analysis of evolutionary constraints, gene duplication, and essentiality demonstrates that the structure and function of a metabolic network shape the evolution of its enzymes. In the long run, system analyses of biological networks will allow us to increasingly place the evolution of genes in the larger context in which they operate, as building blocks of cellular networks.
Materials and methods
We used a comprehensive collection of the yeast S. cerevisiae metabolic reactions by Forster et al. to calculate metabolic enzyme connectivities. In addition to enzymatic reactions assigned to 671 open reading frames (ORFs), the collection contains reactions unassigned to known ORFs, transport reactions, and reactions represented by large macromolecular complexes. These reactions were used to calculate other enzyme connectivities but were excluded from the main analysis. Large macromolecular complexes (containing several ORFs) were represented by single enzymatic nodes in the calculation of connectivities for other metabolic enzymes. In order to include only functional relationships in the calculation of the enzyme connectivities, we excluded the 14 highly connected metabolites and co-factors (as described in the main text). As a result of the exclusion, a small fraction (5%) of network enzymes became disconnected from the network (they have zero connectivity). These enzymes were not included in the analysis.
Flux balance analysis
Flux balance analysis (FBA) was used to obtain metabolic flux distribution as described previously [10, 17, 19]. The network by Forster et al.  was used in all flux balance calculations. The in silico network of yeast metabolism includes central carbon metabolism, transmembrane transport reactions, pathways responsible for the synthesis and degradation of amino acids, nucleic acids, vitamins, cofactors, and lipids. In total, the network consists of 733 metabolites and 1,175 metabolic reactions. In the flux-balance analysis, the constraints limiting nutrient uptake, reaction irreversibility, and steady-state conservation of metabolite concentrations are applied. The fluxes optimal for growth are then obtained by maximization of biomass production using linear optimization. Linear optimization was performed using the GNU Linear Programming Kit .
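The optimization described in this paragraph is conventionally written as a linear program. The notation below is an assumption introduced for illustration and does not appear in the text: S is the stoichiometric matrix (metabolites by reactions), v is the vector of reaction fluxes, c selects the biomass-producing reaction, and l and u are lower and upper flux bounds.

```latex
\begin{aligned}
\max_{v \in \mathbb{R}^{n}} \quad & c^{\mathsf{T}} v \\
\text{subject to} \quad & S v = 0, \\
& l_j \le v_j \le u_j, \qquad j = 1, \dots, n.
\end{aligned}
```

Under this reading, the steady-state condition corresponds to S v = 0, irreversibility corresponds to l_j = 0 for irreversible reactions, and limits on nutrient uptake enter as the bounds on the corresponding transport fluxes.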
We identified duplicates in the S. cerevisiae genome using a previously described whole-genome analysis tool. Briefly, the tool locates gene duplicates in a genome using BLASTP and aligns them globally with the Needleman and Wunsch dynamic programming alignment algorithm. Putative duplicate pairs with less than 40% amino acid similarity or less than 100 aligned amino acid residues were excluded; for the remaining pairs we calculated the number of substitutions per synonymous site (Ks) and the number of substitutions per non-synonymous site (Ka) using the maximum likelihood models of Muse and Gaut and Goldman and Yang.
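The exclusion thresholds for putative duplicate pairs can be sketched as a simple filter. The record type, gene names, and similarity values below are hypothetical; only the two thresholds (40% amino acid similarity and 100 aligned residues) come from the text, and the alignment step itself is assumed to have been done elsewhere.

```python
from dataclasses import dataclass


@dataclass
class AlignedPair:
    gene_a: str
    gene_b: str
    similarity: float      # percent amino acid similarity over the global alignment
    aligned_residues: int  # number of aligned amino acid residues


MIN_SIMILARITY = 40.0  # pairs below this percentage are excluded
MIN_ALIGNED = 100      # pairs with fewer aligned residues are excluded


def keep_pair(pair):
    """True if the pair passes both thresholds."""
    return pair.similarity >= MIN_SIMILARITY and pair.aligned_residues >= MIN_ALIGNED


def duplicate_counts(pairs):
    """Number of retained duplicate partners per gene."""
    counts = {}
    for pair in filter(keep_pair, pairs):
        counts[pair.gene_a] = counts.get(pair.gene_a, 0) + 1
        counts[pair.gene_b] = counts.get(pair.gene_b, 0) + 1
    return counts


# Hypothetical candidate pairs, for example from an all-against-all search.
pairs = [
    AlignedPair("g1", "g2", 62.0, 310),
    AlignedPair("g1", "g3", 38.5, 420),  # excluded: similarity below 40%
    AlignedPair("g2", "g4", 75.0, 80),   # excluded: fewer than 100 aligned residues
    AlignedPair("g3", "g4", 40.0, 100),  # kept: exactly at both thresholds
]

counts = duplicate_counts(pairs)
print(counts)
```

Because the text excludes pairs with "less than" each threshold, a pair sitting exactly at 40% similarity and 100 aligned residues is retained in this sketch.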
The average Ka/Ks, Ka, and Ks values used in the analysis were obtained from the study by Kellis et al. . In a complementary approach, we also recalculated the average ratios using the maximum-likelihood method of Yang and Nielsen  and obtained qualitatively similar results.
Additional data files
The following additional data are available with the online version of this paper. Additional data file 1 is a figure showing examples of metabolic connectivity. (a) An example of the metabolic reaction network from sphingoglycolipid metabolism; metabolites are drawn as small circles (DHSP, sphinganine 1-phosphate; PETHM, ethanolamine phosphate; SPH, sphinganine; CDPETN, CDPethanolamine; ETHM, ethanolamine) and enzyme-encoding genes are shown in rectangles. (b) Metabolic connectivity of the dpl1 gene (solid edges), as defined by the reactions shown in (a). The dpl1 gene has a total of six metabolic connections: two established through ethanolamine phosphate (red edges); and four through sphinganine 1-phosphate (blue edges). Metabolic connections between other enzymes are shown by dashed edges. Additional data file 2 demonstrates the relationship between enzyme connectivity and the average amino acid divergence Ka. Spearman's rank correlation r = -0.13, P = 1.6 × 10⁻². Additional data file 3 shows the relationship between enzyme connectivity and the average silent divergence Ks. Spearman's rank correlation r = -0.056, P = 0.30. Additional data file 4 is a histogram of the calculated metabolic fluxes in the yeast network for aerobic growth on glucose (maximal uptake rate for glucose 15.3 mmol/g dry weight/h; oxygen 0.2 mmol/g dry weight/h). Note the small number of fluxes - representing glycolysis - with disproportionately large magnitudes. Similar flux distributions were also obtained for other growth conditions. Additional data file 5 shows the correlation between non-zero enzymatic flux through a reaction and the number of duplicates of the respective enzyme's coding gene. Additional data file 6 provides connectivity and evolutionary parameters (Ka/Ks, Ka, Ks) for yeast metabolic enzymes.
We thank Dr Andrey Rzhetsky, Dr Uwe Sauer, and Dr Eugene Koonin for valuable discussions. We also thank two anonymous reviewers for several very helpful suggestions.
13. Li W-H: Molecular Evolution. 1997, Sunderland: Sinauer Associates
21. Strathern JN, Jones EW, Broach JR: The Molecular Biology of the Yeast Saccharomyces. Metabolism and Gene Expression. 1982, Cold Spring Harbor Press, NY
33. Makhorin A: GNU Linear Programming Kit. 2001, Boston: Free Software Foundation
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.