Discovery of Boolean metabolic networks: integer linear programming based approach
Abstract
Background
Traditional drug discovery methods focused on the efficacy of drugs rather than their toxicity. However, toxicity and/or lack of efficacy are produced when unintended targets are affected in metabolic networks. Thus, identification of biological targets which can be manipulated to produce the desired effect with minimum sideeffects has become an important and challenging topic. Efficient computational methods are required to identify the drug targets while incurring minimal sideeffects.
Results
In this paper, we propose a graphbased computational damage model that summarizes the impact of enzymes on compounds in metabolic networks. An efficient method based on Integer Linear Programming formalism is then developed to identify the optimal enzymecombination so as to minimize the sideeffects. The identified target enzymes for known successful drugs are then verified by comparing the results with those in the existing literature.
Conclusions
Sideeffects reduction plays a crucial role in the study of drug development. A graphbased computational damage model is proposed and the theoretical analysis states the captured problem is NPcompleteness. The proposed approaches can therefore contribute to the discovery of drug targets. Our developed software is available at “http://hkumath.hku.hk/~wkc/APBC2018metabolicnetwork.zip”.
Keywords
Metabolic network Integer linear programming Boolean modelBackground
One of the most important biological processes in organism is metabolism and its dysregulation can contribute to many human diseases. The manipulation of metabolism has been extensively studied in the field of metabolic engineering as many of metabolic process produce commodity and specialty chemicals. In particular, production of industrially valuable products using a microbial host with recombinant technology becomes one of the most successful applications of metabolic engineering [1, 2, 3]. Metabolic network is used to model the behavior of reactions and chemicals [4]. Authors in [5] have proposed a new mathematical model for large metabolic network and demonstrated that the outcome of network expansion is robust against of single or few reactions. Indeed, for the analysis of metabolic networks, many studies have been conducted so as to develop Boolean models. For example, Smart et al. [6] studied the effect of deletion of each enzyme in metabolic network of a Boolean model, and Lemke et al. [7] considered almost the same problem from the viewpoint of the Boolean aspect. Tamura et al. [8] developed an integer linear programming method for Boolean reaction cut (BRC) problem. Furthermore, in [9], the computational complexity of BRC was analyzed. In the Boolean model of metabolic networks, compounds and reactions can be represented by “OR” and “AND” nodes, respectively. Li et al. [10] have developed methods for finding a set of enzymes whose inhibition stops the production of the target compound with a minimum elimination of nontarget compounds. Furthermore, Takemoto et al. [11] and Lee et al. [12] estimated the size distribution of the deletion effect of enzymes using a branching process. Enumerating all the possible enzyme combination is infeasible since the number of combination increases exponentially with the number of enzymes. Lu et al. [13] developed an integer linear programming method for designing synthetic metabolic networks by considering minimum reaction insertion in a Boolean model. The proposed method can appropriately solve minimum reaction insertion in a Boolean network.
In pharmaceutics, the development of drugs mainly focus on target identification and lead inhibitor identification [14]. Furthermore, many researchers have done a lot of work on the efficacy of drugs regardless of their sideeffects (toxicity). However, toxicity or lack of efficacy may occur when the compounds which are not the intended targets in the metabolic network. Thus how to effectively reduce the sideeffect of drugs has become an important and challenging issue. The current works aim to identify the biological targets (could be enzyme or protein) for drugs which can be manipulated so as to produce the desired effect (i.e., curing a disease) with minimum disruptive sideeffects [15, 16]. Drug targets, in particular enzymes, are chosen to reduce abnormal metabolites by formulating an optimal combination problem in enzyme combination in metabolic networks [10, 17]. It is known that the function of enzymes is to catalyze reactions and produces metabolites (compounds) in metabolic networks. Human diseases, especially metabolic diseases, may be directly caused by the accumulation of certain compounds due to the malfunctions of enzymes. We denote such compounds as target compounds while the others as nontarget compounds. For example, the malfunction of enzyme phenylalanine hydroxylase results in accumulation of the amino acid phenylalanine which causes the disease phenylketonuria [18]. Therefore, there is a great need to identify a set of enzymes which are manipulated by drugs to prevent the buildup of target compounds with minimal damage. Here damage corresponds to the number of nontarget compounds whose production are forced to stop by the inhibition of that enzyme (or enzyme set).
Then the problem becomes to identify the optimal enzyme set whose inhibition eliminates the target compounds and at the same time, incurs minimum damage based on the given metabolic network and a set of target compounds. We denote such problem as enzyme combination identification (ECI). Sridhar et al. [19] developed a branchandbound algorithm to dynamically explore the search space. Furthermore, two filtering strategies are proposed to prune the search space to guarantee an optimal solution. However, their algorithm is complicated and impractical for us to use it. In this paper, we develop an efficient method based on Integer Linear Programming [20, 21, 22] which has a wide application in solving NPhard problems. Furthermore, all instances of the original problem are needed to convert into integer programming formalization so as to apply the existing free solver called CPLEX [23].
This paper has two main contributions. First, we formulate a new biological problem based on Boolean metabolic network to identify the drug target with minimum damage. Second, we propose an effective and efficient method, Integer Linear Programming, (needs O(m+n) variables) to solve the captured problem and it is easy to implement. By integrating the human metabolic networks, we have shown that the proposed approach can accurately identify the target enzyme set for known drugs in the literature. Furthermore, experiments have shown that our proposed method is extremely efficient which can effectively solve the problem in seconds.
The remainder of the paper is organized as follows. In “Problem formulation” section, we formulate the captured problem. “Methods” section presents the algorithm ILPECI. In “Results and discussions” section, we discuss the experimental results and give a theoretical illustration. Finally, conclusions are given in the last section.
Problem formulation

For each edge (u,v)∈A, either (u∈V_{ C })∧(v∈V_{ R }) or (u∈V_{ R })∧(v∈V_{ C }v∈V_{ E }) holds and “ ∧” represents “AND” function;

Each source node has no incoming edges;

For nodes v∉V_{ s } and v∈V_{ C } have at least one incoming edge.
The main problem Enzyme Combination Identification (ECI) in a Boolean network is first described with a simple example and then followed by its mathematical formalization.

Input: A metabolic network and a set of target compounds T(T∈C), C is the set of compounds.

Output: Find a set of enzymes X(X∈E) with minimum damage, whose inhibition stops the production of all the compounds in T.

(i) For each v∈V_{ s },v=1.

(ii) For each v∉V_{ s },v=1 if and only if there is u such that (u,v)∈E and u=1.

(iii) For each v∈V_{ r },v=1 if and only if u=1 holds for all u such that (u,v)∈E.
Condition (ii) implies that compound nodes correspond to “OR” nodes. Condition (iii) implies that reaction nodes correspond to “AND” nodes, which means that the output is forced to 0 when a node is inactivated.
From Fig. 1, we can see that inhibition of E_{1} results in the knock out of compounds C_{4},C7,C_{8} and C_{9} in addition to the target compound C_{5}. Then we denote the number of nontarget compounds knocked out as the damage, which is caused by the manipulating the enzyme set in the metabolic network. It can be seen that the damage of inhibiting E_{2} is 2 (i.e., C_{2} and C_{4}). Compound C_{7} is still producible because it can be produced by R_{3} even after the inhibition of E_{2}. The damage effect of inhibition of E_{1} is 4 (i.e., C_{4},C7,C_{8} and C_{9}). Both E_{1} and E_{2} are potential drug targets since they can achieve the effect of disrupting the target compound C_{5}. However, E_{2} is a better drug target than E_{1} owing to the fact that it causes less damage.
Methods
In this section, we introduce integer programmingbased methods for ECI. Integer programming, in particular, Integer Linear Programming (ILP) is set to minimize (or maximize) a linear objective function under linear constraints with all the variables taking integer values. In the following, each variable takes including the binary value (i.e., 0 or 1), representing the Boolean values. We apply ILP to ECI since ILP is widely used for solving NPhard problems.
We denote the above formalization as ILPECI. Here all variables including the value of reaction compound and enzyme nodes take either 1 or 0. Thus, \(v_{r_{i}}\) can be either 0 or 1, and \(v_{c_{i}}\) and \(v_{e_{i}}\) also take 0 or 1. In this example, \(v_{r_{i}}=0\) (resp. \(v_{r_{i}}=1\)) indicating that the value of reaction i takes 0 is represented by FRi=1 (resp. TRi=1) which implies that the reaction is inactivated (otherwise, the TRi=1 implies the reaction is activated). Therefore, TRi=0 (equivalent to FRi=1) means the corresponding value for true reaction takes 0, which implies the reaction is inactive. And FRi=0 (equivalent to TRi=1) indicates the corresponding value for false reaction takes 0, which implies the reaction is active. Thus, TRi+FRi=1 holds for any node i in the network. Similarly, TCi and FCi are used to represent the values of compound nodes. For instance, TC2=1 means that \(v_{c_{2}}=1\) and in other words, FC2=0 since TCi+FCi=1. Furthermore, \(v_{r_{i}}\) corresponds to “AND” node which implies that if \(v_{e_{i}}=0\) will inactivate \(v_{r_{i}}\).
Thus Eq. (3) is obtained. Similarly, Eqs. (4)(5) represent the constraints of \(v_{r_{2}}\) and \(v_{r_{3}}\), respectively.
For a compound node with indegree is 1 which indicates the node has only one incoming edge, the value of the predecessor is just copied. For instance, since \(v_{c_{2}}\) has only one predecessor \(v_{r_{1}}\), \(v_{c_{2}}\) is just copied from \(v_{r_{1}}\) as shown in Eq. (6). Similarly, \(v_{c_{4}}\) is just copied from \(v_{r_{2}}\) which is shown in Eq. (7).
Equation (6)  Eq. (12) represent the constraints of \(v_{c_{1}},\cdots,v_{c_{9}}\) respectively. Equation (12) means that \(v_{c_{1}}, v_{c_{3}}\) and \(v_{c_{6}}\) are 1 since their indegrees is 0.
Equation (13) means that “T” and “F” correspond to “true (1)” and “false (0)”, respectively, and complement each other. X in Eq. (13) means any component or reaction in the metabolic network.
The above formalization can clearly solve ECI and obtain the correct solution {E_{2}}. Besides, the number of variables is O(m+n) in the above formalization where m and n are the number of compounds and reactions, respectively.
It is noted that solving ILP is NPcomplete, however, a problem that can be formalized as ILP is not always NPcomplete. Thus in the following, we prove that ECI is NPcomplete.
Theorem
ECI is NPcomplete problem with the maximum indegree and outdegree being bounded by 2.
Proof
Results and discussions
In this section, we verify the biological validity of our proposed method by using known drugs. Besides, the performance of the ILPECI algorithm is evaluated by using the execution time which indicates the total time taken by the method. For the experimental data, we extracted the metabolic network information from the Kyoto Encyclopedia of Genes and Genomes (KEGG) database [24]. KEGG is database which provides known drug molecules along the the enzymes they inhibit and their therapeutic category. Then we use drugs at this database as our benchmarks and we report two of them due to the space limitation. And the value in parenthesis that starts with letter “C”, “D”, and “E” (e.g., E1.13.11.34) is the unique identifier assigned to the corresponding compound, drug, and enzyme respectively in KEGG.
Another experiment we conducted is the histidine metabolism network (hsa00340). The enzyme amine oxidase (E.1.4.3.4) appeared in hsa00340 can be inhibited by drug (Rasagiline (D02562)). According to our graph model, the removal of enzyme E.1.4.3.4 will result in eliminating two compounds Methylimidazoleacetic acid (C05828) and Methylimidazole acetaldehyde (C05827). It should be noted that the level of prosmethylimidazoleacetic acid is closely related to severity of Parkinson disease in patients [31, 32]. Running ILPECI with Methylimidazoleacetic acid and Methylimidazole acetaldehyde as the target compounds finds amine oxidase as the optimal enzyme. It takes only 3.26 s to run ILPECI. This experiment verifies that Rasgiline targets the optimal enzyme. An important advantage of our Boolean model is its capability of detecting the lack of substrates where the connectivitybased methods fail to handle this. Another advantage of this model is its capability of handling branches and cycles in a pathway from the source compound to the target compound. However, there are still have some limitations in this method. One of the major drawbacks is the assumption that all the compounds and enzymes are of equal importance. However, different compounds and enzymes may have different levels of importance in the metabolic networks. Our future work will focus on developing other models which include the weight of different nodes.
Conclusions
In this paper, we formulate the optimal enzyme combination identification (ECI) problem as an optimization problem in Boolean metabolic networks. We have proven that ECI in the Boolean model is NPcomplete and the target enzyme set is uniquely determined when the target compounds are given. Furthermore, considering that an exhaustive search cannot be used to solve ECI when the network is large, we developed an ILPbased algorithm for ECI. Considering that the computational time of IPbased method is exponential to the number of variables, to improve the scalability of the developed method, it is vital to reduce the number of variables appearing in IP formalization. And our proposed IPbased method needs O(m+n) variables.
The efficiency and effectiveness are validated by the computational experiments in which datasets were downloaded from the KEGG database. The results demonstrate that the proposed model can accurately identify the target enzymes for known successful drugs in the literature. Specifically, ILPECI has found a different enzyme set for the target compounds of Benoxaprofen which indicates that our method has a great potential to be better than Benoxaprofen. The reason is that the solution of our algorithm damages only one nontarget compound while Benxaprofen damages 4 nontarget compounds including the compound damaged by ILPECI’s solution. Besides, ILPECI has found that the same target enzyme like Rasagiline when its target compounds are given in advance. It is to be noted that our problem needs only O(m+n) variables in the IP formalization. The experiments also show that ILPECI can solve the problem in a short time which confirms the efficiency of our algorithm.
Notes
Acknowledgements
The authors would like to thank anonymous reviewers for your helpful and constructive comments.
Funding
This research is supported in part of Natural Science Foundation of SZU (Grant No. 2017058) and National Natural Science Foundation of China NSFC (Grant Nos. 91730301, 11626229 and 11671158). The publication costs are funded by National Natural Science Foundation of China NSFC (Grant Nos. 91730301).
Availability of data and materials
All the data sets are publicly available and can be accessed from the KEGG databases.
About this supplement
This article has been published as part of BMC Systems Biology Volume 12 Supplement 1, 2018: Selected articles from the 16th Asia Pacific Bioinformatics Conference (APBC 2018): systems biology. The full contents of the supplement are available online at https://bmcsystbiol.biomedcentral.com/articles/supplements/volume12supplement1.
Authors’ contributions
YQ designed the research. YQ and WKC proposed the methods and did theoretical analysis. YQ and HJ collected the data. YQ, HJ and XC conducted the experiments and analyze the results. YQ, HJ, WC and XC wrote the manuscript. All authors read and approved the final manuscript.
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
References
 1.Bro C, Regenberg B, Förster J, Nielsen J. In silico aided metabolic engineering of saccharomyces cerevisiae for improved biothanol production. Metab Eng. 2006; 8(2):102–11.CrossRefPubMedGoogle Scholar
 2.Lee SK, Chou H, Ham TS, Lee TS, Keasling JD. Metabolic engineering of microorganisms for biofuels production: from bugs to synthetic biology to fuels. Curr Opin Biotechnol. 2008; 19:556–63.CrossRefPubMedGoogle Scholar
 3.Alper H, Jin YS, Moxley JF, Stephanopoulos G. Identifying gene tarets for the metebolic engineering of lycopene biosynthesis in escherichia coli. Metab Eng. 2005; 7:155–64.CrossRefPubMedGoogle Scholar
 4.Soh KC, Hatzimanikatis V. Dreams of metabolism. Trends Biotechnol. 2010; 28:501–8.CrossRefPubMedGoogle Scholar
 5.Handorf T, Ebenhöh O, Heinrich R. Expanding metabolic networks: scopes of compounds, robustness, and evolution. J Mol Evol. 2005; 61:498–512.CrossRefPubMedGoogle Scholar
 6.Smart AG, Amaral LA, Ottino JM. Cascading failure and robustness in metabolic networks. Proc Natl Acad Sci. 2008; 105:13223–8.CrossRefPubMedPubMedCentralGoogle Scholar
 7.Lemke N, Herédia F, Barcellos CK, Dos Reis AN, Mombach JC. Essentiality and damage in metabolic networks. Bioinformatics. 2004; 20:115–9.CrossRefPubMedGoogle Scholar
 8.Tamura T, Takemoto K, Akutsu T. Finding minimum reation cuts of metabolic networks under a boolean model using integer programming and feedback vertex sets. Int J Knowl Discov Bioinformatics (IJKDB). 2010; 1:14–31.CrossRefGoogle Scholar
 9.Tamura T, Akutsu T. Exact algorithms for fidning a minimum reaction cut under a boolean model of metabolic networks. IEICE Trans Fundam Electron. 2010; 93:1497–1507.CrossRefGoogle Scholar
 10.Li Z, Wang RS, Zhang XS, Chen L. Detecting drug targets with minimum side effects in metabolic networks. IET Syst Biol. 2009; 3:523–33.CrossRefPubMedGoogle Scholar
 11.Takemoto K, Tamura T, Akutsu T. Theoretical estimation of metabolic network robustness against multiple reaction knockouts using branching process approximation. Physica A Stat Mech Appl. 2003; 392:5525–535.CrossRefGoogle Scholar
 12.Lee D, Goh KI, Kahng B. Branching process approach for boolean bipartite networks of metabolic reations. Phys Rev E. 2012; 86:027101.CrossRefGoogle Scholar
 13.Lu W, Tamura T, Song J, Akutsu T. Integer programmingbased method for designing synthetic metabolic networks by minimum reation insertion in a boolean model. Plos ONE. 2014; 9:92637.CrossRefGoogle Scholar
 14.Drews J. Drug discovery: A historical perpective. Science. 2000; 287:1960–4.CrossRefPubMedGoogle Scholar
 15.Smith C. Hitting the target. Nature. 2003; 422:341–7.CrossRefGoogle Scholar
 16.Takenaka T. Classical vs reverse pharmacology in drug discovery. BJU Int. 2001; 88:7–10.CrossRefPubMedGoogle Scholar
 17.Li Z, Wang RS, Zhang XS. Twostage flux balance analysis of metabolic networks for drug target identification. BMC Syst Biol. 2011; 5:11.CrossRefGoogle Scholar
 18.Surtees R, Blau N. The neurochemistry of phenylketonuria. Eur J Pediatr. 2000; 159:109–13.CrossRefGoogle Scholar
 19.Sridhar P, Song B, Kahveci T, Ranka S. Mining metabolic networks for optimal drug targets. Pac Symp Biocomput. 2008; 13:291–302.Google Scholar
 20.Schrijver A. Theory of Linear and Integer Programming. New York: John Wiley & Sons, Inc; 1986.Google Scholar
 21.Zhenping L, Zhang S, Wang Y, Zhang XS, Chen L. Alignment of molecular networks by integer quadratic programming. Bioinformatics. 2007; 23:1631–9.CrossRefPubMedGoogle Scholar
 22.Cheng X, Qiu Y, Hou W, Ching WK. Integer programmingbased method for observability of singleton attractors in boolean networks. IET Syst Biol. 2017; 11:30–5.CrossRefPubMedGoogle Scholar
 23.IBM 2010, IBM ILOG CPLEX Optimizer. http://www01.ibm.com/software/commerce/optimization/cplexoptimizer/index.html. Accessed 2 Mar 2017.
 24.Kanehisa M, Goto S. Kegg: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000; 28:27–30.CrossRefPubMedPubMedCentralGoogle Scholar
 25.Fogh K, Herlin T, Kragballe K. In vitro inhibition of leukotriene b4 formation by exogeneous 5lipoxygenase inhibitors is associated with enhanced generation of 15hydroxyeicosatetraenoic acid (15hete) by human neutrophils. Arch Dermatol Res. 1988; 280:430–6.CrossRefPubMedGoogle Scholar
 26.Lötzer K, Funk CD, Habenicht AJ. The 5lipoxygenase pathway in arterial wall biology and atherosclerosis. Biochim Biophys Acta. 2005; 1736:30–7.PubMedGoogle Scholar
 27.Zakharov S, Kotikova K, Nurieva O, Hlusicka J, Kacer P, Urban P, Vaneckova M, Seidl Z, Diblik P, Kuthan P, Navratil T, Pelclova D. Leukotrienemediated neuroinflammation, toxic brain damage, and neurodegeneration in acute methanol poisoning. lin Toxicol (Phila). 2017; 55:249–59.CrossRefGoogle Scholar
 28.Sweeney FJ, Eskra JD, Carty TJ. Development of a system for evaluating 5lipoxygenase inhibitors using human whole blood. Prostaglandins Leukot Med. 1987; 28:73–93.CrossRefPubMedGoogle Scholar
 29.Rao NL, Dunford PJ, Xue X, Jiang X, Lundeen KA, Coles F, Riley JP, Williams KN, Grice CA, Edwards JP, Karlsson L, Fourie AM. Antiinflammatory activity of a potent, selective leukotriene a4 hydrolase inhibitor in comparison with the 5lipoxygenase inhibitor zileuton. J Pharmacol Exp Ther. 2007; 321:1154–60.CrossRefPubMedGoogle Scholar
 30.TorresGalván MJ1, Ortega N, SánchezGarcía F, Blanco C, Carrillo T, Quiralte J. Ltc4synthase a444c polymorphism: lack of association with nsaidinduced isolated periorbital angioedema in a spanish population. Ann Allergy Asthma Immunol. 2001; 87:506–10.CrossRefPubMedGoogle Scholar
 31.Blandina P, Cherici G, Moroni F, Prell GD, Green JP. Release of glutamate from striatum of freely moving rats by prosmethylimidazoleacetic acid. J Neurochem. 1995; 64:788–93.CrossRefPubMedGoogle Scholar
 32.Prell GD, Khandelwal JK, Burns RS, Blandina P, Morrishow AM, Green JP. Levels of prosmethylimidazoleacetic acid: Correlation with severity of parkinson’s disease in csf of patients and with the depletion of striatal dopamine and its metabolites in mptptreated mice. J Neural Transm. 1991; 3:1435–63.Google Scholar
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.