Time lagged information theoretic approaches to the reverse engineering of gene regulatory networks
 3.1k Downloads
 9 Citations
Abstract
Background
A number of models and algorithms have been proposed in the past for gene regulatory network (GRN) inference; however, none of them address the effects of the size of timeseries microarray expression data in terms of the number of timepoints. In this paper, we study this problem by analyzing the behaviour of three algorithms based on information theory and dynamic Bayesian network (DBN) models. These algorithms were implemented on different sizes of data generated by synthetic networks. Experiments show that the inference accuracy of these algorithms reaches a saturation point after a specific data size brought about by a saturation in the pairwise mutual information (MI) metric; hence there is a theoretical limit on the inference accuracy of information theory based schemes that depends on the number of time points of microarray data used to infer GRNs. This illustrates the fact that MI might not be the best metric to use for GRN inference algorithms. To circumvent the limitations of the MI metric, we introduce a new method of computing time lags between any pair of genes and present the pairwise time lagged Mutual Information (TLMI) and time lagged Conditional Mutual Information (TLCMI) metrics. Next we use these new metrics to propose novel GRN inference schemes which provides higher inference accuracy based on the precision and recall parameters.
Results
It was observed that beyond a certain number of timepoints (i.e., a specific size) of microarray data, the performance of the algorithms measured in terms of the recalltoprecision ratio saturated due to the saturation in the calculated pairwise MI metric with increasing data size. The proposed algorithms were compared to existing approaches on four different biological networks. The resulting networks were evaluated based on the benchmark precision and recall metrics and the results favour our approach.
Conclusions
To alleviate the effects of data size on information theory based GRN inference algorithms, novel time lag based information theoretic approaches to infer gene regulatory networks have been proposed. The results show that the time lags of regulatory effects between any pair of genes play an important role in GRN inference schemes.
Keywords
Mutual Information Boolean Network Minimum Description Length Dynamic Bayesian Network Joint EntropyBackground
A GRN is a complex set of highly interconnected processes that govern the rate at which different genes in a cell are expressed in time, space, and amplitude. Such a network is commonly represented by many pairs of proteins and genes, in which one protein/gene regulates the abundance and/or activity of another protein/gene [1]. GRN’s can be modelled and simulated using various mathematical and computational approaches [2]. The modelling and simulation of GRN’s is performed over the cDNA microarray data. There are two types of DNA microarray data: time series and time independent (or steady state). The time series data are obtained by sampling temporally the measurement process, whereas timeindependent data sets are obtained by recording the gene expressions from independent sources, for example, different individuals, tissues, and experiments [3]. As time series data would enable one to capture the time varying nature of a GRN, it is the preferred form of data used in reverse engineering algorithms. Moreover, time series data only gives the expression levels of genes without any knowledge of other cellular elements like protein/metabolite concentrations. In this paper a GRN is represented as a graph which consists of a set of nodes that represent genes and a set of edges that represent the interactions between genes. Thus the GRN inference problem investigated in this paper refers to finding the regulatory relationship between the genes of an organism.
Reverse engineering of gene regulatory networks remains a major issue and area of interest in the field of bioinformatics and systems biology. A survey paper [4] discusses a number of models related to this area, viz. Bayesian Networks [6], Dynamic Bayesian Networks [7], Boolean Networks [8], Probabilistic Boolean Networks [9, 10], Differential Equation Models [11] and Information Theory Models [3, 12, 13, 14, 15]. There is no gold standard method to reverse engineer gene regulatory networks; each method has its own advantages and disadvantages. Based on simulations of different models, it has been observed that differential equation models and dynamic Bayesian networks provide higher accuracy, but they are computationally expensive and hence, are applicable for only a small data set. Boolean networks can be used to study the coarse grained properties of genetic networks [9]. Such binary representation of gene expression is clearly an approximation, as most biological phenomena manifest their properties in the continuous domain. Even though it is inherently deterministic, the Boolean formalism has enjoyed success in predicting biological behaviour, such as the accurate qualitative distinction between known tumor subclasses [9, 10]. This work suggests that meaningful biological information is not lost when measured, continuousdomain, gene expression data is made binary. Information Theoretic methods to reverseengineer GRNs build on such Boolean network models of gene expression and have gained popularity due to their simplicity and less computational cost [4]. Each of the information theoretic schemes discussed in this paper as well as DBNs however can be easily extended to handle multiple levels of quantization to achieve higher inference accuracy at the cost of computational overhead.
ARACNE [14] and REVEAL [15] are two popular Information Theoretic approaches towards GRN inference. Both of these methods establish relationships between genes based on the MI metric. Zhao [3] analyzed the limitations of MI and proposed the conditional mutual information (CMI) based approach to infer GRNs. One of the major disadvantages of Information Theoretic approaches is the selection of the MI and CMI thresholds, for which Zhao [12] proposed the Minimum Description Length (MDL) principle and showed its effectiveness in selecting the best MI threshold. The MDL principle states that if multiple theories exist, the one with the minimum description length is the optimal. However the definition of description length varies for different models and applications. In their MDL implementation, Zhao [12] defined the description length as the sum of the model length (expressed as the memory usage of the inference algorithm) and data length (expressed as the overall entropy of the inferred network). One limitation of the MDL principle was that the model length quantity in the description length expression could make the implementation arbitrary [16]. To circumvent this problem, we have earlier proposed the Predictive Minimum Description Length (PMDL) principle approach [17] wherein we showed that by removing the model length quantity from description length and using CMI a higher inference accuracy can be obtained.
MI and CMI metrics are central in establishing the relationships between genes in information theory models. Hence, in order to design a smart GRN inference algorithm, it is important to study the behaviour of these MI and CMI metrics on microarray data of various sizes. The MDL implementation of Zhao [12] will henceforth be referred to as “network MDL” in the rest of the paper.
Another major disadvantage in information theory based models is that MI and CMI do not give directions between relationships. A unit time delay was assumed in our earlier PMDL [17] implementation. Zou [18] showed that the time lags in regulating one gene by another play an important role in inference accuracy as evident in their Dynamic Bayesian Network based approach. To incorporate the effects of timelags in information theoretic methods, we propose a new time lag computation method in this paper, which is used to modify the standard MI and CMI computations. Based on our modified MI and CMI metrics, we next present novel timelagged GRN inference schemes that show promising results in terms of improving the inference accuracy.
 1.
We show that the performance of the inference algorithms saturate beyond a certain data size due to the saturation in the information theory metric mutual information. Note that we have only varied the data size in our experiments to understand the effects of regulatory timelags between genes on the algorithms. The overall performance of the algorithms would also be affected by other factors (e.g., the number of replicates and number of external chemicals used in the experiments to name a few), which might lead to other novel innovations that need to be considered in designing reverseengineering schemes. This is however outside the scope of this paper.
 2.
A new way of computing time lags between any pair of genes is presented. Our scheme makes sure that time lags cannot be negative and we argue that a more biologically pragmatic view is that a gene can affect another gene only when it is upregulated. This assumption makes more sense in the Boolean network formalism of GRNs where a gene can only be in two possible states: ON (i.e., upregulated) and OFF (i.e., downregulated).
 3.
We introduce the time lagged Mutual Information (TLMI) and time lagged Conditional Mutual Information (TLCMI) quantities.
 4.
We present novel GRN inference schemes based on TLMI, TLCMI, MDL and PMDL principles that provide higher accuracy over the existing information theoretic methods.
Results
In this section, we first report the results of the existing inference schemes that were run on the timeseries microarray data of varying size and illustrate that the performance of the methods saturates beyond a certain number of time points. We also report how the pairwise MI metric saturates beyond a certain data size. We then present our new time lag computation scheme and the modified version of the network MDL algorithm wherein, we replace the MI metric which considers unit time delay with the TLMI metric (considering a timelag of τ). We next present a modified version of the PMDL algorithm, by replacing the MI and CMI metrics with the TLMI and TLCMI metrics. Finally the results from the network MDL, PMDL and modified network MDL and PMDL algorithms are compared.
Parameters to evaluate inference accuracy
Benchmark measures recall R and precision P are used to evaluate the performance of the algorithms. Although different definitions for recall and precision exist in the literature [20], in this paper, R is defined as C_{e}/(C_{e}+M_{e}) and P is defined as C_{e}/(C_{e}+F_{e}), where C_{e} denotes the edges that exist in the true network and in the inferred network, M_{e} are the edges that exist in the true network but not in the inferred network, and F_{e} are edges that do not exist in the true network but do exist in the inferred network.
Synthetic data generation methodology for the in silico experiments
The performance of information theory and DBN based algorithms over different data size was carried out over random synthetic networks which were generated by the Genenetweaver tool [21, 22, 23].
It was imperative for us to use synthetic data over time series microarray experimental data in this phase due to the following reasons:

Very few experimental data sets have equal time intervals between experiments and also the data size is generally limited to around 20 time points. In our in silico runs, we wanted to keep equal time intervals between each time point data such that we can understand the true effects of regulatory timelags between genes on the inference accuracy. It is generally not possible to assign a single timelag value to a genepair if the expression readings for each time point were not evenly spaced as mentioned in Zou [18].

Also, the saturation in inference accuracy generally requires a larger data size (> 30 time points as shown later) and it would have been difficult to identify the role of MI in bringing about this theoretical limit on the accuracy of information theoretic schemes with a smaller biological data set (of ~20 time points).

It should be noted that the Genenetweaver software derives the in silico GRNs from the prior knowledge database of yeast (Saccharomyces cerevisiae) which contains 4441 genes and 12873 interactions. Thus in order to create a sample GRN with 10 nodes, Genenetweaver clusters the yeast transcriptomic network into modules and chooses the module having number of genes closest to the given input (in this case 10 genes) to create the in silico network. Each such module maps to a particular biological function and this strategy essentially ensures that there is minimum crosstalk of these set of genes with the others in the yeast network resulting in a higher efficacy of the inference algorithms that use them.
Biological network data generation methodology to evaluate performance of proposed algorithms
The time series DNA microarray data from Spellman et al [25] was used to infer gene regulatory networks using the proposed algorithms. The Spellman experiment was chosen because it provides a comprehensive series of gene expression datasets for the Yeast cell cycle. Four time series expression datasets were generated using four different cell synchronization methods: cdc15, cdc28, alphafactor and elutriation with 24, 17, 18 and 14 time points respectively. The alphafactor dataset contained more time points than cdc28 and Elutriation datasets with fewer missing values than cdc15. Therefore, we chose to use time series expression data from the alphafactor method to infer the gene regulatory networks.
We used the same preprocessing steps as in [12]. Initially the data is quantized to 0 or 1. In order to quantize the expression values of every gene, they are sorted in ascending order and the first and last values of the sorted list are discarded as outliers; then the upper 50% is converted to 1 and the lower 50% is converted to 0. Any missing time points are set to the mean of their respective neighbors. If the missing time point is the first or the last one, it is set to the nearest time point value.
Four separate biological networks (as discussed later) used for comparison purposes were derived from the yeast cell cycle pathway [26, 27, 28]. The fine tuning parameter required by the network MDL based algorithms is set to 0.1 to retain most of the connections (see [12] for more details on this).
Effects of data size on GRN inference
Effects on Information Theory models
For the PMDL algorithm, it was observed that the precision increased until 55 time points, and, beyond that, the precision remains relatively stable for the two smaller tested networks (with 20 and 30 genes respectively). For the larger network (with 40 genes), the precision increased until 70 time points before saturation. The recall for PMDL algorithm increased until 40 time points before saturation for each of the 3 tested networks. For the network MDL algorithm, it was observed that precision increased until 35 time points and fluctuated after that. The recall for the network MDL algorithm kept increasing for all the test cases with considerable fluctuations.
Performance saturation points
MethodNo. of Genes  PMDL  Network MDL 

20  55  30, 60 
30  40  30, 60 
40  45  30, 45, 70 
Effects on DBN based scheme
Why do information theory based models saturate?
The plots conclude that the saturation in the methods was due to the saturation in the mutual information quantity which goes close to zero even though the entropy increases in the network. This would conceptually mean that there is room to improve on the inference accuracy (due to high entropy), yet the mutual information metric will not be able to point us to the right direction. Other information theoretic algorithms, like REVEAL [15] use the ratio of MI and entropy to infer the network for this purpose which supposedly gives good performance. However, from the entropy and mutual information curves in Figure 5, we can see that the ratio of mutual information and entropy will also saturate, as the entropy increases in the network, and hence this ratio might also not be the right metric to achieve better accuracy by making use of more time point’s data. The recently proposed Directed Mutual Information metric [24] might be a better metric than the conventional MI based algorithms. We do plan to conduct similar studies on the performance of GRN inference algorithms based on these different metrics as a function of the number of time points in the future. It is imperative to identify the right metric for the research community to decide which class of GRN inference algorithms can work best with timeseries data and also understand the ideal data size for them.
The saturation in MI due to increasing number of time points would suggest that the MI should not be computed for the entire range of time points of microarray data available from the experiments. GRNs are inherently time varying, and hence the pairwise MI between any 2 genes needs to be computed over the time range where the first gene will have substantial regulatory effect on the other one. This can be best approximated by estimating the regulatory timelags between each gene pair, and subsequently computing the MI between them for this particular time range. This concept was used to compute the timelags between genes and the TLMI and TLCMI metrics as discussed in the Methods section. Note that, the time lag computation concept initially proposed in [18] to implement timelagged DBN needs to change to avoid the case of negative timelags.
TLMI based network MDL implementation
TLMI and TLCMI based PMDL Implementation
Performance of MI and TLMI based PMDL
Method Metric  PMDL  Time Lagged PMDL 

Precision  22.73%  19.23% 
Recall  22.73%  18.52% 
Performance: Time and Space complexities of proposed algorithms
Step 4 of the PMDL algorithm iterates n^{2}(m − τ) times where n is number of genes, m is the number of time points, and τ is the time lag. From line 5 to line 18 the algorithm iterates n^{4} times, lines 15 and 16 of the algorithm iterates n^{3}(m − τ) times. Finally from lines 20 to 31 the algorithm iterates n^{3} times. Thus the time complexity of the overall algorithm is Θ(n^{4} + n^{3}(m − τ)).
From the time complexity it can be seen that if the number of genes is larger than the number of time points then the run time of PMDL algorithm depends on the number of genes. And if the number of time points is larger than the number of genes then the run time depends on the number of time points. In the worst case is zero for all genes and the algorithm runs in Θ(n^{4} + n^{3}m) time.
 1.
Restrict the number of parents.
 2.
Take the next smallest description length, instead of using the smallest one.
The first approach will guarantee results when the number of parents is restricted to a small value but this may lower the accuracy of the result. The second approach may take more time to run but as we are not restricting the number of parents the accuracy of the algorithm is not affected. Some bench marking studies are required on the above two approaches to see which one works best.
In the MDL based implementation we discard the lines 20 to 31 from the PMDL based implementation. The worst case time complexity is again Θ(n^{4} + n^{3}m).
Conclusions
In this paper, we have studied the effects of cDNA microarray data size on three algorithms: PMDL, network MDL, and a DBN based approach. The study shows that the data size plays an important role in the inference accuracy of each of these algorithms. The experiments were carried out on synthetically generated timeseries data and the performance saturation points were listed for these algorithms. The immediate benefit of this work lies in helping biologists to devise cDNA microarray experiments intelligently depending on the class of GRN inference algorithms they are likely to use to achieve maximum accuracy. In a bid to understand the performance saturation of the information theoretic approaches, we also found out that mutual information saturates and effectively tends to zero as the entropy in the network increases with increase in the data size. These observations lead us to believe that MI by itself might not be the best metric in devising information theoretic approaches for GRN inference. The DBN approach however showed good performance only for a smaller data size which is nonintuitive and requires further analysis for validation. Based on these findings, we introduced two new information theory metrics viz. TLMI and TLCMI and used them in the network MDL and PMDL based algorithms to develop two novel GRN inference algorithms. The results indicate that transcriptional time lags play an important role in gene regulatory network inference methods as evidenced by the higher accuracy provided by our algorithm.
Methods
Time Lags
The concept of time lags was first introduced by Zou [18], where they proposed that the time difference between the initial expression change of a potential regulator (parent) and its target gene represents a biologically relevant time period. Here potential regulators are those set of genes whose initial expression change happened before the target gene. Also initial expression change is up or downregulation (ON or OFF) of genes.
In biological networks the A↔ B schema is quiet common. Hence Zou’s time lag computation scheme needs to change to handle such cases. We also argue that a gene can affect another gene only when it is upregulated (ON). Based on the above discussion, we propose time lags as the difference between initial upregulation of first gene and initial expression change of the second gene after the upregulation of first gene. This solves the issue of negative time lags besides being biologically more relevant as compared to the existing method of calculating time lags. Figure 10 also illustrates the proposed time lag computation based on our approach. In the figure, Ua and Ub indicate the initial upregulation of genes A and B at time points two and three respectively. Ca and Cb indicate that time points six and three are the time points where the expression values of genes A and B changed after the initial upregulation of genes B and A. Time lag between A and B is calculated as τ1 = Cb  Ua and time lag between B and A is calculated as τ2 = Ca  Ub respectively. In this example time lag between X and Y is one and time lag between Y and X is three.
Information Theoretic metrics
Entropy, joint entropy, mutual information and conditional mutual information
Entropy, H, is the measure of the average uncertainty in a random variable [19]. If p_{ i } is the probability of observing a particular symbol in a sequence then entropy is given as Open image in new window
As the proposed algorithm quantizes the microarray data to two levels, a gene takes two values a 0 and 1 corresponding to being in OFF and ON states respectively. In this case, the entropy of a gene A is defined as Open image in new window , where p_{ 0 } and p_{ 1 } are the probabilities of observing a gene A as 0 and 1 respectively over the sequence A [15]. Here sequence A contains the values taken by a gene in the time series data; thus if we have a time series data of m time points then sequence A is of length m.
This standard definition of entropy has been used in the MDL and PMDL schemes where the entropy was computed for sequence length of m1 to simulate a default timelag of 1, i.e., compute the entropy between A (considering its expression values from 1, 2,..., m1) and B (considering its expression values from 2, 3,..., m). In order to implement the TLMI and TLCMI metrics, we need to compute the entropies between A (considering its expression values from 1, 2,..., mτ) and B (considering its expression values from τ+1, τ+2,..., m).
Joint Entropy between two sequences A and B , H(A, B) is defined as Open image in new window
Thus joint entropy between two variables is an extension of entropy where the two sequences (A, B) are considered to be a single vector valued random variable dependent on each other [19].
As stated before, the proposed algorithm quantizes the microarray data into two levels, in this case the joint entropy between two sequences A and B H(A, B) is defined as Open image in new window where p_{0,0}, p_{0,1}, p_{1,1} and p_{1,1} are the probabilities of observing both zeros, a zero and a one, a one and a zero and both ones in sequences A and B respectively.
Mutual Information measures the amount of information that can be obtained about one random variable by observing another one [19].
MI in terms of entropies is defined as Open image in new window [19] and the classical MI metric is symmetric i.e. Open image in new window
Time lagged mutual information (TLMI) and time lagged conditional mutual information (TLCMI)
After implementing time lags, we no longer compute the entropy and joint entropy over the complete sequences of gene(s) as discussed before. If a time lag τ is computed between two genes A and B we remove the last τ symbols of sequence A and first τ symbols of sequence B to obtain reduced sequences (of length mτ each) for A and B respectively. Computing the MI over these reduced sequences gives TLMI.
TLMI is not a symmetric quantity though, i.e. Open image in new window .
Considering a time lag, τ, between A and B, we compute TLCMI(A;BC) by deleting the last τ symbols in sequences A and C (i.e., look into the sequences from time points 1, 2,..., mτ) and first τ symbols in sequence of B (i.e., look into the sequences from time points τ+1, τ+2,..., m) and computing the CMI of these reduced sequences to obtain TLCMI.
Minimum Description Length (MDL) and Predictive Minimum Description Length (PMDL) Principle
Estimating the MI threshold is one of the major drawbacks in information theory based model. Zhao . [12] first proposed the MDL principle to solve the problem. Based on the microarray data, the MI matrix is computed. If the microarray data has n genes then two n*n matrices viz. one connectivity matrix and one MI matrix are stored. A time lag of one unit is assumed, thus the MI computations are not symmetric. Using every MI value as a threshold over the MI matrix, n^{2} models are obtained. For every model, the description lengths (model length + data length) are computed and the model with the minimum description length is selected as the best model. This algorithm involved a fine tuning parameter for the model length algorithm which also makes the MDL principle method arbitrary [16]. To overcome this issue we earlier proposed a PMDL based inference algorithm [17]. In the PMDL principle, we discard the model length while computing the data length of each model. The results increased both true edges and false edges in inferred networks. In order to reduce the false edges, CMI was applied over the best model and hence the overall PMDL based approach improved the inference accuracy over the conventional network MDL scheme [17].
Time lagged based MDL and PMDL implementation
The basic information theoretic metrics with unit time lag were replaced in the network MDL and PMDL algorithms with the TLMI and TLCMI metrics. Figure 9 illustrates the existing and proposed algorithms. Figure 9A shows the MI and TLMI based PMDL algorithm. While both the existing and our proposed algorithms are shown in the same figure, it is to be noted that the difference lies in using the right information theoretic metrics (lines 4, 19 and 24 in Figure 9A). Figure 9B shows the MI and TLMI based network MDL algorithms. Further explanations on the PMDL and network MDL algorithms can be found in [17] and [12] respectively.
Notes
Acknowledgements
This work was supported by NSF through contract EPS0903787 awarded to CZ and PG1, and the US Army Corps of Engineers Environmental Quality Program under contract # W912HZ0820011. Permission was granted by the Chief of Engineers to publish this information.
This article has been published as part of BMC Bioinformatics Volume 11 Supplement 6, 2010: Proceedings of the Seventh Annual MCBIOS Conference. Bioinformatics: Systems, Biology, Informatics and Computation. The full contents of the supplement are available online at http://www.biomedcentral.com/14712105/11?issue=S6.
References
 1.Björn JunkerH, Schreiber F: Analysis of biological networks. WileyInterscience; 2008.Google Scholar
 2.Chen L, Wang RS, Zhang XS: Biomolecular Networks: Methods and Applications in Systems Biology. John Wiley and Sons; 2009.CrossRefGoogle Scholar
 3.Zhao W, Serpedin E, Dougherty ER: Inferring connectivity of genetic regulatory networks using informationtheoretic criteria. IEEE, Transactions on Computational Biology and Bioinformatics 2008, 5(2):262–274. 10.1109/TCBB.2007.1067CrossRefPubMedGoogle Scholar
 4.Hecker M, Lambeck S, Toepfer S, Someren EV, Guthke R: Gene regulatory network inference: Data integration in dynamic models  A review. Bio Systems 2009, 96(1):86–103. 10.1016/j.biosystems.2008.12.004CrossRefPubMedGoogle Scholar
 5.Heckerman D, Geiger D, Chickering DM: Learning Bayesian networks: The combination of knowledge and statistical data. Machine Learning 1995, 20: 197–243.Google Scholar
 6.Akutsu T, Miyano S, Kuhara S: Algorithms for inferring qualitative models of biological networks. Pacific Symposium on Biocomputing 2000, 4: 17–28.Google Scholar
 7.Murphy K, Mian S: Modelling gene expression data using dynamic Bayesian networks. In Technical report, Computer Science Division University of California, Berkeley, CA 1999.Google Scholar
 8.Kauffman SA: Metabolic stability and epigenesis in randomly constructed genetic nets. J Theor Biol 1969, 22: 437–467. 10.1016/00225193(69)900150CrossRefPubMedGoogle Scholar
 9.Schmulevich I, Dougherty ER, Kim S, Zhang W: Probabilistic Boolean Networks: A rulebased uncertainty model for gene regulatory networks. Bioinformatics 2002, 18(2):261–274. 10.1093/bioinformatics/18.2.261CrossRefGoogle Scholar
 10.Shmulevich I, Dougherty ER, Zhang W: From boolean to probabilistic boolean networks as models of genetic regulatory networks. Proceedings of the IEEE 2002, 90(11):1778–1792. 10.1109/JPROC.2002.804686CrossRefGoogle Scholar
 11.Chen T, He HL, Church GM: Modeling gene expression with differential equations. Pacific Symposium Biocomputing 1999, 4: 29–40.Google Scholar
 12.Zhao W, Serpedin E, Dougherty ER: Inferring gene regulatory networks from time series data using the minimum description length principle. Bioinformatics 2006, 22(17):2129–2135. 10.1093/bioinformatics/btl364CrossRefPubMedGoogle Scholar
 13.Dougherty J, Tabus I, Astola J: Inference of gene regulatory networks based on a universal minimum description length. EURASIP Journal on Bioinformatics and Systems Biology 2008. Article ID: 482090, 11 pages Article ID: 482090, 11 pagesGoogle Scholar
 14.Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, Califano A: ARACNE: An algorithm for reconstruction of genetic networks in a mammalian cellular context. BMC Bioinformatics 2006, 7: S7. 10.1186/147121057S1S7PubMedCentralCrossRefPubMedGoogle Scholar
 15.Shoudan L: REVEAL: A general reverse engineering algorithm for inference of genetic network architectures. Pacific Symposium on Biocomputing 1998, 3: 18–29.Google Scholar
 16.Hansen MH, Yu B: Model Selection and the Principle of Minimum Description Length. Journal of the American Statistical Association 2001, 96(454):746–774. 10.1198/016214501753168398CrossRefGoogle Scholar
 17.Chaitankar V, Zhang C, Ghosh P, Perkins EJ, Gong P, Deng Y: Gene Regulatory Network Inference Using Predictive Minimum Description Length Principle and Conditional Mutual Information. Proceedings of International Joint Conference on Bioinformatics, Systems Biology and Intelligent Computing 2009, 487–490. full_textGoogle Scholar
 18.Zou M, Conzen SD: A new dynamic Bayesian network (DBN) approach for identifying gene regulatory networks from time course microarray data. Bioinformatics 2005, 21(1):71–79. 10.1093/bioinformatics/bth463CrossRefPubMedGoogle Scholar
 19.Cover TM, Thomas JA: Elements of information theory. WileyInterscience, New York 1991.Google Scholar
 20.Zhang X, Baral C, Kim S: An Algorithm to Learn Causal Relations Between Genes from Steady State Data: Simulation and Its Application to Melanoma Dataset. Proceedings of 10th Conference on Artificial Intelligence in Medicine 2005, 524–534.CrossRefGoogle Scholar
 21.Marbach D, Prill RJ, Schaffter T, Mattiussi C, Floreano D, Stolovitzky G: Revealing strengths and weaknesses of methods for gene network inference. PNAS, in press.Google Scholar
 22.Prill RJ, Marbach D, SaezRodriguez J, Sorger PK, Alexopoulos LG, Xue X, Clarke ND, AltanBonnet G, Stolovitzky G: Towards a rigorous assessment of systems biology models: the DREAM3 challenges. PLoS ONE 2010, 5(2):e9202. 10.1371/journal.pone.0009202PubMedCentralCrossRefPubMedGoogle Scholar
 23.Marbach D, Schaffter T, Mattiussi C, Floreano D: Generating realistic in silico gene networks for performance assessment of reverse engineering methods. Journal of Computational Biology 2009, 16(2):229–239. 10.1089/cmb.2008.09TTCrossRefPubMedGoogle Scholar
 24.Rao A, Hero AO 3rd, States DJ, Engel JD: Inference of Biologically Relevant Gene Influence Networks Using the Directed Information Criterion. ICASSP Proceedings 2006., 2(IIII):Google Scholar
 25.Spellman PT, Sherlock G, Zhang MQ, Iyer VR, Anders K, Eisen MB, Brown PO, Botstein D, Futcher B: Comprehensive Identification of Cell Cycleregulated Genes of the Yeast Saccharomyces cerevisiae by Microarray Hybridization. Molecular Biology of the Cell 1998, 9: 3273–3297.PubMedCentralCrossRefPubMedGoogle Scholar
 26.Kanehisa M, Araki M, Goto S, Hattori M, Hirakawa M, Itoh M, Katayama T, Kawashima S, Okuda S, Tokimatsu T, Yamanishi Y: KEGG for linking genomes to life and the environment. Nucleic Acids Res 2008, 36: D480D484. 10.1093/nar/gkm882PubMedCentralCrossRefPubMedGoogle Scholar
 27.Kanehisa M, Goto S, Hattori M, AokiKinoshita KF, Itoh M, Kawashima S, Katayama T, Araki M, Hirakawa M: From genomics to chemical genomics: new developments in KEGG. Nucleic Acids Res 2006, 34: D354–357. 10.1093/nar/gkj102PubMedCentralCrossRefPubMedGoogle Scholar
 28.Kanehisa M, Goto S: KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res 2000, 28: 27–30. 10.1093/nar/28.1.27PubMedCentralCrossRefPubMedGoogle Scholar
Copyright information
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.