Robust inference of the context specific structure and temporal dynamics of gene regulatory network
- 2.6k Downloads
Response of cells to changing endogenous or exogenous conditions is governed by intricate molecular interactions, or regulatory networks. To lead to appropriate responses, regulatory network should be 1) context-specific, i.e., its constituents and topology depend on the phonotypical and experimental context including tissue types and cell conditions, such as damage, stress, macroenvironments of cell, etc. and 2) time varying, i.e., network elements and their regulatory roles change actively over time to control the endogenous cell states e.g. different stages in a cell cycle.
A novel network model PathRNet and a reconstruction approach PATTERN are proposed for reconstructing the context specific time varying regulatory networks by integrating microarray gene expression profiles and existing knowledge of pathways and transcription factors. The nodes of the PathRNet are Transcription Factors (TFs) and pathways, and edges represent the regulation between pathways and TFs. The reconstructed PathRNet for Kaposi's sarcoma-associated herpesvirus infection of human endothelial cells reveals the complicated dynamics of the underlying regulatory mechanisms that govern this intricate process. All the related materials including source code are available at http://compgenomics.utsa.edu/tvnet.html.
The proposed PathRNet provides a system level landscape of the dynamics of gene regulatory circuitry. The inference approach PATTERN enables robust reconstruction of the temporal dynamics of pathway-centric regulatory networks. The proposed approach for the first time provides a dynamic perspective of pathway, TF regulations, and their interaction related to specific endogenous and exogenous conditions.
KeywordsRegulatory Network Expression Fold Change Functional Node Exogenous Condition KSHV Infection
Gene and protein regulatory networks depict the regulatory circuitry that controls the complicated biological processes. The regulatory networks enable visualization and identification of protein and gene functions that cannot otherwise be revealed by a gene- or protein-centric approach. Through revealing various system level interactions including transcription regulation, signal transductions, and metabolic reactions, regulatory networks provide insights into the molecular machinery of cell. The systems perspective also reveals global regulatory properties of cell including functional modularity and robustness to perturbation. Regulatory networks are also at the heart of personalized medicine, where they serve to aid disease diagnosis and prognosis, drug target discovery, intervention, and prediction of pharmacological effects .
Reconstructing regulatory networks comprises one of the most active areas of research in computational biology. The goal of network construction is to elucidate the gene/protein interaction relationships, in an effort to obtain more realistic and predictive models of molecular regulation. This prompts us to study the complex patterns of regulatory network that depend on temporal and condition-specific context.
Regulatory networks are specific to exogenous context
Since gene or protein in the network can acquire different functions under different exogenous conditions or phenotypic contexts, the network constituents and interactions would be context dependent, i.e., regulatory network is composed of different genes and proteins under different conditions and their interactions are specifically determined by the context. For instance, the oncogenes such as MYC, ERK, and WNT are known to down-regulated in normal cells. Up-regulation of oncogenes leads to a series of modification in signal transduction and transcription regulation, which interfere with the controlled cell death and eventually turn the normal cell into cancer cell. Therefore, the revelation of context-specific differences of regulatory network in cancer is essential for understanding the process and mechanism of cancer. In fact, the context-dependent regulation accounts for the heterogeneity of diverse cell phenotypes.
Regulatory networks are time varying according to the endogenous cell states
While the long run behaviour of regulatory networks is often stationary, the transient characteristic is intrinsically time varying for they need to respond and adapt themselves to the new exogenous condition. Temporal dynamics of regulatory networks is highly prevalent to many important processes such as cell cycle, cell differentiation, apoptosis, and viral infection to account for different endogenous cellular states. Even under a fixed exogenous condition, the regulation of genes and protein are carried out in a temporal, sequential manner. The cell cycle is known to have multiple stages and transitions in between and are triggered and guided by temporal changes of regulatory networks. Some of the transcription factors such as Fos are known to be early genes because they regulate commonly at the early stage of a biological event. It is also well recognized that the regulatory pathway does not always persist over all times. Such temporal cascade of regulation is essential for elucidating the mechanism leading to different phenotypes. The two key characteristics of time-varying networks include 1) the temporal topology of regulatory networks, i.e., the network constituents and wiring could change with time, and 2) time-varying regulatory impact. It is important and necessary to understand both aspects of temporal regulation when reconstructing the time varying networks.
Existing network models and approaches are insufficient for inferring context-specific time-varying networks.
Reconstruction of regulatory networks based on high throughput data is currently a research focus in systems biology. In the past, a large number of different network architectures have been proposed. One of the most popular architectures is the molecular-based network architecture, where nodes are individual genes or proteins and links denote gene-gene, gene-protein, or protein-gene regulations. This architecture aims to reveal the detailed genomic interaction and is capable of uncovering transcription regulation, signalling, and metabolic pathways. However, the quality of the inferred molecular-based network is greatly restricted by the limitation of data availability and data quality; in practice, precise inference of large scale networks at a molecular level is very difficult and not robust due to the large number of the unknowns and limited and noisy genomics or proteomics data. The reconstructed networks normally have extremely high false positive links, making the subsequent analysis and interpretation at a molecular level difficult and unreliable. A common remedy to this problem is to take a global view of the networks and study the features such as scale free and modularity. Although this approach has been successful in revealing global properties of molecular networks, it is still unable to identify pathways as intended by this architecture. An alternative and equally popular architecture is the module networks . Since nodes in a module network are clusters of genes, or modules, the inference task for module networks is less demanding and higher quality inference can be achieved. However, module networks cannot directly uncover functional pathways even with high quality data and thus are often difficult to interpret biologically. Post-processing based on gene ontology (GO) or pathway enrichment is required to reveal the possible functions of each modules. However, the module networks cannot reveal beyond the prediction that a set of functions of a modular would regulate another set in another module. This prediction is useful for pointing out the possible further studies to follow; it is nevertheless far from sufficient to uncover important pathways and transcription regulations. Apparently, a new architecture that is robust to data noise and capable of revealing directly pathways will be much more appealing.
Given a network structure, a mathematical or statistical model needs to be imposed to model the dynamics of regulation. Many models have been proposed for this purpose including ordinary differential equations , (probabilistic) Boolean networks , , information theory based models . However, the principled assumption underlying these models is overwhelmingly stationary; that is the dynamic models do not change over time. While the long run behaviour of regulatory networks is reasonably considered stationary, the transient characteristic is intrinsically time varying for they need to respond and adapt themselves to meet the need of varying conditions. As a result, the inferred networks based on stationary assumption conceal the intricate temporal regulatory patterns. Despite the prevalent temporal nature of regulatory, limited work exists to model and infer the time varying network and they include regime state space model , temporal hidden Markov model , the wavelet dynamic vector autoregressive method , and context-sensitive probabilistic Boolean networks . These models all involve a high degree of parametric complexity that hinders robust reconstruction of large scale networks. For instance, context-sensitive probabilistic Boolean networks in  and regime state space model  involved only 10 and 47 genes, respectively. Alternatively, module networks were assumed in [7, 8, 9] to increase the inference robustness but nevertheless reduce the interpretability of result. Some remarkable results about systems level reconstruction of both static and transient transcription regulatory networks of yeast for different cell states were reported in  .The networks inference utilized a large collection of prior knowledge and a simple model of differential expression in microarray. However, specific time regimes for cell states of interest need to be given; the approach cannot further uncover by itself endogenous temporal patterns within the given regime. In addition, the adopted gene networks architecture is sensitive to data noise and prone to producing false positive links.
Aside from modelling network dynamics, investigation of the context-specific networks is given increasing attention, although the main body of existing work still concerns the reconstruction of context-independent networks. To this end,  explicitly introduced a context node in the Bayesian network model to model different context condition. GO annotations were applied in  to the protein-protein interaction (PPI) network of yeast to dissect the context (function)-specific subÂ¬networks. Again, in , large scale transcription networks under different exogenous conditions were constructed for yeast. Despite some success in identifying context-specific network motifs and transcription factors, the results were not robust and unreliable at the molecular scale due to the molecular-based network architecture..
To sum up, making robust statistical inference from diverse types of genomic data to discover context-specific and temporal pathways of biological function presents a formidable challenge. The current network architectures and approaches are insufficient for this goal. The main issue is that the network architecture, model, and inference approach cannot produce robust network inference that can be easily understood.
Towards a pathway-centric network model and functional enrichment inference approach
To overcome the problems of existing network architecture, model, and inference approaches, we advocate a new pathway-centric network (PathRNet) and an enrichment inference approach. The nodes of the PathRNet are regulators such as transcription factors (TFs) ,pathways including signaling pathways and metabolic pathways, and the links bear specific functional meaning, i.e., a link going from TF to pathway defines the transcription regulation of pathway by TF and a link from signaling pathway to TF suggests signal transduction induced modification of TF. The advantages of this model architecture are in four folds. First, this network design is geared directly to biologists, who normally focus on specific pathways and regulators when designing experiments and seeking answers to phenotypic outcomes. Secondly, PathRNet speaks the language of biologists and no interpretation is even needed for the results. Unlike in most of gene or protein networks, where links only imply a degree of association and hence further interpretable is needed, in PathRNet, the linkage such as Signaling pathway A -> TF B -| Metabolic pathway C reads naturally as Signaling pathway A activates TF B, which further inhibits metabolic pathway C. Thirdly, past research has produced considerable data about pathways and transcription regulation, and it is prudent and necessary to understand their regulatory roles in different context before new pathways and regulations can be inferred. Fourthly, PathRNet is more robust to data noise. Since nodes are pathways instead of individual genes, PathRNet inherits the robustness of module networks; given these advantages, we believe that PathRNet can adequately address the aforementioned deficiency of the existing network architectures.
To robustly infer the context specific structure and its temporal dynamics of PathRNet, we propose the Pathway Enrichment Analysis of Temporal Regulatory Networks or PATTERN. The key of PATTERN is the method of the enrichment based time dependent clustering algorithm, ECTDISA , which identifies temporal clusters of pathways and regulated gene sets that are enriched in a context-specific time series microarray data. Once again, the inference in PATTERN moves away from the gene-centric approach to the pathway or gene-set enrichment based concept. The gene set enrichment (GSE) concept formulated by the GSEA algorithm  has gained great success in differential analysis of gene expression data and also extended its applications to drug effect prediction and clustering of coregulated gene. The main advantages of GSE approach are 1) the inference is robust to excessive noise on one or a few genes in the gene set, and 2) the result has straightforward biologically implications since gene sets are mostly pathways or coregulated genes. Recalling the previously mentioned problems of current inference approaches for network reconstruction, we see that these advantages of GSE are the exact answers to these problems. However, GSE has not been used for network construction. It is thus our objective to develop an enrichment approach for network reconstruction. Coupled with PathRNet, the GSE approach will provide added robustness to inference, enabling reliable discovery of network temporal dynamics from noisy microarray data of commonly small sample size.
Overview of our approach
We discuss next the proposed methods for implementing each module.
Generic PathRNet Construction
A generic PathRNet is first constructed based on the existing knowledge database. The prior knowledge databases are collected from NCI-NC  ,Transfac  and MSigDB . A generic PathRNet is constructed with each node corresponding to a signaling pathway, a transcription factor or a metabolic pathway; and an edge is placed:
a) From a signaling pathway to TFs if TFs are the downstream of the signaling pathways regulate transcription factors.
b) From a TF to signaling or metabolic pathways if TF regulated at least one genes in the pathway.
In the end, a prior network is obtained, which illustrates a generic picture of the interactions between pathways and transcription factors inside a human cell.
Noted that in our approach, there are additional steps to refine the constructed generic PathRNet by removing redundant nodes and edges:
In the context-specific PathRNet construction module, all the nonfunctional nodes and their edges were removed, leaving only the enriched functional pathways /TFs and edges between them.
In the regulatory PathRNet construction module, Bayesian information criterion (BIC) was applied for model selection, and all the edges that are not chosen were removed from the constructed regulatory PathRNet.
In the dynamic PathRNet construction module, (dynamic) edges that indicate no or nominal contributions were further removed from the constructed temporal PathRNet.
Since all the following steps remove redundant nodes/edges, we hope keep as many associations as possible when constructing the generic PathRNet to ensure a high sensitivity of reconstruction. Thus, an edge is placed between a transcription factor (TF) and a pathway as long as the TF regulates at least one gene in the Pathway (the pathway and the TF target gene sets share at least one gene). However, the degree of overlapping between the pathway genes and the TF target gene set might itself indicates the strength of their association. Although this information is not used in our algorithm, it is possible to include this information in further extension to better estimate the regulatory impacts of transcription factors.
Context specific PathRNet construction
To capture the functional components of generic PathRNet and construct a context-specific PathRNet, expression profile is then incorporated. The goal is to identify coregulated pathways enriched by the gene expression profile and their associated time period. To this end, we apply the ECTDISA  to the microarray gene expression data. ECTDISA can not only cluster the coregulated genes that are functionally most enriched by a functional annotation database such as pathway databases but also retrieve the timing of the co-regulating clusters.
where x, y are the expression profile of a functional node (pathway or TF), and α is used to penalize smaller fold changes.
It can be seen from the figures that C score reaches its peak when α = 0.3 and at the same time A score is relatively high. This is the value we choose when the approach was applied to real data. To achieve the best result, the use of different α are recommended.
Regulatory PathRNet construction
Note that for now the edges in the functional networks are still defined by the prior knowledge. With the expression data, the edges can be further refined to reflect the regulatory status including up- or down- regulation and its weight. To this end, the expression profiles of the nodes are first determined. For the real case, since even the genes in the same pathways can behave differently, i.e., some are up regulated while some others are down regulated, it possible that the same pathway are enriched in both the up-regulated module and the down-regulated module. To capture the main trend of pathway, we consider the most significantly enriched trend in a pathway by using ECTDISA algorithm. Specifically, for a signaling pathway or a metabolic pathway (nodes labelled with "S-" and "P-" in the network), the expression profile is estimated by averaging all the pathway genes that are most enriched in their temporal trends. Since the trends of their expression profiles are very similar, the averaging expression of those genes is a good estimate of the main trend of the pathway. For example, if at 1 hour, the expressions of 5 genes that are involved in the same pathways are 2, 2.1, 2.2, 0, -3; and the first 3 gene are enriched in a module identified by ECTDISA, then the fold change of the corresponding pathway at 1 hour will be Open image in new window . Similarly, the expression fold change at other time points can be calculated. Note that, since the expression is calculated by averaging along the gene dimension, the expression fold change of different genes are acquired from exactly the same experimental condition; moreover, the microarray data has been properly normalized 
If a pathway is enriched in a temporal transcription module, then the pathway is only active during the inferred time period that the temporal module is active; otherwise its expression is 0. For a transcription factor, the expression fold change profile is inferred by averaging the expression fold changes of its encoding genes.
Next, the regulatory impact is inferred based on the expression of nodes. Let y denote the expression profiles of a node and X = [x1, x2,...,x k ] the expression profiles of its parent nodes. Then, a linear regulatory model is introduced as:
y = X · β + ε
where n is the number of data points in x i , k is the number of parameters to be estimated, and the residual sum of squares (RSS) is the sum of squares of residuals, which defines the discrepancy between the data and our estimation model:
RSS=|| y - ŷ||2
Temporal PathRNet construction
where, x t = [x1, t , x2, t ,…,x p,t ], and P is the number of parent nodes of y. The dynamic contribution of all the parent nodes is (x t – x t –1)âˆ™β, and the dynamic contribution of a parent node from t – 1 to t is: Open image in new window . Then, the dynamic regulatory networks can be obtained by refining the edges at each t. Particularly, if Open image in new window is small, the corresponding edge can be removed indicating no or nominal contribution. Otherwise, the width and the color of the edge will be defined by Open image in new window to illustrate the regulatory influence of the node onto a child node during a particular sample time period. Although small regulatory contribution Open image in new window does lead to edge removal and thus change of network structure, it is mainly the regulatory time period inferred by the ECTDISA that defines the dynamics of the network structure.
The data pattern of each module was generated according to an AR (1) model:
where G m denotes the set of genes in module m and T m is the duration of module m.
In order to simulate the regulatory impact, the pattern of node 8 is modelled as a linear combination of the patterns of node 1 and 2:
X8t =β1X1 t +β2X2 t
where, β3 = β4 = β5 = 0. Then, each expression level in a module is multiplied with a random number from Normal (0,12 );this is used to simulate the difference in expression strength among genes. Finally, Gaussian additive noise was added and data were arranged in a matrix similar to microarray; rows of the data matrix were also shuffled to mimic real biological data.
The simulated system is similar to the one used in , except that additional network structure was specified. The goal of PATTERN is to reconstruct the embedded network structure and estimate the regulatory coefficients β i given the prior knowledge and gene expression data. In the following experiments, we evaluated the impacts of noise effect and prior knowledge on the performance of proposed PATTERN.
In the first experiments, PATTERN was evaluated under different noise variance by considering whether the correct network model was recovered and the percentage of correct model prediction is calculated. A network model is considered to be correctly identified when all the nodes within the network are predicted correctly, i.e., all the functional nodes are predicted to be functional, and all the nonfunctional nodes are predicted to be nonfunctional. For instance, in the network shown in Fig. 4, the network structure is correctly predicted if and only if nodes 1, 2, 8 are predicted functional, and meanwhile node 3-5 are predicted to be nonfunctional. (Node 6, 7 are independent nodes and thus are not considered.). The percentage of correct model prediction is defined as the ratio of the number of correct model predictions to the total number of experimental trials. For example, if among 100 trials, models are predicted correctly 80 times, the percentage of correct model prediction is 80%.
Note that in PATTERN the enrichment analysis serves as a model selection approach. The performance improvement of PATTERN over the direct method in β i estimation demonstrates the importance of model selection. As seen in this example, to recover the true network structure, we need to 1) remove the non-functional nodes 3-5 from network, and 2) decide whether node 1 and 2 regulates node 8. In PATTERN, objective 1) is achieved by the enrichment analysis in the context-specific PATTERN construction module, and objective 2) is tackled by the BIC model selection in regulatory PATTERN construction module. However, in the direct method, objective 1) is not tackled.
Then, we evaluated the impact of annotation database on performance. Even though great effort has been made to construct and improve various prior knowledge databases, the existing databases are still very noisy and often inconsistent with prior knowledge, i.e., annotation is often incomplete and contains error. This scenario was simulated by first assigning a function category only to a fraction of genes in an embedded module and leaving the rest without any annotation, and then introducing randomly selected genes into the function category. The validation result was shown in Fig. 5-(c) and (d). It can be seen from the figure that the ability of PATTERN to identify the correct network structure and estimate the coefficients was not significantly affected as long as more than 50% of coregulated genes are correctly annotated. PATTERN outperforms the direct method under all tested conditions. These results imply that PATTERN is robust to the incomplete and erroneous annotations in the prior knowledge databases. In this experiment, In order to maintain the network structure and acquire a comparable result, the annotations of genes that are shared in several pathways (and thus determine the edges and edge directions between nodes) are unchanged; only the annotation of remaining genes can change.
It can be seen from the figure that, the proposed algorithm PATTERN achieves relatively better precision and recall performance at low noise level; its performance decreases as noise variance increases. Particularly, for the noise variance at 0.35, the precisions of nodes and edges drop below 1 only when respective recalls are above 0.8 and 0.65.
It can be seen from the figures that, when noise variance is small, the proposed algorithm PATTERN performs similarly on the datasets for three different noise distributions; however, when noise variance is large, PATTERN performs much better on the Laplacian noise contaminated datasets than on the other two, which indicates that the proposed approach is more robust to heavy tail noise distributions when the standard deviation of noise is the same.
Reconstruction of PathRNet for KSHV infection
We applied PATTERN to reconstruct a PathRNet for KSHV infection of human primary endothelial cells . Previous studies have shown that KSHV infection of primary human umbilical vein endothelial cells (HUVEC) can be divided into two broad phases: entry phase and postentry phase. The entry phase is an acute event involving the binding of the virus glycoproteins to cellular receptors, delivery of virus encapsidated viral and cellular proteins and RNAs, and activation of various cellular pathways. The postentry phase is composed of a series of non-acute events involving the expression of viral genes and virus replication as well as subsequent virus packaging and egress (lytic) or establishment of viral latency, all of which closely interact with different cellular pathways and the expression of cellular genes. The process of KSHV infection including lytic replication and latency are known to be complicated with various cellular genes and pathways regulated during different time periods under different schemes. It is highly desirable and important to elucidate the dynamic regulations of pathways and gene expressions during infection. The detailed result of the reconstructed PathRNet for KSHV infection produced by PATTERN is discussed next.
Context-specific PathRNet revealed by ECTDISA
ECTDISA was then applied to the time series data to obtain the coregulated gene clusters that are most enriched by the nodes of the generic PathRNet and at the same time retrieve the timing of coregulation.
Examples of enriched pathways and TFs in ECTDISA
P38 MAPK SIGNALING PATHWAY
ERK1/ERK2 MAPK SIGNALING PATHWAY
GENES INVOLVED IN BASAL CELL CARCINOMA
GENES INVOLVED IN DEGRADATION OF GLYCAN STRUCTURES
Specifically, PATTERN was applied to a simulated dataset which consists of 400 genes 10 samples. The PathRNet embedded is shown in Fig. 4. Different p-value thresholds are applied and compared. It can be seen from Fig. 11 that when the threshold is too strict (-40) or too loose (-2.5) the proposed algorithm cannot retrieve the true network structure. The result is better when a reasonable threshold is chosen. To this end, we used a p-value threshold of 0.01 to keep relatively more nodes in the network.
By keeping the only enriched or functional nodes (table 1) of the generic PathRNet, the context-specific PathRNet for KSHV infection was obtained. Since a number of enriched nodes are independent, those nodes are removed from the network. The reconstructed PathRNet consists of 38 nodes and 208 regulations (Fig. 10). Each node corresponds to a signaling pathway (S-), transcription factor (TF-) or a metabolic pathway (P-). The node enriched in the same cluster is highlighted by the same color and also indicated by the class ID "C#". Each directed edge represents the regulatory relationship.
Regulatory and temporal PathRNet
The temporal PathRNets for the KSHV infection of HUVEC were constructed to reveal the dynamics in regulation at each sampling time of the microarray experiment.
In the Fig 13, 14, 15, 16, 17, 18, 19, 20, 21, the gray color of the nodes and links indicates that they are not functional. As shown by these figures, the colors of nodes and links vary over time, revealing the time-dependent characteristics of the network. Regarding the variation of the network structure, two aspects of the algorithm are important. First, the time-dependent module defines the start and the end of an enriched connection in the module. For instance, since P-128 is enriched in a temporal transcription module that is not defined between 0 to 3 h post infection (hpi), P-128 does not function in the first 2 frames of the time varying network (Fig. 13 and 14) and thus the nodes and links are in gray. However, they become active after 3 hpi (Fig. 15-16 and 20-21). Secondly, when the regulatory contribution of a parent node is sufficiently small, the parent node and the link are also marked gray to indicate nominal regulation. For example, while TF-75 regulates P-123, -126, -129 and -100 at different time points; it only regulates some or even none at some instances, say, between 3-6 hpi, or it only regulates P-129 and P-126 at other time points. Furthermore, it is easy to observe that the contributions for many nodes vary at different time points. Interestingly, a positive feedback loop exists between TF-12 and S-7.
Next, interesting regulatory cascades can be revealed by incorporating the annotations with the time varying networks. In Fig. 13 from 0-1hpi, the p38 MAPK pathway (S7) is up-regulated, which also induces the TF STAT1 (TF-61). This suggests that in the early stage of infection KSHV activates p38 MAPK pathway. Indeed, the activation of the p38 MAPK pathway is shown to be essential for KSHV infection and inhibition of the p38 MAPK pathway reduced KSHV infectivity . Induction of STAT1 likely indicates the activation of cellular innate immune response to viral infection . At 3 hpi in Fig. 14, it is suggested that STAT1 also activates SMAD2/3 signaling (S47), which is in agreement with the existing knowledge that activation MAPK pathway enhances Smad2 transcriptional activity . In addition, TF CREB1 (TF-75) is also activated during both 1 and 3 hpi, which is expected since it is at the downstream of the MAPK pathway. Interestingly, the entire cascade from p38 MAPK to CREB1 becomes downregulated at 6 hpi. Following the first wave of activation of the cascade, we have also observed a second wave of activation of the p38 MAPK pathway and TF STAT1 at 10 hpi. The signal is again transduced to SMAD2/3 at 16 hpi accompanying the decay of the p38 MAPK pathway and TF STAT1, before being further transduced to CREB1 at 24 hpi. Activation of this second wave of the p38 MAPK pathway has indeed been described in [21, 23].
PATTERN is a novel approach for analyzing large scale time series gene microarray datasets to uncover dynamic regulations of pathways and gene expression. We discuss next a few distinct features of PATTERN.
First, PATTERN relies on a robust and biological driven network architecture, PathRNet, which directly utilizes the existing functional databases. The enrichment scheme in PATTERN enables the identification of functional components of PathRNet, which enables the subsequent network inferences to focus on the context-specific part of the network structure, and thus significantly reduces inference complexity. As shown by the validation on the simulated network, PATTERN is also robust to the noisy and inconsistent annotations. All of these indicate the great capability of PATTERN to correctly identify the functional part of network structure.
Second, the time-varying network structure and dynamic temporal regulatory impacts can be inferred as a result of the ECTDISA, which not only identifies coregulated functional modules but also provides their timing information. This is critical information for all the related inference tasks in the reconstruction of dynamic network structure. Moreover, PATTERN can accurately estimate the dynamic regulatory weights.
Third, since the goal of PATTTERN is to uncover dynamic network structure, it is especially useful for analyzing non-stationary biological processes, where different regulations exist at different time periods. This is the motivation of studying KSHV infection. Compared with other biological processes such as cell cycles, temporal transcription patterns are more likely to exist in this dataset. In contrast, transcription patterns of cell cycle related genes are often expected to be cyclic and sinusoidal in shape, which also exist in the entire cell cycle process and are not time dependent by our definition.
Fourth, the computational complexity of the proposed PATTERN mainly lies in the context-specific PathRNet construction module, where biologically meaningful modules are retrieved by ECTDISA. The computational complexity of ECTDISA is proportional to the number of genes, the number of samples (conditions), the number of initial gene sets, the window length of the dependent analysis, the number of parameters to be optimized in the enrichment step, and the number of iterations required achieving convergence. (Please refer to [18, 24] for the details)
Firstly, the performance of the proposed method partially relies on the quality of available prior knowledge. It cannot retrieve information of pathways that have not been annotated in existing knowledge databases. The absence of relatively complete and accurate prior knowledge may severely jeopardize its performance. However, as knowledge databases including GO, KEGG, and others continue to grow and improve, the proposed algorithm will improve its prediction performance accordingly.
Secondly, the proposed method cannot capture any intra-pathway interactions including feedback loops. More intelligent approaches are required to study the dynamics of intra-pathway associations. We will explore this modeling in our future research.
A new paradigm that aims at reconstructing robust context-specific and time varying regulatory networks was proposed. The novelties of our approach that enable the discovery are: 1) A robust and biological driven network architecture, PathRNet, which builds upon the existing functional databases; 2) A functional enrichment based algorithm that assesses the enrichment and time span of the regulation of TFs and pathways in the context-specific time series microarray data. Due to PathRNet and the enrichment algorithm, we are able to robustly infer complex time-varying and nonlinear regulatory interactions that are otherwise impossible by the existing models and approaches. The resulting network provides unequivocal interpretation of biological functions and their impacts. The proposed approach for the first time provides a dynamic perspective of pathway, TF regulations, and their interactions related to specific endogenous and exogenous conditions. This context-specific, temporal landscape of regulation will help reveal directly the functional biomarkers that result in different phenotypes and consequently are important to drug design and therapeutic targets. In addition, this algorithm framework will enable the investigation of context specificity and time variation of other types of regulatory impacts such as miRNA regulation, thus capable of providing more complete knowledge of regulation.
Yufei Huang is supported by an NSF Grant CCF-0546345.
Shou-Jiang Gao is supported by NIH grants CA096512 and CA124332.
Publication of this supplement was made possible with support from the International Society of Intelligent Biological Medicine (ISIBM).
This article has been published as part of BMC Genomics Volume 11 Supplement 3, 2010: The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2164/11?issue=S3.
- 7.Rao A, Hero Iii AO, States DJ, Engel JD: "Inferring time-varying network topologies from gene expression data,". EURASIP J Bioinform Syst Biol. 2007, 51947-Google Scholar
- 14.Meng J, Gao SJ, Huang Y: "Enrichment Constrained Time-Dependent Clustering Analysis for Finding Meaningful Temporal Transcription Modules,". Bioinformatics. 2009Google Scholar
- 15.Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, Mesirov JP: "Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles,". Proc Natl Acad Sci U S A. 2005, vol. 102: pp. 15545-50.CrossRefGoogle Scholar
- 23.Sadagopan S, Sharma-Walia N, Veettil MV, Raghu H, Sivakumar R, Bottero V, Chandran B: "Kaposi's sarcoma-associated herpesvirus induces sustained NF-kappaB activation during de novo infection of primary human dermal microvascular endothelial cells that is essential for viral gene expression,". J Virol. 2007, vol. 81: pp. 3949-68. 10.1128/JVI.02333-06.CrossRefGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.