Dynamic functional modules in co-expressed protein interaction networks of dilated cardiomyopathy
Molecular networks represent the backbone of molecular activity within cells and provide opportunities for understanding the mechanism of diseases. While protein-protein interaction data constitute static network maps, integration of condition-specific co-expression information provides clues to the dynamic features of these networks. Dilated cardiomyopathy is a leading cause of heart failure. Although previous studies have identified putative biomarkers or therapeutic targets for heart failure, the underlying molecular mechanism of dilated cardiomyopathy remains unclear.
We developed a network-based comparative analysis approach that integrates protein-protein interactions with gene expression profiles and biological function annotations to reveal dynamic functional modules under different biological states. We found that hub proteins in condition-specific co-expressed protein interaction networks tended to be differentially expressed between biological states. Applying this method to a cohort of heart failure patients, we identified two functional modules that significantly emerged from the interaction networks. The dynamics of these modules between normal and disease states further suggest a potential molecular model of dilated cardiomyopathy.
We propose a novel framework to analyze the interaction networks in different biological states. It successfully reveals network modules closely related to heart failure; more importantly, these network dynamics provide new insights into the cause of dilated cardiomyopathy. The revealed molecular modules might be used as potential drug targets and provide new directions for heart failure therapy.
Keywords: Gene Ontology; Protein Interaction Network; Closeness Centrality; Large Connected Component; Vascular Endothelial Growth Factor Pathway
Protein-protein interactions (PPI) are of central importance for most biological processes, and thus the protein interaction network (PIN) provides a global picture of cellular mechanisms. With the accumulation of interactome and transcriptome data, the integration of gene expression profiles has revealed the dynamics of protein interaction networks. For example, Han et al. analyzed the protein interaction network of yeast and uncovered two types of hub proteins, "party" hubs and "date" hubs, which displayed condition- or location-specific features in the interactome network. Xue et al. developed a new analytic method to discover the dynamic modular structure of the human protein interaction network in their aging study. Recently, Taylor et al. proposed two further types of hub proteins, intermodular hubs and intramodular hubs, and identified whether the interactions between proteins were context specific or constitutive in the human protein interaction network. Similar techniques have also been applied to reveal disease-related genes or modules. Chuang et al. improved the prognostic predictive performance of gene expression signatures by incorporating interactome data in breast cancer. Taylor et al. used a method analogous to a previous study and revealed that dynamic modularity of the human protein interaction network may be a good indicator of breast cancer prognosis. In the context of heart failure, Camargo and Azuaje [5, 6] integrated gene expression profiles with the protein interaction network in human dilated cardiomyopathy and efficiently identified potential novel DCM signature genes and drug targets. These studies suggest that the integration of interactome data with transcriptome information may facilitate the identification or discovery of disease biomarkers.
Heart failure is one of the main causes of death in the world and is the consequence of many complex factors including genetics, diet, environment and lifestyle. Heart failure is a physiological state in which the heart cannot provide sufficient output of blood to meet the body's needs. Dilated cardiomyopathy (DCM), the major cause of heart failure, impairs the blood pumping ability of the heart and leads to insufficient blood flow to vital organs due to the enlargement and weakening of the heart. Previous studies [8, 9, 10] on gene expression profiles have provided distinct perspectives on the etiology of DCM. Barth et al. pointed out the significant immune response processes involved in end-stage DCM and presented a robust gene expression signature of this disease. Wittchen et al. suggested novel therapeutic targets by gene expression profile analysis of human inflammatory cardiomyopathy. Kaab et al. analyzed a microarray dataset of human myocardial tissue to obtain region- and DCM-specific transcription profiles and determined the gene expression fingerprint of DCM. Even though various causes of DCM have been revealed, the underlying molecular mechanism of this disease remains unclear.
Here, we developed a network-based analysis approach to discover DCM or non-DCM related functional subnetworks by integrating DCM related gene expression profiles with the human protein interaction network and Gene Ontology (GO) annotations. A comparative analysis was utilized to extract DCM exclusive subnetworks as heart failure related modules. These modules could be used to classify normal and disease samples. We further investigated the co-expressed protein interaction network structures of each module for DCM and non-DCM and observed dynamic variations of the identified modules between the two states. Our results suggest that the modular changes between DCM and non-DCM could imply plausible molecular mechanisms involved in heart failure progression.
Condition-Specific Co-expressed Protein Interaction Networks
Structural information of DCM and non-DCM networks
Hub Proteins in CePINs Tend to Be Differentially Expressed
Comparison of key topological properties
Top 10 significant level-6 GO annotations of hubs:
- enzyme linked receptor protein signaling pathway
- protein modification process
- regulation of apoptosis
- positive regulation of cellular metabolic process
- positive regulation of programmed cell death
- paraxial mesoderm morphogenesis
- paraxial mesoderm development
- positive regulation of transcription
Identification of Two DCM-Related Functional Modules
Summary of the two identified DCM-related modules
Subsequently, we compared conditional gene expression levels and correlations of gene co-expression between DCM and non-DCM samples to illustrate the dynamic features of the identified functional modules. We found that the average expression levels of member genes in each module changed between the two conditions by an amount larger than expected from random subnetworks of equal size (organ morphogenesis: 0.59 vs. 0.39 for average gene expression difference, Z = 2.58, P = 0.01; muscle contraction: 0.58 vs. 0.39, Z = 1.7, P = 0.06). Moreover, the average change in gene expression correlation of member PPIs between conditions was also significantly higher (organ morphogenesis: 0.58 vs. 0.44, Z = 2.56, P = 0.01; muscle contraction: 0.56 vs. 0.44, Z = 1.75, P = 0.05).
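The comparison against random subnetworks of equal size can be illustrated with a minimal sketch. It assumes, for simplicity, that a "random subnetwork" is a random gene set of the same size drawn from the background; the source does not state how the random subnetworks were sampled. The function name `module_z_score`, the input dictionary `abs_diff` (gene to absolute expression difference between conditions) and the parameters `n_random` and `seed` are hypothetical.

```python
import random
from statistics import mean, stdev


def module_z_score(module_genes, abs_diff, n_random=1000, seed=0):
    """Compare a module's mean expression change with random gene sets.

    abs_diff maps each gene to its absolute expression difference between
    the two conditions. Random gene sets of equal size stand in for the
    "random subnetworks" (an assumption made for this sketch).
    """
    rng = random.Random(seed)
    background = sorted(abs_diff)  # fixed order for reproducibility
    size = len(module_genes)
    observed = mean(abs_diff[g] for g in module_genes)
    random_means = [
        mean(abs_diff[g] for g in rng.sample(background, size))
        for _ in range(n_random)
    ]
    expected = mean(random_means)
    z = (observed - expected) / stdev(random_means)
    return observed, expected, z


# Toy data: 100 genes whose change cycles through 0.0 .. 0.9;
# the module collects the ten genes with the largest change.
toy_diff = {f"g{i}": (i % 10) / 10 for i in range(100)}
toy_module = [f"g{i}" for i in range(9, 100, 10)]
toy_observed, toy_expected, toy_z = module_z_score(toy_module, toy_diff)
```

A Z-score well above zero indicates that the module changes more between conditions than equally sized random gene sets; sampling connected subnetworks instead of gene sets would require the network topology as an additional input.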
We further studied the largest connected components of these two modules with respect to the co-expressed protein interaction difference between DCM and non-DCM to determine their dynamic features. In the muscle contraction module, DTNA, SNTA1, SNTB1, and DMD were shown to be highly correlated with each other in non-DCM CePIN, but not in DCM CePIN (see Figure 6A). Proteins encoded by DTNA, SNTA1, and SNTB1 are components of the cytoplasmic part of dystrophin-associated protein complex (DAP) . In addition, pivot proteins in both non-DCM and DCM, which have relatively more co-expressed interacting partners, were observed to change from DTNA and DMD to UTRN.
Similar dynamic features were also observed in the organ morphogenesis module. We roughly divided this module into two major regions: the upper diamond, formed by INSR, CRKL, IGF1R and RASA1, and the bottom triangle, formed by FLT1, NRP2 and NRP1 (see Figure 6B). The results show that the communication between the diamond and the triangle in DCM CePIN was bridged by KDR and CTNNB1, but disconnected in non-DCM CePIN. Moreover, the diamond structure in non-DCM CePIN was observed to have collapsed. These changes in the muscle contraction and organ morphogenesis modules may hold some clues to the progression of DCM.
Protein-protein interaction networks cover all possible interactions regardless of when or where the interaction takes place. In this sense, they are static. By integrating gene expression profiles of DCM with human protein-protein interaction networks, we successfully extracted two co-expressed protein interaction networks (CePINs), i.e. DCM and non-DCM. Here, we showed that DCM and non-DCM CePINs exhibit substantial differences in co-expressed protein-protein interactions, even though their network structures are similar. The differences may be attributed to gene expression variations and interaction rewiring between DCM and non-DCM conditions. We suggest that CePINs are able to reveal condition-specific interactions and the dynamic features hidden in static protein-protein interaction networks.
Next, we showed that hub proteins in CePINs tended to be SDEGs compared to non-hub proteins. In CePINs, proteins with higher degrees imply that they have more direct interacting partners co-expressed in gene expression levels; therefore any significant modification in their expression levels might influence more interacting partners. This observation suggests that once gene expression of hub proteins is changed, it is expected to cause greater expression variations to its neighboring interaction network in DCM.
Since our analysis relies on PCC calculated from gene expression data to define CePPIs and construct condition-dependent networks, we carried out further examinations of its robustness. First, we performed the same analysis with a tightened PCC threshold, P < 0.01, and obtained consistent results (see Tables S1-S4 and Figures S1-S2 in Additional File 1). Second, Li et al. recently revealed that correlations of gene expression levels with disease states could vary considerably across randomly selected subsets of the samples from a single microarray data set. In this light, we performed re-sampling of our gene expression data and found that both the recovery rate of CePPIs and the identification rates of the two DCM-related modules decreased as we lowered the sample size (Figures S3-S4 in Additional File 1). However, the identification rates of the two DCM-related modules were much higher than those of any other modules, indicating that these two identified modules were robustly related to DCM.
Since the human PPI data are still incomplete and noisy, there are different curated collections of human PPI data available. To examine whether our analysis was robust against the PIN we used, we performed the same analysis with an expanded PIN integrating the PPI data from HPRD  and BioGRID  databases and obtained consistent results (see Table S5 - S8, Figure S5 - S6 in Additional File 1).
One of the major symptoms of heart failure is the inability of the heart to sufficiently supply blood flow to the rest of the body. This is strongly related to heart muscle contraction efficiency. The failing heart undergoes morphological changes and becomes weakened and enlarged in DCM, the most common form of cardiomyopathy. Our findings in DCM-related modules of muscle contraction and organ morphogenesis were consistent with the known symptoms of cardiomyopathy. Consequently, we further investigated these two modules in relation to the underlying molecular mechanisms of dilated cardiomyopathy.
In the muscle contraction module, three SDEGs, DMD, DTNA and UTRN, form a closed loop, implying topological significance. Dystrophin is encoded by DMD, mutations of which cause Duchenne muscular dystrophy, a recessive, fatal X-linked disorder. Dystrophin appears to stabilize the sarcolemma and protect muscle fibers from long-term contraction-induced damage and necrosis, though its precise roles at the cellular level are still to be elucidated. Dystrobrevin-alpha, encoded by DTNA, belongs to the dystrobrevin subfamily of the dystrophin family and is a component of the dystrophin-associated protein complex (DAP). Disruption of DAP is associated with various forms of muscular dystrophy. Dystrophin binds to the intracellular cytoskeleton by associating with actin filaments at its N-terminus, whereas at its C-terminus dystrophin interacts with members of the DAP, such as β-dystroglycan, which is encoded by DAG1. Dystrophin therefore links the intracellular microfilament network of actin to a complex series of linking proteins in the cell membrane, and hence to the extracellular matrix. In the muscle contraction module, both DMD and DTNA were significantly down-regulated. The absence of dystrophin was suggested to cause the collapse of the entire DAP and plasma membrane, leading to muscle damage. In addition, mutations of DTNA are associated with left ventricular non-compaction with congenital heart defects. These defects might lead to cardiac muscle damage and possibly DCM.
In the organ morphogenesis module, we noticed that the largest connected component contains two major clusters. These clusters consisted of insulin pathway-related genes, including IGF1R (insulin-like growth factor 1 receptor) and INSR (insulin receptor), and vascular endothelial growth factor (VEGF) pathway-related genes, including FLT1 (VEGFR1), NRP1, NRP2 and KDR (VEGFR2). Both pathways have been reported to be important in cardiac remodeling. Proper IGF1R and INSR signaling plays an essential role in cardiac function, and the disruption of this signaling induces the onset of DCM in knockout mice [24, 25], while the VEGF pathway is crucial in vasculogenesis and angiogenesis, which were reported to be altered in DCM. The significant up-regulation of these two clusters of genes can therefore signify autosomal repair for damage caused by hypoxia induced by early DCM symptoms. Malfunction of these activated pathways is a possible reason for the disease progression.
Focusing on these pathways, we compared the subnetworks in DCM and non-DCM patients and found several notable points. First, these two clusters were not independent of each other, but were linked by a string of genes: RASA1, KDR, CTNNB1, and FLT1. RASA1 encodes p120-RasGAP, which stimulates the GTPase activity of RAS, and is best known for its negative regulation of the RAS/MAPK pathway downstream of several growth factor pathways responsible for cell proliferation, including the IGF-1, insulin and VEGF pathways. Proper activation of the RAS-dependent pathway is important for the functions of these pathways. The up-regulation of RASA1 and its linkage to INSR may imply a possible negative regulation of the RAS-dependent pathway. We also found CrkL, a protein that mediates RAS-dependent activation, to be significantly down-regulated in DCM patients. These observations imply the negative regulation of growth factor signaling involving insulin and insulin-like growth factor. Second, the down-regulation of CTNNB1, which encodes beta-catenin, a partner of VE-cadherin that is essential for contact inhibition of VEGF-induced proliferation [26, 27], suggests a malfunction in the control of VEGF-induced vasculogenesis. The failure of beta-catenin regulation and defective vascularization have been observed in idiopathic DCM.
Our findings regarding the organ morphogenesis module successfully revealed possible integration of two important pathways in DCM and the crucial role that RASA1 up-regulation and CTNNB1 down-regulation might play in the etiology of DCM.
Altogether we have developed a network-based comparative analysis approach that integrates protein-protein interactions with gene expression profiles and biological function annotations to reveal dynamic functional modules under different biological states. Application to DCM reveals two functional modules with dynamic features accounting for the underlying disease mechanisms. The revealed molecular modules might be used as potential drug targets and provide new directions for heart failure therapy.
Protein Interaction Network and Expression Data
The human protein interaction network (PIN) was downloaded from the Human Protein Reference Database (HPRD), and only the largest connected component, containing 9,059 proteins and 34,869 interactions, was studied.
The gene expression data of DCM were retrieved from Gene Expression Omnibus (GEO), accession number GSE3586 , containing 37,530 genes and 28 samples (13: DCM, 15: non-DCM) in total, with 6,475 genes involved in HPRD PIN. Another set of expression profiles of DCM was retrieved from GEO accession number GSE4172  to evaluate the classification performance of identified modules.
Co-expressed Protein Interaction Networks
The Pearson correlation coefficient (PCC) between the expression profiles of genes X and Y was calculated for each condition as

$$\mathrm{PCC}(X,Y) = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{\mathrm{Exp}(X,i)-\overline{\mathrm{Exp}(X)}}{\sigma(X)}\right)\left(\frac{\mathrm{Exp}(Y,i)-\overline{\mathrm{Exp}(Y)}}{\sigma(Y)}\right)$$

where n is the number of condition-specific samples; Exp(X,i) (Exp(Y,i)) is the expression level of gene X (Y) in sample i under a specific condition (DCM or non-DCM); $\overline{\mathrm{Exp}(X)}$ ($\overline{\mathrm{Exp}(Y)}$) represents the average expression level of gene X (Y); and σ(X) (σ(Y)) represents the standard deviation of the expression level of gene X (Y). Larger absolute values of PCC indicate higher correlation between the evaluated gene pairs. Gene pairs with a P-value less than or equal to 0.05 were considered significantly correlated. Protein-protein interactions between proteins encoded by significantly correlated gene pairs are defined as co-expressed protein-protein interactions (CePPIs). Based on this definition, the co-expressed protein interaction network (CePIN) is defined as the set of CePPIs. Note that we defined CePPIs by co-expression at the gene expression level rather than by protein concentrations, since we lacked the corresponding proteome data. Although a gene expression level cannot always represent its protein concentration, previous studies have observed notable correlations between them. If proteome data are available, the same analysis procedures described here can be applied by replacing Exp(X) and Exp(Y) with protein concentrations.
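The CePPI definition can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the authors' implementation: the P-value of each PCC is estimated here by a Monte Carlo permutation test, which is swapped in because the source does not say how the P-value was obtained, and the names `pcc`, `permutation_p_value` and `build_cepin` as well as the toy expression values are hypothetical.

```python
import random
from statistics import mean, stdev


def pcc(x, y):
    """Pearson correlation using sample standard deviations (1/(n-1) form)."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sx, sy = stdev(x), stdev(y)
    return sum((a - mx) / sx * (b - my) / sy for a, b in zip(x, y)) / (n - 1)


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided permutation P-value for |PCC| (an assumed substitute test)."""
    rng = random.Random(seed)
    observed = abs(pcc(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pcc(x, shuffled)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def build_cepin(ppis, expression, alpha=0.05):
    """Keep the PPIs whose encoding genes are significantly correlated."""
    cepin = []
    for a, b in ppis:
        if a in expression and b in expression:
            if permutation_p_value(expression[a], expression[b]) <= alpha:
                cepin.append((a, b))
    return cepin


# Toy expression profiles for one condition (eight samples per gene).
toy_expression = {
    "A": [1, 2, 3, 4, 5, 6, 7, 8],
    "B": [2, 4, 6, 8, 10, 12, 14, 16],
    "C": [5, 1, 4, 8, 2, 7, 3, 6],
}
toy_cepin = build_cepin([("A", "B"), ("A", "C")], toy_expression)
```

Running `build_cepin` once on the DCM samples and once on the non-DCM samples would yield the two condition-specific networks that the comparative analysis operates on.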
Significantly Differentially Expressed Genes and Hub Proteins
Significantly differentially expressed genes (SDEGs) were determined by the Wilcoxon rank sum test (P ≤ 0.05, DCM against non-DCM). Up-regulated and down-regulated SDEGs were defined as genes expressed significantly higher or lower, respectively, in DCM than in non-DCM. DCM hub proteins were defined as nodes involved in more than 23 DCM CePPIs, since these proteins were among the top 1% of the CePPI degree distribution of the DCM CePIN.
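As a rough illustration of these two definitions, the sketch below implements a rank sum test with the normal approximation (no tie or continuity correction, which is a simplification assumed here and may differ from the software the authors used) and a degree-based hub filter. The function names and the toy inputs are hypothetical.

```python
from collections import Counter
from statistics import NormalDist


def rank_sum_p_value(group_a, group_b):
    """Two-sided Wilcoxon rank sum P-value via the normal approximation."""
    pooled = sorted(
        (value, label)
        for label, values in enumerate((group_a, group_b))
        for value in values
    )
    rank_sum_a = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j for ties
        rank_sum_a += average_rank * sum(
            1 for k in range(i, j) if pooled[k][1] == 0
        )
        i = j
    n_a, n_b = len(group_a), len(group_b)
    mu = n_a * (n_a + n_b + 1) / 2
    sigma = (n_a * n_b * (n_a + n_b + 1) / 12) ** 0.5
    z = (rank_sum_a - mu) / sigma
    return 2 * (1 - NormalDist().cdf(abs(z)))


def hub_proteins(cepin_edges, degree_cutoff=23):
    """Return proteins involved in more than degree_cutoff CePPIs."""
    degree = Counter()
    for a, b in cepin_edges:
        degree[a] += 1
        degree[b] += 1
    return {protein for protein, d in degree.items() if d > degree_cutoff}


# Toy example: a clearly shifted gene and a star-shaped network.
p_shifted = rank_sum_p_value([10, 11, 12, 13, 14, 15], [1, 2, 3, 4, 5, 6])
p_mixed = rank_sum_p_value([1, 3, 5, 7], [2, 4, 6, 8])
star = [("HUB", f"p{i}") for i in range(24)]
toy_hubs = hub_proteins(star)
```

The cutoff of 23 is taken from the text; for another network it would be re-derived from the top 1% of that network's degree distribution.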
Topological Analysis of Networks
The clustering coefficient of protein i was calculated as

$$C_i = \frac{2\,e_{NB_i}}{|NB_i|\,(|NB_i|-1)}$$

in which $e_{NB_i}$ is the number of observed interactions between interacting partners of protein i, and $|NB_i|$ represents the number of its interacting partners. $|NB_i|(|NB_i|-1)/2$ gives the number of all possible interactions among its interacting partners. In this study, only the largest connected component of each CePIN was considered.
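The ratio of observed to possible interactions among a protein's partners can be computed directly from an edge list. The following is a minimal sketch; the function name `clustering_coefficients` is hypothetical, and assigning 0.0 to proteins with fewer than two partners is a convention assumed here, since the ratio is undefined in that case.

```python
from collections import defaultdict


def clustering_coefficients(edges):
    """Map each protein to observed / possible links among its partners."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-interactions
            neighbors[a].add(b)
            neighbors[b].add(a)
    coefficients = {}
    for node, partners in neighbors.items():
        k = len(partners)
        if k < 2:
            coefficients[node] = 0.0  # convention for undefined cases
            continue
        ordered = sorted(partners)
        observed = sum(
            1
            for i, u in enumerate(ordered)
            for v in ordered[i + 1:]
            if v in neighbors[u]
        )
        coefficients[node] = 2 * observed / (k * (k - 1))
    return coefficients


# Toy network: triangle A-B-C plus a pendant protein D attached to A.
toy_cc = clustering_coefficients(
    [("A", "B"), ("B", "C"), ("A", "C"), ("A", "D")]
)
```

In the toy network, A has three partners of which only one pair (B, C) interacts, giving 1/3, while B and C each have two mutually interacting partners.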
Identification of Condition-Specific Functional Modules
The enrichment of each GO functional category in a CePIN was evaluated by a hypergeometric test:

$$P(X = k) = \frac{\binom{n}{k}\binom{N-n}{m-k}}{\binom{N}{m}}$$

X denotes the evaluated functional category in GO. N represents the number of GO-annotated genes that appeared in our microarray data as well as in the HPRD PIN, while m represents the number of such genes in the DCM or non-DCM CePIN. n represents the number of genes annotated with the evaluated GO functional category in the HPRD PIN. Thus, this formula calculates the probability that the evaluated functional category has k genes in the CePIN. The calculated P-value is then adjusted by applying the Benjamini and Hochberg multiple testing procedure to control the false discovery rate (FDR) at a significance level of 0.05.
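A small sketch of the hypergeometric calculation and the Benjamini and Hochberg adjustment follows, using the symbols N, n, m and k as defined in the text. The upper-tail function is included because it is the conventional form of an enrichment P-value; whether the authors used the point probability or the tail is not recoverable from the text, so that choice is an assumption. All function names are hypothetical.

```python
from math import comb


def hypergeom_pmf(k, N, n, m):
    """Probability that exactly k of the m CePIN genes carry the category."""
    return comb(n, k) * comb(N - n, m - k) / comb(N, m)


def hypergeom_upper_tail(k, N, n, m):
    """Probability of observing k or more annotated genes (assumed P-value)."""
    return sum(hypergeom_pmf(i, N, n, m) for i in range(k, min(n, m) + 1))


def benjamini_hochberg(p_values):
    """Return BH-adjusted P-values in the original order."""
    count = len(p_values)
    order = sorted(range(count), key=lambda i: p_values[i])
    adjusted = [0.0] * count
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(count, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * count / rank)
        adjusted[index] = running_min
    return adjusted


# Toy numbers: 10 annotated genes, 4 in the category, 5 in the CePIN.
toy_pmf = hypergeom_pmf(2, N=10, n=4, m=5)
toy_adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Categories whose adjusted P-value is at most 0.05 would be retained as enriched under the FDR control described above.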
Here, e denotes the functional dyad. Each symbol has the same meaning as in the original hypergeometric test, but the counted objects are functional dyads rather than functional genes. The same multiple testing procedure was performed to adjust the P-value.
Functional subnetworks mapped in FDCM_exclusive were designated as candidate DCM-related modules. For functional specificity, only candidate DCM-related modules with GO annotation levels greater than or equal to 5 were considered as DCM-related modules.
Evaluation of Classification Accuracy
Hierarchical clustering was used to group samples into two categories (i.e. DCM or non-DCM) according to the gene expression levels of the members of each module. The gene expression distance between samples was calculated as the Euclidean distance. The classification tree produced by hierarchical clustering was separated at the root into two groups (sub-trees). The group with more DCM samples is defined as positive and the other group as negative. The number of DCM samples in the positive group is defined as true positive (TP) and the number of non-DCM samples clustered into this group is false positive (FP). The number of non-DCM samples in the negative group is defined as true negative (TN) and the number of the remaining DCM samples is false negative (FN). These counts were used to evaluate sensitivity ($TP/(TP+FN)$) and specificity ($TN/(TN+FP)$). Accuracy was then calculated as $(TP+TN)/(TP+TN+FP+FN)$ and was used to measure the classification ability of each module. The receiver operating characteristic (ROC) curve was obtained according to the module activity score of each sample, which was defined as the average expression level of all member genes in the module.
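The evaluation step after the tree has been cut into two groups can be sketched as follows. The clustering itself is not reproduced here; the two groups are taken as given. The formulas for sensitivity, specificity and accuracy are the conventional definitions, assumed to match the ones used in the study, and the function names and toy sample labels are hypothetical.

```python
def evaluate_split(group_a, group_b, is_dcm):
    """Score a two-group split; the group with more DCM samples is positive.

    Assumes both DCM and non-DCM samples are present, otherwise a
    denominator below would be zero.
    """
    dcm_in_a = sum(is_dcm[s] for s in group_a)
    dcm_in_b = sum(is_dcm[s] for s in group_b)
    if dcm_in_a >= dcm_in_b:
        positive, negative = group_a, group_b
    else:
        positive, negative = group_b, group_a
    tp = sum(is_dcm[s] for s in positive)
    fp = len(positive) - tp
    fn = sum(is_dcm[s] for s in negative)
    tn = len(negative) - fn
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


def module_activity_score(sample_expression, module_genes):
    """Average expression of the module's member genes in one sample."""
    return sum(sample_expression[g] for g in module_genes) / len(module_genes)


# Toy split: four DCM samples (d*) and four non-DCM samples (n*).
toy_labels = {s: s.startswith("d") for s in
              ["d1", "d2", "d3", "d4", "n1", "n2", "n3", "n4"]}
toy_scores = evaluate_split(
    ["n2", "n3", "n4"], ["d1", "d2", "d3", "d4", "n1"], toy_labels
)
toy_activity = module_activity_score({"DMD": 2.0, "DTNA": 4.0}, ["DMD", "DTNA"])
```

Ranking samples by `module_activity_score` and sweeping a threshold over the scores would give the points of the ROC curve mentioned above.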
This work was supported by National Science Council of Taiwan; National Health Research Institutes, Taiwan (NHRI-EX98-9819PI); and National Taiwan University Frontier and Innovative Research Projects.
- 8. Barth AS, Kuner R, Buness A, Ruschhaupt M, Merk S, Zwermann L, Kaab S, Kreuzer E, Steinbeck G, Mansmann U, et al.: Identification of a common gene expression signature in dilated cardiomyopathy across independent microarray studies. J Am Coll Cardiol. 2006, 48: 1610-1617. 10.1016/j.jacc.2006.07.026
- 9. Kaab S, Barth AS, Margerie D, Dugas M, Gebauer M, Zwermann L, Merk S, Pfeufer A, Steinmeyer K, Bleich M, et al.: Global gene expression in human myocardium-oligonucleotide microarray analysis of regional diversity and transcriptional regulation in heart failure. J Mol Med. 2004, 82: 308-316. 10.1007/s00109-004-0527-2
- 10. Wittchen F, Suckau L, Witt H, Skurk C, Lassner D, Fechner H, Sipo I, Ungethum U, Ruiz P, Pauschinger M, et al.: Genomic expression profiling of human inflammatory cardiomyopathy (DCMi) suggests novel therapeutic targets. J Mol Med. 2007, 85: 257-271. 10.1007/s00109-006-0122-9
- 26. Roura S, Planas F, Prat-Vidal C, Leta R, Soler-Botija C, Carreras F, Llach A, Hove-Madsen L, Pons Llado G, Farre J, et al.: Idiopathic dilated cardiomyopathy exhibits defective vascularization and vessel formation. Eur J Heart Fail. 2007, 9: 995-1002. 10.1016/j.ejheart.2007.07.008
- 27. Grazia Lampugnani M, Zanetti A, Corada M, Takahashi T, Balconi G, Breviario F, Orsenigo F, Cattelino A, Kemler R, Daniel TO, Dejana E: Contact inhibition of VEGF-induced proliferation requires vascular endothelial cadherin, beta-catenin, and the phosphatase DEP-1/CD148. J Cell Biol. 2003, 161: 793-804. 10.1083/jcb.200209019
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.