Computational drug repositioning through heterogeneous network clustering
Given the costly and time-consuming process and the high attrition rates in drug discovery and development, drug repositioning (or drug repurposing) is considered a viable strategy both to replenish drying-out drug pipelines and to surmount the innovation gap. Although there is growing recognition that mechanistic relationships from the molecular to the systems level should be integrated into drug discovery paradigms, relatively few studies have integrated information about heterogeneous networks into computational platforms for discovering drug repositioning candidates.
Using known disease-gene and drug-target relationships from the KEGG database, we built a weighted disease and drug heterogeneous network. The nodes represent drugs or diseases while the edges represent shared gene, biological process, pathway, phenotype or a combination of these features. We clustered this weighted network to identify modules and then assembled all possible drug-disease pairs (putative drug repositioning candidates) from these modules. We validated our predictions by testing their robustness and evaluated them by their overlap with drug indications that were either reported in published literature or investigated in clinical trials.
Previous computational approaches for drug repositioning focused on either drug-drug or disease-disease similarity, whereas we have taken a more holistic approach by also considering drug-disease relationships. Further, we considered not only genes but also other features to build the disease-drug networks. Despite the relative simplicity of our approach, based on the robustness analyses and the overlap of some of our predictions with drug indications that are under investigation, we believe our approach could complement current computational approaches for drug repositioning candidate discovery.
Keywords: Heterogeneous Network; Graph Cluster; Hidradenitis Suppurativa; Vismodegib; Mouse Phenotype
Drug development is time-consuming and expensive, with extremely low success and relatively high attrition rates. To overcome or bypass this productivity gap and to lower the risks associated with drug development, more and more companies are resorting to approaches commonly referred to as "drug repositioning" or "drug repurposing". Drug repositioning is the identification and development of new uses for existing or abandoned pharmacotherapies. Since the starting point is usually approved compounds with known bioavailability and safety profiles, proven formulation and manufacturing routes, and well-characterized pharmacology, repositioned drugs can enter clinical phases more rapidly and at a fraction of the cost incurred in the discovery and development of completely novel compounds. This new indication discovery has already yielded several successes, including the repositioning of sildenafil from an anti-angina drug to an erectile dysfunction treatment and the repositioning of thalidomide, a withdrawn drug, for leprosy and multiple myeloma. Indeed, in recent years repositioned drugs have accounted for ~30% of the new medicines that reach their first markets. Despite these advantages, rational drug repositioning poses formidable challenges, primarily because the molecular basis and the underlying mechanisms of most diseases and drug actions are elusive, poorly understood, intricate, or not readily amenable to human or computational data mining techniques.
Drug repositioning depends predominantly on two principles: i) the "promiscuous" nature of drugs and ii) the observation that targets relevant to a specific disease or pathway may also be critical for other diseases or pathways [3, 4]. The latter may be represented as a shared gene or feature (biological process, pathway, or phenotype) between two diseases, two drugs, or a disease and a drug. Based on this principle, several computational approaches (see recent review) have been developed and applied to identify drug repositioning candidates, ranging from mapping gene expression profiles to drug response profiles [6, 7, 8, 9, 10, 11, 12] to side-effect-based similarities [13, 14, 15].
An increasing number of network-based methods built on the "guilt by association" principle have also been used to identify drug repositioning candidates. For instance, Chiang and Butte computed a disease-disease similarity network to identify drug repositioning candidates, while other approaches used either drug-drug similarities [13, 17] or both disease-disease and drug-drug similarities [18, 19, 20]. However, most of these approaches were either drug-centric or disease-centric and not "indications-centric". In other words, few studies have used a direct disease-drug-centric approach. While there have been studies using heterogeneous networks [17, 21, 22, 23, 24] for drug repositioning, to the best of our knowledge there have been no previous reports that (a) undertook a direct analysis of a heterogeneous disease-drug network and (b) used network clustering-based approaches on heterogeneous networks to identify drug repositioning candidates.
In the current study, we built a gene- and feature-based (shared biological processes, pathways, and phenotypes) heterogeneous network of diseases and drugs and applied network clustering to identify drug repositioning candidates. We used two state-of-the-art network clustering approaches [25, 26] to identify modules of diseases and drugs. We assessed the robustness of our methodology by removing ten percent of the edges and calculating the recovery rate of our predictions. Finally, we searched the literature and clinical trials data to check for potential overlap with our discovered novel indications.
Disease-gene and drug-gene associations
Known disease-gene and drug-target associations were downloaded from KEGG Medicus (Feb 2013). There were a total of 1301 diseases and 3613 drugs with at least one known gene association, along with 1976 known indications (representing 364 diseases and 1066 drugs). To augment the drug targets, we also used drug-target data from DrugBank using KEGG Drug-DrugBank mappings (see Additional file 1 for a complete list of disease-genes and drug-targets).
Generation of disease-disease, drug-drug, and disease-drug pairs based on shared genes or features
Graph clustering of weighted drug-disease heterogeneous network
We applied graph clustering to the weighted drug-disease heterogeneous network to extract densely connected clusters of diseases and drugs, and mined these clusters for potential drug repositioning candidates. We used two state-of-the-art graph clustering algorithms, namely ClusterONE and Louvain's modularity method, for module detection.
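Louvain's method seeks a partition of the network that maximizes the modularity $Q$, which for a weighted network is defined as

$$Q = \frac{1}{2m}\sum_{i,j}\left[A_{ij} - \frac{k_i k_j}{2m}\right]\delta(c_i, c_j)$$

where $A_{ij}$ represents the weight of the edge between nodes $i$ and $j$, $k_i = \sum_j A_{ij}$ is the sum of the weights of the edges attached to node $i$, $c_i$ is the community to which node $i$ is assigned, $\delta(u, v)$ is 1 if $u = v$ and 0 otherwise, and $m = \frac{1}{2}\sum_{i,j} A_{ij}$. Although the partitioning is an approximate method and nothing ensures that the global maximum of modularity is attained, several tests have shown that it provides a decomposition into communities with a modularity that is close to optimal. The implementation is available as a plug-in in Gephi.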
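To make the quantity being optimized concrete, the following is a minimal sketch that evaluates the weighted modularity of a given partition (within-community edge weight compared with what node strengths alone would predict). It is not the Gephi plug-in used in this study and it does not perform the Louvain optimization itself; the function name, the toy edge list, and the node labels are hypothetical, and the graph is assumed to be undirected with no self-loops.

```python
from collections import defaultdict

def modularity(edges, community):
    """Weighted modularity of a partition.

    edges: list of (u, v, weight) for an undirected graph, each edge listed once.
    community: dict mapping every node to its community label.
    """
    strength = defaultdict(float)  # total weight of edges attached to each node
    internal = defaultdict(float)  # total weight of edges inside each community
    m = 0.0                        # total edge weight of the whole graph
    for u, v, w in edges:
        strength[u] += w
        strength[v] += w
        m += w
        if community[u] == community[v]:
            internal[community[u]] += w
    total = defaultdict(float)     # summed node strength per community
    for node, k in strength.items():
        total[community[node]] += k
    # Per community: observed internal fraction minus the expected fraction.
    return sum(internal[c] / m - (total[c] / (2 * m)) ** 2 for c in total)

# Hypothetical toy network: two triangles joined by a single edge.
toy_edges = [("a", "b", 1), ("b", "c", 1), ("a", "c", 1),
             ("d", "e", 1), ("e", "f", 1), ("d", "f", 1),
             ("c", "d", 1)]
two_modules = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_module = {node: 0 for node in two_modules}

q_split = modularity(toy_edges, two_modules)
q_single = modularity(toy_edges, one_module)
```

On this toy network the two-triangle partition scores higher than placing all nodes in a single community, which is the behaviour a modularity-maximizing method exploits.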
ClusterONE grows groups of vertices that have a high cohesiveness, defined as

$$f(V) = \frac{W_{in}(V)}{W_{in}(V) + W_{bound}(V) + P(V)}$$

where $W_{in}(V)$ denotes the total weight of edges within a group of vertices $V$, $W_{bound}(V)$ denotes the total weight of edges connecting this group to the rest of the graph, and $P(V)$ is the penalty term. We used ClusterONE because of its ability to identify overlapping cohesive subnetworks in weighted networks and because it was previously shown to detect meaningful local structures in various biological networks [31, 32]. We used the ClusterONE plug-in available in Cytoscape for implementation.
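As an illustration of the cohesiveness score, the sketch below computes the ratio of internal edge weight to internal plus boundary edge weight plus a penalty for one candidate group. It is not the Cytoscape plug-in, and it omits ClusterONE's greedy growth and overlap-merging steps; the function name and toy data are hypothetical, and the penalty is assumed here to be a single number supplied by the caller.

```python
def cohesiveness(edges, group, penalty=0.0):
    """Cohesiveness of a vertex group in a weighted undirected graph.

    edges: list of (u, v, weight), each edge listed once.
    group: iterable of nodes forming the candidate module.
    penalty: assumed to be one number added to the denominator.
    """
    group = set(group)
    w_in = 0.0     # weight of edges with both endpoints inside the group
    w_bound = 0.0  # weight of edges with exactly one endpoint inside
    for u, v, w in edges:
        inside = (u in group) + (v in group)
        if inside == 2:
            w_in += w
        elif inside == 1:
            w_bound += w
    denominator = w_in + w_bound + penalty
    return w_in / denominator if denominator > 0 else 0.0

# Hypothetical toy network: two triangles joined by a single edge.
toy_edges = [("a", "b", 1), ("b", "c", 1), ("a", "c", 1),
             ("d", "e", 1), ("e", "f", 1), ("d", "f", 1),
             ("c", "d", 1)]

tight = cohesiveness(toy_edges, {"a", "b", "c"})
penalized = cohesiveness(toy_edges, {"a", "b", "c"}, penalty=2.0)
loose = cohesiveness(toy_edges, {"c", "d"})
```

A triangle with one outgoing edge is far more cohesive than the two bridge nodes taken as a group, and a positive penalty lowers the score of any group.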
Analyses of known indications in disease-drug network
Starting with 1976 known indications (disease-drug pairs) from KEGG Medicus, we first filtered out diseases and drugs that do not have a known gene association in the KEGG database of disease genes and drug targets. This resulted in 1041 known indications representing 203 diseases and 588 drugs (Additional file 2). Of these 1041 known indications, only 132 pairs share at least one common gene (i.e., a disease-associated gene is also a drug target). We then checked whether any of the known indications share a pathway, using the disease-pathway and drug-pathway annotations from KEGG Medicus. Only 116 disease-drug pairs share a common pathway and, more surprisingly, only 36 disease-drug pairs share both a pathway and a gene. This indicates that disease-drug relationships cannot be captured through gene-centric approaches alone.
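The overlap counts described above amount to set intersections between disease and drug annotations. A minimal sketch follows, assuming the annotations are held as dictionaries of sets; all identifiers and the toy data are hypothetical and do not come from KEGG.

```python
def shares(pair, disease_map, drug_map):
    """True if the disease and the drug in `pair` have a common annotation."""
    disease, drug = pair
    return bool(disease_map.get(disease, set()) & drug_map.get(drug, set()))

def tally(indications, disease_genes, drug_targets,
          disease_pathways, drug_pathways):
    """Count indications sharing a gene, a pathway, or both."""
    counts = {"gene": 0, "pathway": 0, "both": 0}
    for pair in indications:
        gene_hit = shares(pair, disease_genes, drug_targets)
        pathway_hit = shares(pair, disease_pathways, drug_pathways)
        counts["gene"] += int(gene_hit)
        counts["pathway"] += int(pathway_hit)
        counts["both"] += int(gene_hit and pathway_hit)
    return counts

# Hypothetical toy annotations.
indications = [("dis1", "drugA"), ("dis1", "drugB"), ("dis2", "drugC")]
disease_genes = {"dis1": {"G1", "G2"}, "dis2": {"G3"}}
drug_targets = {"drugA": {"G1"}, "drugB": {"G9"}, "drugC": {"G8"}}
disease_pathways = {"dis1": {"P1"}, "dis2": {"P2"}}
drug_pathways = {"drugA": {"P1"}, "drugB": {"P1"}, "drugC": {"P3"}}

overlap = tally(indications, disease_genes, drug_targets,
                disease_pathways, drug_pathways)
```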
To analyze the characteristics of known indications further, we computed a distance measure between each of the known indication pairs in the human protein interactome (downloaded from NCBI's Entrez Gene). We calculated the shortest path for all known indications (i.e., the shortest path between a known disease and drug pair) in the protein interaction network using JUNG. Of the 1041 known indications, we were able to compute the shortest paths for 1008 disease-drug pairs. For the remaining pairs, we were unable to compute the shortest paths because their encoded proteins were either absent from the interactome or not reachable (e.g., a disease protein and a drug target present in two different connected components of the protein interactome). The average distance between the disease and drug of a known indication is 3.75 (median distance of 4), a finding consistent with previous reports. These preliminary analyses, together with our previous studies of rare disease networks, in which we noted that the relationship between diseases cannot be fully captured by the gene network alone, motivated us to build a feature-based functional connectivity map between diseases and drugs.
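The distance computation can be sketched as a breadth-first search in an unweighted interaction network. The study used JUNG, a Java library; the Python sketch below is only an illustration, and it assumes that the distance between a disease and a drug is the minimum number of interactions separating any disease protein from any drug target, which the text does not state explicitly. The function name and toy interactome are hypothetical.

```python
from collections import deque

def shortest_distance(adjacency, sources, targets):
    """Fewest interactions from any source protein to any target protein.

    adjacency: dict mapping a protein to the set of its interaction partners.
    Returns None when no source is in the network or no target is reachable.
    """
    targets = set(targets)
    seen = {s for s in sources if s in adjacency}
    queue = deque((s, 0) for s in seen)
    while queue:
        node, dist = queue.popleft()
        if node in targets:
            return dist
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None

# Hypothetical toy interactome with two connected components.
interactome = {
    "P1": {"P2"}, "P2": {"P1", "P3"}, "P3": {"P2", "P4"}, "P4": {"P3"},
    "P5": {"P6"}, "P6": {"P5"},
}

same_component = shortest_distance(interactome, {"P1"}, {"P4"})
other_component = shortest_distance(interactome, {"P1"}, {"P6"})
absent_protein = shortest_distance(interactome, {"PX"}, {"P4"})
```

The two `None` cases correspond to the situations described above in which no path could be computed: proteins in different connected components, and proteins absent from the interactome.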
Disease-disease, drug-drug, and disease-drug pairs - edge pruning and weighted heterogeneous network generation
Using the disease-gene and drug-target associations and the enriched features of diseases and drugs (based on functional enrichment analyses), we built a gene- and feature-based network in which nodes represent diseases or drugs and edges represent shared genes and/or enriched features (biological process, mouse phenotype, and pathway; p-value ≤ 0.05 after Bonferroni correction). We used the Jaccard score to measure the similarity between each pair of nodes. In order to retain only edges that represent potentially significant relationships, we used a cutoff of 0.5 on the Jaccard index in each of the four networks (the gene-based network and the three feature-based networks). Thus, the final network contained edges that were the union of the pairs that passed the 0.5 Jaccard score threshold in each individual category.
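The edge-pruning step can be illustrated with a short sketch that computes the Jaccard index between the annotation sets of every pair of nodes in one category and keeps the pairs that reach the cutoff. Whether the cutoff was applied inclusively (score ≥ 0.5) is an assumption here, and the function names and toy annotations are hypothetical.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard index of two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def similar_pairs(annotations, cutoff=0.5):
    """Node pairs whose annotation sets reach the Jaccard cutoff.

    annotations: dict mapping a node (disease or drug) to its set of
    genes or enriched features in one category.
    """
    pairs = {}
    for x, y in combinations(sorted(annotations), 2):
        score = jaccard(annotations[x], annotations[y])
        if score >= cutoff:  # inclusive cutoff is an assumption
            pairs[(x, y)] = score
    return pairs

# Hypothetical gene annotations for one disease and two drugs.
gene_annotations = {
    "diseaseX": {"G1", "G2"},
    "drugY": {"G1", "G2", "G3"},
    "drugZ": {"G4"},
}

gene_edges = similar_pairs(gene_annotations)
```

Here only the diseaseX-drugY pair survives, with a score of 2/3; running the same function once per category (gene, biological process, pathway, phenotype) yields the four pair sets whose union forms the network.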
Based on whether a pair of nodes (disease-disease, disease-drug, or drug-drug) shares genes, enriched features, or both, we assigned weights to all the edges in the filtered pairs. For instance, an edge weight of 1 indicates that the two nodes share either a gene or one of the three features, whereas a weight of 4 indicates that the two nodes showed significant associations in all four categories (sharing not only a gene but also the three features, namely biological process, pathway, and phenotype). The resulting weighted heterogeneous network consisted of 657 disease nodes and 3489 drug nodes. The total number of edges in this network is 116493: 680 edges between two diseases, 1626 between a disease and a drug, and 114187 between two drugs (Additional file 3).
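Under this weighting scheme, the weight of an edge is simply the number of categories in which the pair passed the similarity filter. A minimal sketch, assuming each category's filtered pairs are available as a collection of node pairs; the function name, category labels, and toy data are hypothetical.

```python
from collections import Counter

def weight_edges(category_pairs):
    """Edge weight = number of categories in which a pair passed the filter.

    category_pairs: dict mapping a category name to an iterable of
    (node, node) pairs; pair order is ignored.
    """
    weights = Counter()
    for pairs in category_pairs.values():
        # Normalize order and de-duplicate so each category counts once per pair.
        unique = {tuple(sorted(pair)) for pair in pairs}
        for pair in unique:
            weights[pair] += 1
    return dict(weights)

# Hypothetical filtered pairs for the four categories.
filtered = {
    "gene": [("d1", "r1")],
    "process": [("r1", "d1"), ("r1", "r2")],
    "pathway": [("d1", "r1")],
    "phenotype": [("d1", "r1"), ("d1", "r1")],
}

edge_weights = weight_edges(filtered)
```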
Modularity analyses of the disease-drug network
We used two graph clustering algorithms to detect disease-drug modules in this weighted heterogeneous network of diseases and drugs. Using Louvain's method, we could identify 293 modules. Of these, 98 modules comprised nodes of both diseases and drugs. Using ClusterONE, we were able to partition the disease-drug heterogeneous network into 312 clusters (p value ≤ 0.05), of which, 110 clusters comprised both diseases and drugs (see Additional file 4 for a complete list of ClusterONE and Louvain method based modules) (Figure 1).
Using the communities detected by ClusterONE and Louvain, we generated all possible disease-drug combinations on a per-cluster basis. We call these the "drug repositioning candidates". To test the robustness of these candidate pairs, we repeatedly removed 10% of the edges and calculated the recovery rate of our predictions. Briefly, in each run, we randomly removed 10% of the edges from the heterogeneous weighted disease-drug network and performed graph clustering (both ClusterONE and Louvain methods) to detect the communities and extract drug repositioning candidate pairs. We repeated this ten times and compared the drug repositioning candidates with those from the original network (before randomly removing 10% of the edges). The average recovery rate for drug repositioning candidates generated by ClusterONE was ~95%, while for Louvain clustering it was ~85%. This suggests that the drug repositioning candidates we have discovered are robust and that moderate perturbation of the network edges does not affect the output significantly.
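The candidate generation and the edge-removal test can be sketched as follows. The recovery rate is assumed here to be the fraction of the original candidate pairs that reappear after perturbation, averaged over runs; the text does not give the exact formula. To keep the example self-contained, a connected-components routine stands in for the clustering step; it is not ClusterONE or Louvain, and any function returning a list of node sets could be passed instead. All names and the toy network are hypothetical.

```python
import random
from itertools import product

def candidate_pairs(modules, diseases):
    """All disease-drug combinations within each module."""
    pairs = set()
    for module in modules:
        module_diseases = [n for n in module if n in diseases]
        module_drugs = [n for n in module if n not in diseases]
        pairs.update(product(module_diseases, module_drugs))
    return pairs

def connected_components(edges):
    """Stand-in clustering: connected components (NOT ClusterONE/Louvain)."""
    adjacency = {}
    for u, v, _weight in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    seen, modules = set(), []
    for start in adjacency:
        if start in seen:
            continue
        stack, module = [start], set()
        while stack:
            node = stack.pop()
            if node not in module:
                module.add(node)
                stack.extend(adjacency[node] - module)
        seen |= module
        modules.append(module)
    return modules

def recovery_rate(edges, diseases, cluster, fraction=0.1, runs=10, seed=0):
    """Mean fraction of original candidates recovered after edge removal."""
    rng = random.Random(seed)
    original = candidate_pairs(cluster(edges), diseases)
    if not original:
        return 0.0
    n_keep = len(edges) - int(round(fraction * len(edges)))
    rates = []
    for _ in range(runs):
        kept = rng.sample(edges, n_keep)  # drop a random subset of edges
        perturbed = candidate_pairs(cluster(kept), diseases)
        rates.append(len(original & perturbed) / len(original))
    return sum(rates) / len(rates)

# Hypothetical toy network: disease nodes "d*", drug nodes "r*".
toy_edges = [("d1", "r1", 2), ("r1", "r2", 1), ("d1", "r2", 1),
             ("d2", "r3", 3), ("r3", "r4", 1), ("d2", "r4", 1),
             ("r5", "r6", 1), ("r6", "r7", 1), ("r5", "r7", 1),
             ("d3", "r8", 4)]
toy_diseases = {"d1", "d2", "d3"}

candidates = candidate_pairs(connected_components(toy_edges), toy_diseases)
rate = recovery_rate(toy_edges, toy_diseases, connected_components)
```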
Drug repositioning candidates and literature-based evaluation
Examples of some of the drug repositioning candidates along with their count of PubMed references (see Additional file 6 for more details)
Basal cell carcinoma
Familial male precocious puberty
Vismodegib and Gorlin syndrome
γ-secretase inhibitors, NSAID, Alzheimer's and Hidradenitis suppurativa
While the overlap of our discovered drug repositioning candidates with those under clinical trials (and with literature evidence) demonstrates the utility of our approach, it also shows the limitations of computational approaches. In other words, while computational approaches can provide potential candidates for drug repositioning, it may not be easy to foresee their failure in clinical trials. Nevertheless, the feature details (e.g., shared pathways, biological processes, phenotypes) that our approach provides for the connectivity between a disease and a candidate drug may help not only in understanding the molecular basis of side effects but also in making more informed decisions.
Our approach to predicting novel indications, which represents disease-drug combinations as combinations of their molecular and mechanistic features, including biological processes, pathways, and phenotypes, not only led to the proposal of drug repositioning candidates but also allowed mechanistic insights into them. The robustness of our predictions and their overlap with those reported in the literature and clinical trials demonstrate that this approach can effectively identify new indications, with the enriched feature patterns serving as an indicator of the mode of action. Although we have looked beyond gene-based relationships, a limitation of this method is that it relies on the feature patterns enriched in diseases and drugs, which are themselves generated using the genes associated with diseases or drugs. Thus, diseases and drugs that currently lack gene annotations are left out. Nevertheless, some of the discovered novel indications are far from obvious and may also help in understanding the molecular basis of side effects. As Novac points out in a recent review, while it is too early to evaluate the success of repositioning efforts, the obvious candidates for repositioning may have already been exhausted. Thus, a much more thorough analysis and investment will be needed to reposition the rest of the candidates.
This work was supported in part by Cincinnati Digestive Health Center (NIH P30 DK078392) and Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center.
Funding for the publication fee and open access charge is from Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA.
This article has been published as part of BMC Systems Biology Volume 7 Supplement 5, 2013: Selected articles from the International Conference on Intelligent Biology and Medicine (ICIBM 2013): Systems Biology. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcsystbiol/supplements/7/S5.
- 6. Iorio F, Bosotti R, Scacheri E, Belcastro V, Mithbaokar P, Ferriero R, Murino L, Tagliaferri R, Brunetti-Pierri N, Isacchi A, di Bernardo D: Discovery of drug mode of action and drug repositioning from transcriptional responses. Proc Natl Acad Sci USA. 2010, 107 (33): 14621-14626. doi:10.1073/pnas.1000138107.
- 9. Dudley JT, Sirota M, Shenoy M, Pai RK, Roedder S, Chiang AP, Morgan AA, Sarwal MM, Pasricha PJ, Butte AJ: Computational repositioning of the anticonvulsant topiramate for inflammatory bowel disease. Sci Transl Med. 2011, 3 (96): 96ra76. doi:10.1126/scitranslmed.3002648.
- 10. Lamb J, Crawford ED, Peck D, Modell JW, Blat IC, Wrobel MJ, Lerner J, Brunet J-P, Subramanian A, Ross KN, Reich M, Hieronymus H, Wei G, Armstrong SA, Haggarty SJ, Clemons PA, Wei R, Carr SA, Lander ES, Golub TR: The Connectivity Map: Using Gene-Expression Signatures to Connect Small Molecules, Genes, and Disease. Science. 2006, 313 (5795): 1929-1935. doi:10.1126/science.1132939.
- 11. Iskar M, Zeller G, Blattmann P, Campillos M, Kuhn M, Kaminska KH, Runz H, Gavin AC, Pepperkok R, van Noort V, Bork P: Characterization of drug-induced transcriptional modules: towards drug repositioning and functional understanding. Mol Syst Biol. 2013, 9: 662.
- 19. Assaf G, Gideon YS, Eytan R, Roded S: PREDICT: a method for inferring novel drug indications with application to personalized medicine. Molecular Systems Biology. 2011, 7 (1).
- 20. Keiser MJ, Setola V, Irwin JJ, Laggner C, Abbas AI, Hufeisen SJ, Jensen NH, Kuijer MB, Matos RC, Tran TB, Whaley R, Glennon RA, Hert J, Thomas KL, Edwards DD, Shoichet BK, Roth BL: Predicting new molecular targets for known drugs. Nature. 2009, 462 (7270): 175-181. doi:10.1038/nature08506.
- 21. Wu Z, Wang Y, Chen L: Network-based drug repositioning. Mol Biosyst. 2013.
- 22. Lee HS, Bae T, Lee JH, Kim DG, Oh YS, Jang Y, Kim JT, Lee JJ, Innocenti A, Supuran CT, Chen L, Rho K, Kim S: Rational drug repositioning guided by an integrated pharmacological network of protein, disease and drug. BMC Syst Biol. 2012, 6: 80. doi:10.1186/1752-0509-6-80.
- 24. Suthram S, Dudley JT, Chiang AP, Chen R, Hastie TJ, Butte AJ: Network-based elucidation of human disease similarities reveals common functional modules enriched for pluripotent drug targets. PLoS Comput Biol. 2010, 6 (2): e1000662. doi:10.1371/journal.pcbi.1000662.
- 30. Bastian M, Heymann S, Jacomy M: Gephi: an open source software for exploring and manipulating networks. International AAAI Conference on Weblogs and Social Media. 2009.
- 32. Van Landeghem S, De Bodt S, Drebert ZJ, Inzé D, Van de Peer Y: The Potential of Text Mining in Data Integration and Network Biology for Plant Research: A Case Study on Arabidopsis. The Plant Cell Online. 2013.
- 35. Madadhain J, Fisher D, Smyth P, Boey Y: Analysis and visualization of network data using JUNG. Journal of Statistical Software. 2005, 10: 1-35.
- 39. Sayers E: E-utilities quick start. 2008.
- 40. Sekulic A, Migden MR, Oro AE, Dirix L, Lewis KD, Hainsworth JD, Solomon JA, Yoo S, Arron ST, Friedlander PA, Marmur E, Rudin CM, Chang AL, Low JA, Mackey HM, Yauch RL, Graham RA, Reddy JC, Hauschild A: Efficacy and safety of vismodegib in advanced basal-cell carcinoma. N Engl J Med. 2012, 366 (23): 2171-2179. doi:10.1056/NEJMoa1113713.
- 41. Von Hoff DD, LoRusso PM, Rudin CM, Reddy JC, Yauch RL, Tibes R, Weiss GJ, Borad MJ, Hann CL, Brahmer JR, Mackey HM, Lum BL, Darbonne WC, Marsters JC, de Sauvage FJ, Low JA: Inhibition of the hedgehog pathway in advanced basal-cell carcinoma. N Engl J Med. 2009, 361 (12): 1164-1172. doi:10.1056/NEJMoa0905360.
- 42. Tang JY, Mackay-Wiggan JM, Aszterbaum M, Yauch RL, Lindgren J, Chang K, Coppola C, Chanana AM, Marji J, Bickers DR, Epstein EH: Inhibiting the hedgehog pathway in patients with the basal-cell nevus syndrome. N Engl J Med. 2012, 366 (23): 2180-2188. doi:10.1056/NEJMoa1113538.
- 45. Bateman RJ, Siemers ER, Mawuenyega KG, Wen G, Browning KR, Sigurdson WC, Yarasheski KE, Friedrich SW, Demattos RB, May PC, Paul SM, Holtzman DM: A gamma-secretase inhibitor decreases amyloid-beta production in the central nervous system. Ann Neurol. 2009, 66 (1): 48-54. doi:10.1002/ana.21623.
- 51. Novac N: Challenges and opportunities of drug repositioning. Trends Pharmacol Sci. 2013.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.