
A Novel Method for Identifying the Potential Cancer Driver Genes Based on Molecular Data Integration



Identifying cancer driver genes is essential for personalized therapy. Most driver genes are mutated at intermediate (2–20%) or even lower frequencies, which makes drivers with low-frequency mutations difficult to detect. Other forms of genomic aberration, such as copy number variations (CNVs) and epigenetic changes, may also reflect cancer progression. In this work, we propose iPDG, a method for identifying potential cancer driver genes based on molecular data integration. DNA copy number variation, somatic mutation, and gene expression data from matched cancer samples are integrated. Using the iKGGE method, the "key genes" of a cancer are identified, and changes in their expression levels serve as auxiliary evidence for whether a mutated gene is a potential driver. For each mutated gene we define a mutational effect, which accounts for copy number variation, the mutated gene itself, and its neighboring genes. The method has two main steps. The first is data preprocessing: DNA copy number variation and somatic mutation data are integrated, the integrated data are mapped onto a given interaction network, and a diffusion kernel is used to form the mutational effect matrix. The second step obtains the key genes with the iKGGE method and constructs a connection matrix from the gene expression data of the key genes and the mutational effect matrix of the mutated genes. Experiments on the TCGA breast cancer and glioblastoma multiforme datasets show that iPDG identifies known cancer driver genes and also discovers rare potential driver genes. Functional enrichment analysis indicates that these genes are clearly associated with the two cancer types.
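The preprocessing step can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the integration rule or the kernel parameters, so the code assumes a binary OR of mutation and CNV calls, the standard graph diffusion kernel K = exp(-beta * L) with L the unnormalized graph Laplacian, and an arbitrary beta of 0.5. The function names, the toy network, and the toy sample data are all hypothetical.

```python
import numpy as np


def diffusion_kernel(adjacency, beta=0.5):
    """Graph diffusion kernel K = exp(-beta * L), with L = D - A.

    L is symmetric for an undirected network, so the matrix exponential
    is computed from its eigendecomposition. beta is an assumed value.
    """
    adjacency = np.asarray(adjacency, dtype=float)
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    eigvals, eigvecs = np.linalg.eigh(laplacian)
    return (eigvecs * np.exp(-beta * eigvals)) @ eigvecs.T


def integrate_alterations(mutation, cnv):
    """Assumed integration rule: a gene is altered in a sample if it
    carries a somatic mutation OR a copy number variation."""
    return ((np.asarray(mutation) != 0) | (np.asarray(cnv) != 0)).astype(float)


def mutation_effect_matrix(mutation, cnv, adjacency, beta=0.5):
    """Spread each sample's alterations over the interaction network.

    Rows are samples, columns are genes. Each entry combines the gene's
    own alteration with diffused contributions from its neighbors.
    """
    altered = integrate_alterations(mutation, cnv)
    return altered @ diffusion_kernel(adjacency, beta)


# Toy example: four genes on a path network g0 - g1 - g2 - g3.
adjacency = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
mutation = np.array([[1, 0, 0, 0],   # sample 0: g0 mutated
                     [0, 0, 0, 0]])  # sample 1: no mutation
cnv = np.array([[0, 0, 0, 0],        # sample 0: no CNV
                [0, 0, 1, 0]])       # sample 1: CNV at g2

effect = mutation_effect_matrix(mutation, cnv, adjacency)
print(np.round(effect, 3))
```

In this toy network the effect of the alteration in g0 decays with network distance, so an unmutated neighbor still receives a nonzero score. This is the property that lets a network-based approach surface genes with low mutation frequency.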





This work is supported by the National Natural Science Foundation of China (Grant Nos. 61672011, 61472467 and 61471169), and the Collaboration and Innovation Center for Digital Chinese Medicine of 2011 Project of Colleges and Universities in Hunan Province.

Author information

Correspondence to Shu-Lin Wang.


Cite this article

Zhang, W., Wang, S. A Novel Method for Identifying the Potential Cancer Driver Genes Based on Molecular Data Integration. Biochem Genet 58, 16–39 (2020). https://doi.org/10.1007/s10528-019-09924-2



  • Driver genes
  • DNA copy number variation data
  • Somatic mutation data
  • Gene expression data
  • Diffusion kernel