Integrating Multi-scale Gene Features for Cancer Diagnosis

  • Peng Hang
  • Mengjun Shi
  • Quan Long
  • Hui Li
  • Haifeng Zhao
  • Meng Ma
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10996)


Cancer is one of the major diseases that threaten human life. Advances in high-throughput sequencing technology provide a way to diagnose cancer accurately and to reveal its pathogenesis at the molecular level. In this study, we integrated differentially expressed genes and differential DNA methylation patterns and applied multiple machine learning methods to cancer diagnosis. The experimental results show that diagnostic performance improves significantly when multi-scale gene features from the RNA and epigenetic levels are integrated. The AUC of the classifier can be increased by 7.4% with multi-scale gene features compared with differentially expressed genes alone, which verifies the effectiveness of integrating multi-scale gene features for cancer diagnosis.
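The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are synthetic, the helper names (`auc`, `integrate`, `make_synthetic_cohort`) are hypothetical, and a simple feature sum stands in for the machine learning classifiers, which the abstract does not name. It shows only the general idea of concatenating per-sample expression and methylation features and comparing AUC against expression features alone.

```python
import random


def auc(scores, labels):
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # Fraction of (positive, negative) pairs ranked correctly; ties count 0.5.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def integrate(expr_row, meth_row):
    """Concatenate one sample's expression and methylation features."""
    return list(expr_row) + list(meth_row)


def make_synthetic_cohort(n_per_class=50, n_expr=5, n_meth=5, shift=0.5, seed=0):
    """Synthetic stand-in data (assumption): Gaussian features, shifted for label 1."""
    rng = random.Random(seed)
    expr, meth, labels = [], [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            expr.append([rng.gauss(shift * label, 1.0) for _ in range(n_expr)])
            meth.append([rng.gauss(shift * label, 1.0) for _ in range(n_meth)])
            labels.append(label)
    return expr, meth, labels


expr, meth, labels = make_synthetic_cohort()

# Stand-in scorer (assumption): the sum of a sample's features, not a trained model.
auc_expr = auc([sum(row) for row in expr], labels)
auc_both = auc([sum(integrate(e, m)) for e, m in zip(expr, meth)], labels)

print(f"expression only: {auc_expr:.3f}  integrated: {auc_both:.3f}")
```

In a real analysis the scores would come from a trained classifier evaluated on held-out samples, and the features would be the selected differentially expressed genes and differentially methylated loci rather than random draws.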


Cancer diagnosis · Machine learning · Gene expression · DNA methylation · High-throughput sequencing technology



This project was sponsored by the Scientific Research Foundation for the Returned Overseas Chinese Scholars, State Education Ministry (No. 48, 2014-1685) and the Key Natural Science Project of Anhui Provincial Education Department (KJ2017A016).



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Key Lab of Intelligent Computing and Signal Processing of MOE and School of Computer and Technology, Anhui University, Hefei, People’s Republic of China
  2. Icahn School of Medicine at Mount Sinai, New York, USA
  3. Departments of Biochemistry & Molecular Biology, Medical Genetics, and Mathematics & Statistics, Alberta Children’s Hospital Research Institute and O’Brien Institute for Public Health, University of Calgary, Calgary, Canada
