DEGnet: Identifying Differentially Expressed Genes Using Deep Neural Network from RNA-Seq Datasets

  • Tulika Kakati
  • Dhruba K. Bhattacharyya
  • Jugal K. Kalita
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11942)


Differential expression (DE) analysis and identification of differentially expressed genes (DEGs) provide insights for the discovery of therapeutic drugs and the underlying mechanisms of disease. Statistical methods such as DESeq2, edgeR, and limma-voom produce a number of false positives and false negatives, and fail to classify DEGs as up-regulating (UR) or down-regulating (DR) genes and link them to disease progression. Machine learning (ML) methods, including deep learning (DL), that identify DEGs from RNA-seq data face challenges because the number of samples (n) is small compared to the number of genes (g). In this work, we propose a deep neural network (DNN) called DEGnet to predict the UR and DR genes from Parkinson’s disease (PD) and breast cancer (BRCA) RNA-seq datasets. The accuracies we obtained on PD and BRCA were 100% and 87.5% respectively, higher than those of ML-based methods on the same datasets. To the best of our knowledge, we are the first to apply a DNN to classify DEGs into UR and DR genes and to identify significant UR and DR genes that play a role in the progression of a disease. Experimental results show that DEGnet performs well and can be applied to other RNA-seq datasets despite the n \(\ll\) g issue.
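
The abstract does not say how genes are assigned the UR and DR labels. As a rough illustration of the terminology only, the sketch below labels genes from the log2 fold change of mean expression between two conditions, which is a common convention and not necessarily the procedure used for DEGnet. The function name, gene names, toy counts, pseudocount, and threshold are all hypothetical.

```python
import math

def label_genes(control, disease, pseudocount=1.0, threshold=1.0):
    """Label each gene as UR, DR, or NS (not significant).

    control, disease: dict mapping gene name -> list of expression values,
    one value per sample. The pseudocount and threshold are illustrative
    assumptions, not values taken from the paper.
    """
    labels = {}
    for gene in control:
        mean_c = sum(control[gene]) / len(control[gene])
        mean_d = sum(disease[gene]) / len(disease[gene])
        # The pseudocount avoids division by zero for unexpressed genes.
        log2_fc = math.log2((mean_d + pseudocount) / (mean_c + pseudocount))
        if log2_fc >= threshold:
            labels[gene] = "UR"   # higher expression in the disease samples
        elif log2_fc <= -threshold:
            labels[gene] = "DR"   # lower expression in the disease samples
        else:
            labels[gene] = "NS"
    return labels

# Toy data with hypothetical gene names and counts.
control = {"geneA": [10, 12, 11], "geneB": [100, 90, 110], "geneC": [50, 55, 45]}
disease = {"geneA": [40, 44, 48], "geneB": [20, 25, 15], "geneC": [52, 48, 50]}
labels = label_genes(control, disease)
print(labels)
```

With only a few samples per condition and thousands of genes in real RNA-seq data, a fixed threshold like this is sensitive to noise, which is one face of the n \(\ll\) g issue mentioned in the abstract.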


Deep neural network · RNA-seq · Parkinson’s disease · Breast cancer



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Tulika Kakati (1)
  • Dhruba K. Bhattacharyya (1, corresponding author)
  • Jugal K. Kalita (2)
  1. Department of Computer Science and Engineering, Tezpur University, Tezpur, India
  2. Department of Computer Science, University of Colorado, Colorado Springs, USA
