Using Deep Learning to Classify Class Imbalanced Gene-Expression Microarrays Datasets

  • A. Reyes-Nava
  • H. Cruz-Reyes
  • R. Alejo
  • E. Rendón-Lara
  • A. A. Flores-Fuentes
  • E. E. Granda-Gutiérrez
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11401)

Abstract

This work studies the performance of deep neural networks in classifying class-imbalanced gene-expression microarray datasets. The small number of samples and the high dimensionality of such datasets make this a challenging task. Three sampling methods that have shown favorable results in dealing with the class imbalance problem were used: Random Over-Sampling (ROS), Random Under-Sampling (RUS), and the Synthetic Minority Oversampling Technique (SMOTE). In addition, artificial noise was introduced into the datasets and their class imbalance was increased, in order to analyze how these conditions affect the classification of gene-expression microarray data. The results show that the noise level or class separability of a dataset has a greater influence on classifier performance than its dimensionality.
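
The abstract names the three sampling methods but does not say how they were implemented. As an illustration only, the following is a minimal NumPy sketch of the standard form of each technique, not the authors' code: ROS duplicates randomly chosen samples of the smaller classes, RUS randomly discards samples of the larger classes, and SMOTE creates synthetic samples by interpolating between a sample and one of its k nearest same-class neighbours. The function names, the choice to balance every class to the largest (ROS, SMOTE) or smallest (RUS) class size, and the default k = 5 are assumptions made for this sketch.

```python
import numpy as np


def random_over_sampling(X, y, rng):
    """ROS: duplicate randomly chosen samples of each smaller class
    until every class is as large as the largest one."""
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    idx = [np.arange(len(y))]  # keep all original samples
    for c, n in zip(classes, counts):
        if n < target:
            members = np.flatnonzero(y == c)
            idx.append(rng.choice(members, size=target - n, replace=True))
    idx = np.concatenate(idx)
    return X[idx], y[idx]


def random_under_sampling(X, y, rng):
    """RUS: keep a random subset of each class so that every class
    is as small as the smallest one."""
    classes, counts = np.unique(y, return_counts=True)
    target = counts.min()
    idx = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=target, replace=False)
        for c in classes
    ])
    return X[idx], y[idx]


def smote(X, y, rng, k=5):
    """SMOTE: add synthetic samples to each smaller class by interpolating
    between a sample and one of its k nearest same-class neighbours."""
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for c, n in zip(classes, counts):
        if n >= target or n < 2:  # nothing to add, or no neighbour exists
            continue
        Xc = X[y == c]
        kk = min(k, n - 1)
        # Squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b,
        # which avoids an n x n x d array for high-dimensional data.
        sq = (Xc ** 2).sum(axis=1)
        d = sq[:, None] + sq[None, :] - 2.0 * Xc @ Xc.T
        np.fill_diagonal(d, np.inf)  # a sample is not its own neighbour
        neighbours = np.argsort(d, axis=1)[:, :kk]
        m = target - n
        base = rng.integers(0, n, size=m)                   # seed samples
        nb = neighbours[base, rng.integers(0, kk, size=m)]  # chosen neighbours
        gap = rng.random((m, 1))                            # position on the segment
        X_parts.append(Xc[base] + gap * (Xc[nb] - Xc[base]))
        y_parts.append(np.full(m, c, dtype=y.dtype))
    return np.vstack(X_parts), np.concatenate(y_parts)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(30, 1000))   # 30 samples x 1000 synthetic "genes"
    y = np.array([0] * 24 + [1] * 6)  # 4:1 class imbalance
    for method in (random_over_sampling, random_under_sampling, smote):
        Xr, yr = method(X, y, rng)
        print(method.__name__, Xr.shape, np.bincount(yr))
```

With the synthetic 24:6 split above, ROS and SMOTE each return 48 samples (24 per class) and RUS returns 12 samples (6 per class). The distance expansion in `smote` is a design choice suited to microarray data, where the number of features (genes) far exceeds the number of samples.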

Keywords

Gene-expression microarrays; Deep neural networks; Class-imbalance

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • A. Reyes-Nava (1)
  • H. Cruz-Reyes (2)
  • R. Alejo (2)
  • E. Rendón-Lara (2)
  • A. A. Flores-Fuentes (1)
  • E. E. Granda-Gutiérrez (1)

  1. UAEM University Center at Atlacomulco, Universidad Autónoma del Estado de México, Atlacomulco, Mexico
  2. Division of Postgraduate Studies and Research, National Institute of Technology of Mexico (TecNM), Metepec, Mexico
