Advanced Machine Learning Models for Large Scale Gene Expression Analysis in Cancer Classification: Deep Learning Versus Classical Models

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 872)

Abstract

Analysing large gene expression datasets for cancer classification is a crucial and very challenging task in bioinformatics. In this paper, we explore the potential of advanced machine learning models, namely those based on deep learning, to handle this task. For this purpose, we propose a deep feed-forward neural network architecture. We also investigate four classical yet very popular machine learning classifiers: support vector machines, naive Bayes, k-nearest neighbours and shallow neural networks. The main objective is to assess the extent to which each model can cope with the increasing size of these datasets. We conducted our experimental study on a high-performance computing platform with 32 compute nodes, each with two Intel(R) Xeon(R) E5-2650 2.00 GHz processors of 8 cores each (512 cores in total). Five datasets available in the Gene Expression Omnibus (GEO) repository were used to test the five models. Experimental results show the effectiveness of deep learning and its ability to deal with large-scale data.
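
The comparison described in the abstract can be sketched with scikit-learn. This is a minimal, hypothetical sketch and not the authors' code. It uses a synthetic matrix from `make_classification` in place of the GEO datasets. The layer sizes, SVM kernel, value of `k` and other hyperparameters are assumptions chosen only for illustration. The "deep" feed-forward network is approximated by `MLPClassifier` with three hidden layers, and the shallow network by the same class with a single hidden layer.

```python
# Hypothetical sketch: compare a deep feed-forward network with classical
# classifiers on synthetic "wide" data (far more features than samples),
# mimicking the shape of a microarray matrix. Data, layer sizes and
# hyperparameters are illustrative assumptions, not the paper's settings.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_models(seed=0):
    """Return the five model families named in the abstract (assumed configs)."""
    return {
        "svm": SVC(kernel="linear", C=1.0),
        "naive_bayes": GaussianNB(),
        "knn": KNeighborsClassifier(n_neighbors=5),
        # One hidden layer: a "shallow" neural network.
        "shallow_nn": MLPClassifier(hidden_layer_sizes=(64,), max_iter=500,
                                    random_state=seed),
        # Several hidden layers: a deep feed-forward network (sizes assumed).
        "deep_ffnn": MLPClassifier(hidden_layer_sizes=(256, 128, 64),
                                   max_iter=500, random_state=seed),
    }


def compare(n_samples=120, n_genes=2000, n_classes=2, folds=5, seed=0):
    """Mean cross-validated accuracy of each model on a synthetic matrix."""
    # Rows are samples (patients), columns are genes.
    X, y = make_classification(
        n_samples=n_samples, n_features=n_genes, n_informative=50,
        n_redundant=0, n_classes=n_classes, random_state=seed,
    )
    scores = {}
    for name, clf in build_models(seed).items():
        # The scaler is refitted on the training folds only, inside CV.
        pipe = make_pipeline(StandardScaler(), clf)
        scores[name] = cross_val_score(pipe, X, y, cv=folds).mean()
    return scores


if __name__ == "__main__":
    for name, acc in compare().items():
        print(f"{name:12s} mean accuracy = {acc:.3f}")
```

Calling `compare()` returns the mean cross-validated accuracy for each model. Wrapping every classifier in a pipeline with `StandardScaler` means the per-gene scaling is learned from the training folds alone, so test-fold statistics never reach the model during training.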

Acknowledgement

We express our sincere gratitude to everyone who helped us accomplish this work. This work was granted access to the HPC resources of UCI-UFMC (Unité de Calcul Intensif) of the University Frères Mentouri Constantine 1. It has been supported by the national research project CNEPRU under grant N:B*07120140037.

Author information

Correspondence to Imene Zenbout.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Zenbout, I., Meshoul, S. (2018). Advanced Machine Learning Models for Large Scale Gene Expression Analysis in Cancer Classification: Deep Learning Versus Classical Models. In: Tabii, Y., Lazaar, M., Al Achhab, M., Enneya, N. (eds) Big Data, Cloud and Applications. BDCA 2018. Communications in Computer and Information Science, vol 872. Springer, Cham. https://doi.org/10.1007/978-3-319-96292-4_17

  • DOI: https://doi.org/10.1007/978-3-319-96292-4_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-96291-7

  • Online ISBN: 978-3-319-96292-4

  • eBook Packages: Computer Science, Computer Science (R0)
