Soft Computing, Volume 23, Issue 24, pp 13409–13421

Novel machine learning approach for classification of high-dimensional microarray data

  • Rabia Aziz Musheer
  • C. K. Verma
  • Namita Srivastava
Methodologies and Application


Independent component analysis (ICA) is a powerful technique for reducing the dimensionality of big data in many applications, and numerous works have used it to extract features from microarray gene expression data. One of the merits of ICA is that the number of extracted features is always equal to the number of samples. When ICA is applied to microarray data, the challenge is how to find the best subset of genes (features) among the extracted features. To address this problem, we propose a new artificial bee colony (ABC)-based feature selection approach for microarray data. The approach has two stages: an ICA-based extraction stage that reduces the size of the data, and an ABC-based wrapper stage that optimizes the reduced feature vectors. To validate the approach, we conducted extensive experiments comparing the performance of ICA + ABC against recently published and other previously proposed gene selection methods for the Naïve Bayes (NB) classifier. A statistical hypothesis test was used to compare the proposed approach with the other algorithms on six benchmark microarray cancer classification datasets. The experimental results show that, for the NB classifier, the proposed approach improves on all the compared algorithms at a certain level of significance.
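The ABC-based wrapper stage can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions. First, the ICA stage is assumed to have already produced a small numeric feature matrix. Second, each food source is a binary feature mask, and a simple bit-copy/bit-flip neighbourhood stands in for the continuous ABC update. Third, each mask is scored by the leave-one-out accuracy of a hand-written Gaussian Naïve Bayes classifier. The function names and the parameters `colony_size`, `limit`, `max_cycles` and `seed` are hypothetical.

```python
import math
import random


def gaussian_nb_predict(train_X, train_y, x, var_eps=1e-9):
    """Predict the class of x with a Gaussian Naive Bayes model fit on (train_X, train_y)."""
    best_label, best_score = None, -math.inf
    for label in sorted(set(train_y)):
        rows = [r for r, c in zip(train_X, train_y) if c == label]
        score = math.log(len(rows) / len(train_y))  # log class prior
        for j, value in enumerate(x):
            col = [r[j] for r in rows]
            mean = sum(col) / len(col)
            var = sum((v - mean) ** 2 for v in col) / len(col) + var_eps
            # Add the log Gaussian likelihood of feature j (NB independence assumption).
            score += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def loocv_accuracy(X, y, mask):
    """Leave-one-out accuracy of Gaussian NB using only the features where mask is 1."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty subset cannot classify anything
    Xs = [[row[j] for j in cols] for row in X]
    hits = 0
    for i in range(len(Xs)):
        pred = gaussian_nb_predict(Xs[:i] + Xs[i + 1:], y[:i] + y[i + 1:], Xs[i])
        hits += pred == y[i]
    return hits / len(Xs)


def abc_feature_selection(X, y, colony_size=10, limit=5, max_cycles=20, seed=0):
    """Binary artificial bee colony search for the mask with the best LOOCV accuracy."""
    rng = random.Random(seed)
    n_features = len(X[0])
    n_sources = colony_size // 2  # one food source per employed bee
    best_mask, best_fit = None, -1.0

    def consider(mask, fit):
        # Keep a copy of the best feature mask seen so far.
        nonlocal best_mask, best_fit
        if fit > best_fit:
            best_mask, best_fit = mask[:], fit

    def random_mask():
        mask = [rng.randint(0, 1) for _ in range(n_features)]
        if not any(mask):
            mask[rng.randrange(n_features)] = 1
        return mask

    def try_improve(i):
        # Neighbour of source i: pick a partner k and a feature j, copy k's bit,
        # or flip the bit when both already agree (a binary stand-in for the
        # continuous ABC move v_ij = x_ij + phi * (x_ij - x_kj)).
        k = rng.choice([s for s in range(n_sources) if s != i])
        j = rng.randrange(n_features)
        v = sources[i][:]
        v[j] = 1 - v[j] if v[j] == sources[k][j] else sources[k][j]
        fv = loocv_accuracy(X, y, v)
        if fv > fitness[i]:  # greedy selection
            sources[i], fitness[i], trials[i] = v, fv, 0
            consider(v, fv)
        else:
            trials[i] += 1

    sources = [random_mask() for _ in range(n_sources)]
    fitness = [loocv_accuracy(X, y, m) for m in sources]
    trials = [0] * n_sources
    for m, f in zip(sources, fitness):
        consider(m, f)

    for _ in range(max_cycles):
        for i in range(n_sources):  # employed bee phase
            try_improve(i)
        for _ in range(n_sources):  # onlooker phase: fitness-proportional choice
            weights = [f + 1e-12 for f in fitness]
            try_improve(rng.choices(range(n_sources), weights=weights)[0])
        for i in range(n_sources):  # scout phase: abandon exhausted sources
            if trials[i] > limit:
                sources[i] = random_mask()
                fitness[i] = loocv_accuracy(X, y, sources[i])
                trials[i] = 0
                consider(sources[i], fitness[i])
    return best_mask, best_fit


if __name__ == "__main__":
    data_rng = random.Random(1)
    y = [i % 2 for i in range(24)]
    # Toy stand-in for ICA features: column 0 separates the classes, columns 1-3 are noise.
    X = [[5.0 * c + data_rng.gauss(0, 0.1)] + [data_rng.gauss(0, 1) for _ in range(3)]
         for c in y]
    mask, acc = abc_feature_selection(X, y)
    print("selected features:", [j for j, b in enumerate(mask) if b])
    print("LOOCV accuracy:", acc)
```

On the toy data only column 0 is informative, so the search should normally return a mask that includes it. In a real experiment the toy matrix would be replaced by the ICA-extracted features. Because the fitness is computed on the same samples used for selection, it is optimistic, and the final accuracy would need to be measured on held-out data.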


Keywords: Independent component analysis (ICA) · Artificial bee colony (ABC) · Naïve Bayes (NB) · Cancer classification


Compliance with ethical standards

Conflict of interest

Rabia Aziz, C. K. Verma and Namita Srivastava declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.


Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. Department of SASL (Mathematics), VIT University Bhopal, Bhopal, India
  2. Department of Mathematics and Computer Application, Maulana Azad National Institute of Technology, Bhopal, India
