Abstract
This paper presents a wrapper approach based on the Strawberry Plant Algorithm (SPA) for gene selection in high-dimensional data classification problems, selecting the most relevant genes for each biological dataset. The goal is an integrated exploration-exploitation approach to the problem of finding a near-optimal (small) gene subset in high-dimensional microarray data. First, a statistical filter is proposed for gene selection. Then, SPA is applied to find the most informative genes among those retained by the filter; SPA explores and exploits new regions of this search space and, overall, helps overcome premature convergence. Empirical studies on five public DNA microarray datasets show that our model achieves the best performance while selecting fewer genes than other methods recently reported in the literature.
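The two-stage pipeline described in the abstract (a statistical filter followed by an SPA wrapper) can be illustrated with a minimal sketch. The abstract does not name the filter statistic, the classifier used as the wrapper's fitness, or the SPA parameters, so everything below is an assumption for illustration only: the filter ranks genes by an absolute Welch t statistic, fitness is leave-one-out nearest-centroid accuracy minus a small subset-size penalty, and the search is a simplified binary variant loosely modeled on Merrikh-Bayat's strawberry algorithm, where each plant sends out a long "runner" (many bit flips, exploration) and a short "root" (one bit flip, exploitation) and the best plants survive. The names `filter_genes`, `strawberry_search`, `fitness`, `flip` and `demo` are hypothetical, and the data are synthetic.

```python
import random
from statistics import mean, variance

def welch_t(a, b):
    """Absolute Welch t statistic between two samples (assumed filter score)."""
    va, vb = variance(a), variance(b)
    denom = (va / len(a) + vb / len(b)) ** 0.5
    return abs(mean(a) - mean(b)) / denom if denom > 0 else 0.0

def filter_genes(X, y, k):
    """Stage 1 (assumed): keep the k genes with the largest |t| between classes 0 and 1."""
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, label in zip(X, y) if label == 0]
        b = [row[g] for row, label in zip(X, y) if label == 1]
        scores.append((welch_t(a, b), g))
    scores.sort(reverse=True)
    return [g for _, g in scores[:k]]

def loo_accuracy(X, y, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier on the given genes."""
    if not genes:
        return 0.0
    correct = 0
    for i in range(len(X)):
        centroids = {}
        for c in set(y):
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == c]
            centroids[c] = [mean(r[g] for r in rows) for g in genes]
        dist = {c: sum((X[i][g] - m) ** 2 for g, m in zip(genes, cen))
                for c, cen in centroids.items()}
        if min(dist, key=dist.get) == y[i]:
            correct += 1
    return correct / len(X)

def fitness(mask, X, y, candidates, alpha=0.01):
    """Assumed wrapper fitness: accuracy minus a penalty on the fraction of genes kept."""
    genes = [g for g, bit in zip(candidates, mask) if bit]
    return loo_accuracy(X, y, genes) - alpha * len(genes) / len(candidates)

def flip(mask, n_flips, rng):
    """Return a copy of mask with n_flips distinct random bits flipped."""
    child = list(mask)
    for pos in rng.sample(range(len(mask)), n_flips):
        child[pos] = 1 - child[pos]
    return child

def strawberry_search(X, y, candidates, pop_size=6, iters=20, seed=0):
    """Stage 2 (simplified SPA-style search over binary masks of the filtered genes)."""
    rng = random.Random(seed)
    d = len(candidates)
    pop = [[rng.randint(0, 1) for _ in range(d)] for _ in range(pop_size)]
    for _ in range(iters):
        offspring = []
        for plant in pop:
            offspring.append(flip(plant, max(1, d // 2), rng))  # runner: long jump (exploration)
            offspring.append(flip(plant, 1, rng))               # root: local step (exploitation)
        pool = pop + offspring
        pool.sort(key=lambda m: fitness(m, X, y, candidates), reverse=True)
        pop = pool[:pop_size]  # elitist survival keeps the best plants
    return [g for g, bit in zip(candidates, pop[0]) if bit]

def demo():
    """Synthetic data: 20 samples, 30 genes; genes 3 and 7 are shifted for class 1."""
    rng = random.Random(42)
    y = [0] * 10 + [1] * 10
    X = [[rng.gauss(0, 1) for _ in range(30)] for _ in y]
    for row, label in zip(X, y):
        row[3] += 4.0 * label
        row[7] += 4.0 * label
    candidates = filter_genes(X, y, k=8)
    selected = strawberry_search(X, y, candidates)
    return X, y, candidates, selected

if __name__ == "__main__":
    X, y, candidates, selected = demo()
    print("filtered:", candidates)
    print("selected:", selected, "LOO accuracy:", loo_accuracy(X, y, selected))
```

Keeping the parents in the survival pool makes the search elitist, so the best fitness found never decreases; the runner/root split is what gives the sketch its exploration-exploitation balance, though the paper's actual operators and parameters may differ.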
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Bonilla-Huerta, E., Morales-Caporal, R., Arjona-López, M.A. (2018). Exploration and Exploitation of High Dimensional Biological Datasets Using a Wrapper Approach Based on Strawberry Plant Algorithm. In: Huang, DS., Jo, KH., Zhang, XL. (eds) Intelligent Computing Theories and Application. ICIC 2018. Lecture Notes in Computer Science, vol 10955. Springer, Cham. https://doi.org/10.1007/978-3-319-95933-7_38
DOI: https://doi.org/10.1007/978-3-319-95933-7_38
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-95932-0
Online ISBN: 978-3-319-95933-7
eBook Packages: Computer Science, Computer Science (R0)