Abstract
Gene selection is one of the most demanding tasks in the analysis of microarray data because of the very large number of features, which can run to tens of thousands. Feature selection is therefore a crucial step for the proper analysis and classification of microarray data. Filter methods are pre-processing algorithms that are independent of the classifier used. Wrapper methods estimate the benefit of adding or removing a feature by running the induction algorithm itself under cross-validation. In the proposed technique, we aim for a significant reduction in the dimensionality of the feature sets of three datasets, namely Leukaemia, Prostate Cancer and DLBCL, by passing them through several filters, namely the t-test, Bhattacharyya and ReliefF. A further reduction in dimension is performed in the second layer with the Mutual Information Maximisation (MIM) filter, and this stage is further optimised by the Adaptive Genetic Algorithm (AGA).
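The two-layer filtering idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses only the t-test as the first-layer filter (Bhattacharyya and ReliefF are omitted), it leaves out the AGA optimisation entirely, and it assumes a median split to discretise expression values before computing mutual information, a choice the abstract does not specify. The function names (`welch_t`, `mutual_information`, `binarize`, `two_layer_filter`), the cut-offs `k1` and `k2`, and the synthetic data are all hypothetical.

```python
import math
import random
from collections import Counter
from statistics import mean, variance


def welch_t(a, b):
    """Absolute Welch t-statistic between two groups of values for one gene."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return abs(mean(a) - mean(b)) / se if se > 0 else 0.0


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def binarize(values):
    """Discretise expression values at their median (an assumed binning)."""
    cut = sorted(values)[len(values) // 2]
    return [int(v >= cut) for v in values]


def two_layer_filter(X, y, k1, k2):
    """Return (layer1, layer2) lists of gene indices.

    X is a list of samples (each a list of gene values); y holds 0/1 labels.
    Layer 1 keeps the k1 genes with the largest |t|.
    Layer 2 keeps the k2 of those with the highest mutual information
    between the binarised gene and the class label (MIM ranking).
    """
    n_genes = len(X[0])
    cols = [[row[g] for row in X] for g in range(n_genes)]

    def t_score(g):
        a = [v for v, lab in zip(cols[g], y) if lab == 0]
        b = [v for v, lab in zip(cols[g], y) if lab == 1]
        return welch_t(a, b)

    def mi_score(g):
        return mutual_information(binarize(cols[g]), y)

    layer1 = sorted(range(n_genes), key=t_score, reverse=True)[:k1]
    layer2 = sorted(layer1, key=mi_score, reverse=True)[:k2]
    return layer1, layer2


# Synthetic stand-in for a microarray matrix: 40 samples, 50 genes.
# Genes 0 and 1 are shifted strongly in class 1; the rest are pure noise.
random.seed(0)
y = [0] * 20 + [1] * 20
X = [
    [random.gauss(10.0 * lab if g < 2 else 0.0, 1.0) for g in range(50)]
    for lab in y
]

layer1, layer2 = two_layer_filter(X, y, k1=10, k2=2)
print("after t-test filter:", sorted(layer1))
print("after MIM filter:  ", sorted(layer2))
```

On real microarray data the first layer would typically cut tens of thousands of genes down to a few hundred before the MIM ranking is applied; the values of `k1` and `k2` here are chosen only to keep the example small.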
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Begum, S., Ansari, A.A., Sultan, S., Dam, R. (2019). A Hybrid Model for Optimum Gene Selection of Microarray Datasets. In: Kalita, J., Balas, V., Borah, S., Pradhan, R. (eds) Recent Developments in Machine Learning and Data Analytics. Advances in Intelligent Systems and Computing, vol 740. Springer, Singapore. https://doi.org/10.1007/978-981-13-1280-9_39
DOI: https://doi.org/10.1007/978-981-13-1280-9_39
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-1279-3
Online ISBN: 978-981-13-1280-9
eBook Packages: Intelligent Technologies and Robotics (R0)