Abstract
In this paper, we propose GSO-Infogain, a hybrid filter-wrapper algorithm that performs feature selection and classification simultaneously to improve classification accuracy. GSO-Infogain employs the Glowworm Swarm Optimization (GSO) algorithm with a Support Vector Machine (SVM) as its internal learning algorithm and uses feature ranking based on information gain as a heuristic. The GSO algorithm randomly generates a population of worms, each of which represents a candidate subset of features. The fitness of each candidate solution, evaluated using the SVM, is encoded in its luciferin value. Each worm moves probabilistically towards the worm with the highest luciferin value in its neighbourhood. In the process, the worms explore the feature space and eventually converge to the global optimum. We evaluated the hybrid algorithm on a set of cancer datasets and obtained classification accuracies in the range 94-98% for these datasets, comparable to the best results from other classification algorithms. We further tested the robustness of GSO-Infogain on the CoEPrA training and test datasets. Here too, GSO-Infogain gives similar prediction accuracies on the training and test datasets, indicating its robustness.
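The worm dynamics described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the binary subset encoding, the Hamming-distance neighbourhood, the one-bit move, and all parameter values (decay, gain, initial luciferin) are assumptions, with the luciferin update and neighbour choice following the commonly cited GSO formulation. The function toy_fitness is a hypothetical stand-in for the SVM-based fitness, and the information-gain heuristic is left out.

```python
import random

def hamming(a, b):
    """Number of positions where two binary feature masks differ."""
    return sum(x != y for x, y in zip(a, b))

def update_luciferin(luciferin, fitness, decay=0.4, gain=0.6):
    """Commonly cited GSO rule: l(t+1) = (1 - decay) * l(t) + gain * J(x)."""
    return (1.0 - decay) * luciferin + gain * fitness

def choose_brighter_neighbour(i, worms, luciferin, radius, rng):
    """Pick a brighter worm within `radius`, weighted by luciferin difference."""
    candidates = [
        j for j in range(len(worms))
        if j != i
        and luciferin[j] > luciferin[i]
        and hamming(worms[i], worms[j]) <= radius
    ]
    if not candidates:
        return None  # no brighter neighbour: the worm stays where it is
    weights = [luciferin[j] - luciferin[i] for j in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]

def move_towards(worm, target, rng):
    """Assumed discrete move: copy one randomly chosen differing bit."""
    differing = [k for k in range(len(worm)) if worm[k] != target[k]]
    if not differing:
        return list(worm)
    moved = list(worm)
    k = rng.choice(differing)
    moved[k] = target[k]
    return moved

def gso_feature_selection(fitness, n_features, n_worms=20, n_iters=50,
                          radius=None, seed=0):
    """Run the sketch and return the best feature mask and its fitness."""
    rng = random.Random(seed)
    radius = n_features if radius is None else radius
    # Each worm is a random candidate subset, encoded as a 0/1 mask.
    worms = [[rng.randint(0, 1) for _ in range(n_features)]
             for _ in range(n_worms)]
    luciferin = [5.0] * n_worms  # illustrative initial value
    for _ in range(n_iters):
        luciferin = [update_luciferin(l, fitness(w))
                     for l, w in zip(luciferin, worms)]
        new_worms = []
        for i, worm in enumerate(worms):
            j = choose_brighter_neighbour(i, worms, luciferin, radius, rng)
            new_worms.append(worm if j is None
                             else move_towards(worm, worms[j], rng))
        worms = new_worms
    best = max(worms, key=fitness)
    return best, fitness(best)

def toy_fitness(worm, informative=(0, 3, 5)):
    """Hypothetical stand-in for SVM accuracy: reward informative features,
    lightly penalise every other selected feature."""
    hits = sum(worm[k] for k in informative)
    extras = sum(worm) - hits
    return hits - 0.1 * extras

best_mask, best_score = gso_feature_selection(toy_fitness, n_features=8)
print(best_mask, best_score)
```

In a real wrapper setting, toy_fitness would be replaced by a function that trains and validates an SVM on the features selected by the mask; that step dominates the cost, so population size and iteration count would need to be chosen with it in mind.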
V.N. gratefully acknowledges the Council of Scientific and Industrial Research, New Delhi, for awarding a Junior Research Fellowship.
V.K.J. gratefully acknowledges financial support from the Department of Science and Technology, New Delhi.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Gurav, A., Nair, V., Gupta, U., Valadi, J. (2015). Glowworm Swarm Based Informative Attribute Selection Using Support Vector Machines for Simultaneous Feature Selection and Classification. In: Panigrahi, B., Suganthan, P., Das, S. (eds) Swarm, Evolutionary, and Memetic Computing. SEMCCO 2014. Lecture Notes in Computer Science, vol 8947. Springer, Cham. https://doi.org/10.1007/978-3-319-20294-5_3
DOI: https://doi.org/10.1007/978-3-319-20294-5_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-20293-8
Online ISBN: 978-3-319-20294-5
eBook Packages: Computer Science, Computer Science (R0)