Abstract
Feature selection is a key success factor for classification problems on large, high-dimensional datasets. In this paper, we introduce an approach that improves classification performance on such datasets by combining genetic algorithms for feature selection with a One-class SVM for classification. The approach is suited to cases where observations from only one class are available and high classification accuracy is required. Two benchmark datasets, taken from the NIPS 2003 variable selection competition and the UCI Machine Learning Repository, were chosen to span a variety of domains and difficulties. Results show that applying feature selection prior to classification gives higher prediction accuracy than classification without any feature selection. The approach can also outperform classifiers such as random forest, especially on datasets with a very large number of features and a small number of observations, like the ARCENE dataset.
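The general idea in the abstract, a genetic algorithm searching over feature subsets with a one-class classifier scoring each subset, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the GA operators, fitness function, or SVM settings, so everything below is an assumption. In particular, the synthetic data, truncation selection, one-point crossover, bit-flip mutation, and all parameter values are chosen for illustration, and a simple z-score distance model stands in for the One-class SVM so that the example runs with the Python standard library only. In practice the stand-in could be replaced by a real one-class SVM such as scikit-learn's `OneClassSVM`.

```python
import random
from statistics import mean, pstdev


def make_data(rng, n, n_informative=3, n_noise=27, shift=1.5):
    """Hypothetical synthetic data: outliers differ from the target class
    only on the first n_informative features; the rest are pure noise."""
    def row(offset):
        return ([rng.gauss(offset, 1.0) for _ in range(n_informative)]
                + [rng.gauss(0.0, 1.0) for _ in range(n_noise)])
    return [row(0.0) for _ in range(n)], [row(shift) for _ in range(n)]


def fit_one_class(train, mask, quantile=0.9):
    """Stand-in for a One-class SVM (an assumption, not the paper's model).
    Fits on target-class rows only and accepts a row when its mean squared
    z-score over the selected features is below a training quantile."""
    cols = [j for j, bit in enumerate(mask) if bit]
    mu = {j: mean([r[j] for r in train]) for j in cols}
    sd = {j: pstdev([r[j] for r in train]) or 1.0 for j in cols}

    def score(r):
        return mean([((r[j] - mu[j]) / sd[j]) ** 2 for j in cols])

    ranked = sorted(score(r) for r in train)
    threshold = ranked[int(quantile * (len(ranked) - 1))]
    return lambda r: score(r) <= threshold  # True means "target class"


def fitness(mask, train, val_targets, val_outliers):
    """Validation accuracy of the one-class model on the masked features."""
    if not any(mask):
        return 0.0  # an empty feature subset is never useful
    is_target = fit_one_class(train, mask)
    hits = sum(is_target(r) for r in val_targets)
    hits += sum(not is_target(r) for r in val_outliers)
    return hits / (len(val_targets) + len(val_outliers))


def select_features(train, val_targets, val_outliers, rng,
                    pop_size=20, generations=15, p_mut=0.05):
    """Wrapper-style GA over bitmask chromosomes (1 = keep the feature)."""
    n = len(train[0])

    def score(mask):
        return fitness(mask, train, val_targets, val_outliers)

    # Seed the population with the all-features mask plus random masks.
    pop = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)]
                       for _ in range(pop_size - 1)]
    for _ in range(generations):
        ranked = sorted(pop, key=score, reverse=True)
        parents = ranked[: pop_size // 2]      # truncation selection
        children = [ranked[0]]                 # elitism: keep the best mask
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)          # one-point crossover
            child = [bit ^ (rng.random() < p_mut)  # bit-flip mutation
                     for bit in a[:cut] + b[cut:]]
            children.append(child)
        pop = children
    best = max(pop, key=score)
    return best, score(best)


rng = random.Random(0)
train, _ = make_data(rng, 100)                 # training uses one class only
val_targets, val_outliers = make_data(rng, 100)
baseline = fitness([1] * len(train[0]), train, val_targets, val_outliers)
best_mask, best_score = select_features(train, val_targets, val_outliers, rng)
print(f"all features: {baseline:.3f}")
print(f"GA subset ({sum(best_mask)} features): {best_score:.3f}")
```

Because the all-features mask is in the initial population and the best mask is carried over each generation, the selected subset can never score below the no-selection baseline on the validation data. That score is optimistic, since the same validation data guides the search; a fair comparison would report accuracy on a separate held-out test set.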
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Alkubabji, M., Aldasht, M., Adi, S. (2018). One Class Genetic-Based Feature Selection for Classification in Large Datasets. In: Tabii, Y., Lazaar, M., Al Achhab, M., Enneya, N. (eds) Big Data, Cloud and Applications. BDCA 2018. Communications in Computer and Information Science, vol 872. Springer, Cham. https://doi.org/10.1007/978-3-319-96292-4_24
DOI: https://doi.org/10.1007/978-3-319-96292-4_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-96291-7
Online ISBN: 978-3-319-96292-4
eBook Packages: Computer Science, Computer Science (R0)