One Class Genetic-Based Feature Selection for Classification in Large Datasets

  • Conference paper
Big Data, Cloud and Applications (BDCA 2018)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 872)

Abstract

Feature selection is a key success factor for classification problems on large, high-dimensional datasets. In this paper, we introduce an approach for improving classification performance on high-dimensional datasets that combines a genetic algorithm for feature selection with a One-class SVM for classification. The approach is suited to large, high-dimensional datasets and can be used when observations from only one class are available and high classification accuracy is required. Two benchmark datasets, taken from the NIPS 2003 variable selection competition and the UCI Machine Learning Repository, were chosen to span a variety of domains and difficulties. Results show that applying feature selection before classification gives higher prediction accuracy than classification without any feature selection. The approach can also outperform classifiers such as random forest, especially on datasets with a very large number of features and a small number of observations, such as the ARCENE dataset.
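The abstract gives no implementation details, so the following is only a minimal sketch of the general idea: a genetic algorithm searches over binary feature masks, and each mask is scored by a one-class model trained on target-class rows alone. Everything in it is an assumption made for illustration, not the authors' method: the function names, the GA settings (tournament selection, one-point crossover, bit-flip mutation, elitism), the synthetic data, and in particular the scorer, which is a simple centroid-and-radius stand-in rather than a One-class SVM so that the example needs only the standard library. It also assumes a small labelled validation set containing some outliers is available for computing fitness.

```python
import random
from statistics import mean


def centroid_one_class_score(train, valid_pos, valid_neg, mask):
    """Stand-in one-class scorer (NOT an SVM): fit a centroid and radius on
    target-class rows only, then return accuracy on a labelled validation set."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty feature subset cannot classify anything
    centroid = [mean(row[j] for row in train) for j in cols]

    def dist(row):
        return sum((row[j] - c) ** 2 for j, c in zip(cols, centroid)) ** 0.5

    radius = max(dist(row) for row in train)  # accept the training envelope
    hits = sum(dist(r) <= radius for r in valid_pos)
    hits += sum(dist(r) > radius for r in valid_neg)
    return hits / (len(valid_pos) + len(valid_neg))


def ga_select(n_features, fitness, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Search binary feature masks with a basic GA (requires n_features >= 2)."""
    rng = random.Random(seed)
    pop = [[rng.random() < 0.5 for _ in range(n_features)] for _ in range(pop_size)]
    pop[0] = [True] * n_features  # seed with "no selection" so the GA never does worse
    for _ in range(generations):
        scored = [(fitness(ind), ind) for ind in pop]
        _, best = max(scored, key=lambda t: t[0])

        def tournament():
            a, b = rng.sample(scored, 2)
            return a[1] if a[0] >= b[0] else b[1]

        children = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [(not g) if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
    score, mask = max(((fitness(ind), ind) for ind in pop), key=lambda t: t[0])
    return mask, score


def make_toy(n, shift, rng, n_informative=2, n_noise=8):
    """Synthetic rows: a few informative columns plus high-variance noise columns."""
    return [[rng.gauss(shift, 1.0) for _ in range(n_informative)]
            + [rng.gauss(0.0, 5.0) for _ in range(n_noise)] for _ in range(n)]


data_rng = random.Random(42)
train = make_toy(60, 0.0, data_rng)      # target class only
valid_pos = make_toy(30, 0.0, data_rng)  # held-out target-class rows
valid_neg = make_toy(30, 6.0, data_rng)  # held-out outliers, used for fitness only


def fitness(mask):
    return centroid_one_class_score(train, valid_pos, valid_neg, mask)


baseline = fitness([True] * 10)
best_mask, best_score = ga_select(10, fitness)
print("all features:", baseline)
print("selected features:", [j for j, keep in enumerate(best_mask) if keep], best_score)
```

Because the fitness function is passed in as a plain callable, the stand-in scorer could be replaced by one that fits an actual One-class SVM (for example scikit-learn's `OneClassSVM`) on the masked training columns. Seeding the population with the all-features mask and keeping an elite means the selected subset scores at least as well as no selection on the validation split, though that says nothing about performance on unseen test data.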

Author information

Corresponding authors

Correspondence to Murad Alkubabji, Mohammed Aldasht or Safa Adi.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Alkubabji, M., Aldasht, M., Adi, S. (2018). One Class Genetic-Based Feature Selection for Classification in Large Datasets. In: Tabii, Y., Lazaar, M., Al Achhab, M., Enneya, N. (eds) Big Data, Cloud and Applications. BDCA 2018. Communications in Computer and Information Science, vol 872. Springer, Cham. https://doi.org/10.1007/978-3-319-96292-4_24

  • DOI: https://doi.org/10.1007/978-3-319-96292-4_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-96291-7

  • Online ISBN: 978-3-319-96292-4

  • eBook Packages: Computer Science, Computer Science (R0)
