Instance selection and feature extraction using cuttlefish optimization algorithm and principal component analysis using decision tree



Instance selection and feature extraction are among the most important tasks in data mining, because huge amounts of data are constantly being produced in many fields. When a dataset is very large, most existing machine learning algorithms cannot handle it, and the computational cost is high. Two approaches have been used to address this problem: scaling up the algorithms and reducing the data. Scaling up data mining algorithms is not always feasible, whereas data reduction is. In this paper we use both instance selection and feature extraction for data reduction. Instance selection reduces the size of the original training data. Feature extraction maps input data from an m-dimensional space into a lower-dimensional space, i.e., it eliminates the components that contribute little information. The cuttlefish optimization algorithm is used for instance selection, while principal component analysis is used for feature extraction. Combining feature extraction and instance selection substantially reduces the computational time needed to train the classifiers. The optimal extracted subset of data points and the reduced feature space yield a detection rate, accuracy rate, and false positive rate close to those obtained with the original dataset, while requiring less computational time to train the classifiers.
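The two reduction steps can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the names `pca_project`, `nn_accuracy`, and `select_instances` are hypothetical, the synthetic data is made up for the demonstration, a plain random search stands in for the cuttlefish optimization algorithm, and a 1-nearest-neighbour classifier stands in for the decision tree as the fitness measure.

```python
import math
import random


def pca_project(rows, k, iters=200, seed=0):
    """Project rows onto their top-k principal components.

    Uses power iteration with deflation on the sample covariance matrix.
    """
    rng = random.Random(seed)
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(m)]
           for a in range(m)]
    components = []
    for _ in range(k):
        # Random unit start vector (assumed not orthogonal to the target).
        v = [rng.gauss(0.0, 1.0) for _ in range(m)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left in the deflated matrix
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the unit vector v.
        lam = sum(v[a] * sum(cov[a][b] * v[b] for b in range(m))
                  for a in range(m))
        components.append(v)
        # Deflate: subtract the variance explained by this component.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(m)]
               for a in range(m)]
    return [[sum(c[j] * comp[j] for j in range(m)) for comp in components]
            for c in centered]


def nn_accuracy(train, test):
    """1-NN accuracy; train and test are lists of (features, label)."""
    hits = 0
    for x, y in test:
        nearest = min(train,
                      key=lambda t: sum((a - b) ** 2 for a, b in zip(t[0], x)))
        hits += nearest[1] == y
    return hits / len(test)


def select_instances(train, valid, keep, trials=50, seed=0):
    """Random-search stand-in for the metaheuristic (NOT cuttlefish).

    Returns the size-`keep` subset of `train` with the best accuracy on `valid`.
    """
    rng = random.Random(seed)
    best_subset, best_acc = None, -1.0
    for _ in range(trials):
        subset = rng.sample(train, keep)
        acc = nn_accuracy(subset, valid)
        if acc > best_acc:
            best_subset, best_acc = subset, acc
    return best_subset, best_acc


# Synthetic two-class data in 4 dimensions (illustrative only).
rng = random.Random(42)
data = [([rng.gauss(c, 1.0) for _ in range(4)], c)
        for c in (0, 3) for _ in range(30)]
reduced = pca_project([x for x, _ in data], k=2)      # 4 features -> 2
pairs = list(zip(reduced, [y for _, y in data]))
rng.shuffle(pairs)
train, valid = pairs[:40], pairs[40:]
subset, acc = select_instances(train, valid, keep=10)  # 40 instances -> 10
print(len(subset), "instances kept, validation accuracy", acc)
```

The classifier is then trained on `subset` in the reduced feature space rather than on the full data, which is where the saving in training time would come from. A population-based optimizer such as the one described in the paper would replace the random search inside `select_instances` while keeping the same fitness function.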


Cuttlefish optimization algorithm; Principal component analysis; Feature extraction and instance selection



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Electronics and Communication Engineering, Mahendra College of Engineering, Salem, India
  2. Department of Computer Sciences Technology, Karunya University, Coimbatore, India
