A Self-generating Prototype Method Based on Information Entropy Used for Condensing Data in Classification Tasks

  • Alberto Manastarla
  • Leandro A. Silva
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11871)

Abstract

This paper presents a new self-generating prototype method, based on information entropy, for reducing the size of training datasets. The method shortens classifier training time without significantly degrading classification quality. Its effectiveness is compared with that of the k-nearest neighbour classifier (kNN) and of genetic algorithm prototype selection (GA). kNN is a benchmark method for data classification tasks, while GA is a prototype selection method that offers a competitive trade-off between accuracy and data reduction ratio. Across thirty public datasets, the comparisons show that the proposed method outperforms kNN both when kNN uses the original training set and when it uses the reduced training set obtained via GA prototype selection.
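The quantities named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm, which the abstract does not specify. It only shows, under stated assumptions, how Shannon entropy of class labels, a data reduction ratio, and a kNN prediction over a condensed set might be computed. The function names, the toy data, and the use of class means as prototypes are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def label_entropy(labels):
    """Shannon entropy (in bits) of a collection of class labels."""
    counts = Counter(labels)
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def reduction_ratio(n_original, n_condensed):
    """Fraction of the original training instances that were removed."""
    return 1.0 - n_condensed / n_original


def knn_predict(train, query, k=1):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_tuple, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data, made up for illustration only.
original = [((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((0.2, 0.1), "a"),
            ((1.0, 1.0), "b"), ((0.9, 1.1), "b"), ((1.1, 0.9), "b")]

# Hypothetical condensed set: one prototype per class (the class mean).
# The paper's method generates prototypes differently; this is an assumption.
condensed = [((0.1, 0.1), "a"), ((1.0, 1.0), "b")]
```

A region whose labels all belong to one class has zero entropy, which is the intuition behind summarising it with a single prototype. A mixed region has higher entropy and would need more prototypes to preserve the class boundary.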

Keywords

Prototype Selection (PS), Data reduction, Data classification, Genetic Algorithm (GA)

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Electrical Engineering and Computing, Mackenzie Presbyterian University, Sao Paulo, Brazil
  2. Computing and Informatics Faculty, Electrical Engineering and Computing, Mackenzie Presbyterian University, Sao Paulo, Brazil
