Bi-criteria Data Reduction for Instance-Based Classification

  • Ireneusz Czarnowski
  • Joanna Jȩdrzejowicz
  • Piotr Jȩdrzejowicz
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9875)


Training data reduction is one approach to the big data problem: it may improve generalization quality and reduce the complexity of the data mining algorithm. In this paper we treat instance reduction as a multiple-objective problem and propose criteria that make it possible to generate a Pareto-optimal set of ‘typical’ instances. The reduced dataset is then used to construct a classification function or to induce a classifier. The approach is validated experimentally.
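As a rough illustration of what selecting a Pareto-optimal set under two criteria involves, the sketch below filters candidate instances down to the non-dominated ones. It is not the authors' algorithm: the abstract does not name the two criteria, so the score pairs, the assumption that both criteria are minimized, and the names `dominates`, `pareto_front`, and `candidate_scores` are all hypothetical.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` is no worse than `b` on every criterion and
    strictly better on at least one (assumption: all criteria are minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(scores: List[Tuple[float, float]]) -> List[int]:
    """Return the indices of score vectors that no other vector dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Hypothetical candidates, each scored as (criterion_1, criterion_2).
candidate_scores = [
    (0.10, 0.90),
    (0.20, 0.40),
    (0.30, 0.50),  # dominated by (0.20, 0.40)
    (0.50, 0.10),
    (0.20, 0.40),  # tie with index 1; ties do not dominate each other
]

front = pareto_front(candidate_scores)
print(front)
```

This pairwise check is quadratic in the number of candidates, which is acceptable for a sketch but may need a faster method on large datasets.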


Keywords: Classification, Clustering with bias-correction, Pareto-optimal prototypes



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Ireneusz Czarnowski (1)
  • Joanna Jȩdrzejowicz (2, corresponding author)
  • Piotr Jȩdrzejowicz (1)

  1. Department of Information Systems, Gdynia Maritime University, Gdynia, Poland
  2. Institute of Informatics, Gdańsk University, Gdańsk, Poland
