Bi-criteria Data Reduction for Instance-Based Classification

  • Conference paper
Computational Collective Intelligence (ICCCI 2016)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9875)

Abstract

One approach to the big data problem is training data reduction, which may improve generalization quality and decrease the complexity of the data mining algorithm. In this paper we treat instance reduction as a multiple-objective problem and propose criteria that allow a Pareto-optimal set of ‘typical’ instances to be generated. The reduced dataset is then used to construct a classification function or to induce a classifier. The approach is validated experimentally.
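As an illustration of what a Pareto-optimal set means under two criteria, the following is a minimal sketch and not the authors' method. The abstract does not state the two criteria, so the pair used here (fraction of instances retained, classification error), the function names `dominates` and `pareto_front`, and the candidate values are all assumptions made for the example. Both objectives are treated as quantities to minimize.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` is no worse than `b` on every objective
    and strictly better on at least one (all objectives minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(scores: Sequence[Sequence[float]]) -> List[int]:
    """Return the indices of score vectors not dominated by any other."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Hypothetical candidate reduced datasets, each scored as
# (fraction of instances retained, classification error).
# These numbers are invented for illustration only.
candidates = [
    (0.10, 0.20),
    (0.20, 0.15),
    (0.30, 0.15),  # dominated by (0.20, 0.15): same error, more instances
    (0.50, 0.10),
    (0.15, 0.25),  # dominated by (0.10, 0.20): worse on both criteria
]

front = pareto_front(candidates)
print(front)  # indices of the non-dominated candidates
```

The pairwise check is quadratic in the number of candidates, which is adequate for a small illustrative set; a larger search would likely use a dedicated non-dominated sorting routine.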

Author information

Correspondence to Joanna Jȩdrzejowicz.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Czarnowski, I., Jȩdrzejowicz, J., Jȩdrzejowicz, P. (2016). Bi-criteria Data Reduction for Instance-Based Classification. In: Nguyen, N.T., Iliadis, L., Manolopoulos, Y., Trawiński, B. (eds) Computational Collective Intelligence. ICCCI 2016. Lecture Notes in Computer Science, vol 9875. Springer, Cham. https://doi.org/10.1007/978-3-319-45243-2_41

  • DOI: https://doi.org/10.1007/978-3-319-45243-2_41

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-45242-5

  • Online ISBN: 978-3-319-45243-2

  • eBook Packages: Computer Science; Computer Science (R0)
