Abstract
One approach to the big data problem is training data reduction, which may improve generalization quality and decrease the complexity of the data mining algorithm. In this paper we treat instance reduction as a multiple-objective problem and propose criteria that allow a Pareto-optimal set of ‘typical’ instances to be generated. The reduced dataset is then used to construct a classification function or to induce a classifier. The approach is validated experimentally.
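To illustrate what a Pareto-optimal set means in a bi-criteria setting, the following minimal sketch filters non-dominated candidates. The abstract does not state the two criteria, so the ones used here (classification accuracy and fraction of training instances removed, both maximized) are assumptions chosen only for illustration, as are the function names and the scores.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is at least as good as b on every criterion
    and strictly better on at least one (all criteria are maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: List[Tuple[float, float]]) -> List[int]:
    """Return the indices of the score pairs that no other pair dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Hypothetical scores for five candidate reductions (not taken from the paper):
# (classification accuracy, fraction of training instances removed)
candidates = [(0.90, 0.50), (0.85, 0.80), (0.80, 0.70), (0.92, 0.30), (0.85, 0.60)]

front = pareto_front(candidates)
print(front)  # indices of the non-dominated candidates
```

A candidate is kept only if no other candidate is at least as good on both criteria and strictly better on one, so the result is a set of trade-offs rather than a single best solution.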
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Czarnowski, I., Jȩdrzejowicz, J., Jȩdrzejowicz, P. (2016). Bi-criteria Data Reduction for Instance-Based Classification. In: Nguyen, NT., Iliadis, L., Manolopoulos, Y., Trawiński, B. (eds) Computational Collective Intelligence. ICCCI 2016. Lecture Notes in Computer Science(), vol 9875. Springer, Cham. https://doi.org/10.1007/978-3-319-45243-2_41
DOI: https://doi.org/10.1007/978-3-319-45243-2_41
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45242-5
Online ISBN: 978-3-319-45243-2
eBook Packages: Computer Science (R0)