Bi-criteria Data Reduction for Instance-Based Classification

Czarnowski, Ireneusz; Jȩdrzejowicz, Joanna; Jȩdrzejowicz, Piotr

doi:10.1007/978-3-319-45243-2_41

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9875)

Included in the following conference series:

International Conference on Computational Collective Intelligence

Abstract

One approach to the big data problem is training data reduction, which may improve generalization quality and decrease the complexity of the data mining algorithm. In this paper we treat instance reduction as a multiple-objective problem and propose criteria that allow the generation of a Pareto-optimal set of ‘typical’ instances. The reduced dataset is then used to construct a classification function or to induce a classifier. The approach is validated experimentally.
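To make the notion of a Pareto-optimal set concrete, the following is a minimal sketch of non-dominated filtering over two criteria. It is not the authors' algorithm: the abstract does not state which two criteria are used, so the pair below (validation accuracy and fraction of training instances removed, both maximized) and all numeric values are hypothetical placeholders, as are the names `dominates` and `pareto_front`.

```python
from typing import List, Tuple

# Assumption: each candidate reduced dataset is scored on two criteria,
# both to be maximized. The criteria themselves are illustrative only.
Score = Tuple[float, float]


def dominates(p: Score, q: Score) -> bool:
    """True if p is at least as good as q on both criteria and strictly better on one."""
    return p[0] >= q[0] and p[1] >= q[1] and (p[0] > q[0] or p[1] > q[1])


def pareto_front(scores: List[Score]) -> List[int]:
    """Return the indices of non-dominated score pairs, in input order."""
    return [
        i
        for i, p in enumerate(scores)
        if not any(dominates(q, p) for j, q in enumerate(scores) if j != i)
    ]


# Hypothetical scores for five candidate reduced datasets:
# (accuracy on a validation set, fraction of training instances removed).
candidates = [(0.90, 0.50), (0.85, 0.70), (0.80, 0.60), (0.92, 0.30), (0.85, 0.65)]
front = pareto_front(candidates)
print(front)
```

A candidate stays on the front only if no other candidate matches or beats it on both criteria at once, so the result is a set of trade-offs rather than a single best solution. The pairwise scan is quadratic in the number of candidates, which is acceptable for a small illustration.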

Author information

Authors and Affiliations

Department of Information Systems, Gdynia Maritime University, Morska 83, 81-225, Gdynia, Poland
Ireneusz Czarnowski & Piotr Jȩdrzejowicz
Institute of Informatics, Gdańsk University, Wita Stwosza 57, 80-952, Gdańsk, Poland
Joanna Jȩdrzejowicz

Corresponding author

Correspondence to Joanna Jȩdrzejowicz.

Editor information

Editors and Affiliations

Wrocław University of Technology, Wrocław, Poland
Ngoc-Thanh Nguyen
Aristotle University of Thessaloniki, Thessaloniki, Greece
Lazaros Iliadis
Department of Forestry and Management, Democritus University of Thrace, Orestiada, Thrace, Greece
Yannis Manolopoulos
Wrocław University of Technology, Wrocław, Poland
Bogdan Trawiński

About this paper

Cite this paper

Czarnowski, I., Jȩdrzejowicz, J., Jȩdrzejowicz, P. (2016). Bi-criteria Data Reduction for Instance-Based Classification. In: Nguyen, NT., Iliadis, L., Manolopoulos, Y., Trawiński, B. (eds) Computational Collective Intelligence. ICCCI 2016. Lecture Notes in Computer Science, vol 9875. Springer, Cham. https://doi.org/10.1007/978-3-319-45243-2_41

DOI: https://doi.org/10.1007/978-3-319-45243-2_41
Published: 20 September 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45242-5
Online ISBN: 978-3-319-45243-2
eBook Packages: Computer Science, Computer Science (R0)
