Abstract
Instance selection is a feasible strategy for coping with large databases in inductive learning. Several proposals exist in this area, but none of them consistently outperforms the others over a wide range of domains. In this paper we present a set of measures to characterize databases, together with a new algorithm that uses these measures and, depending on the data characteristics, applies the method or combination of methods expected to produce the best results. This approach was evaluated on 20 databases and with six different learning paradigms, and the results were compared with those achieved by five well-known state-of-the-art methods.
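The abstract's core idea, characterizing a dataset with simple measures and then dispatching to the instance selection method expected to work best, can be sketched as follows. This is an illustrative toy only: the measures (`n_instances`, `imbalance`), the dispatch rule, and the ENN-style editing step are assumptions for demonstration, not the paper's actual criteria or algorithm.

```python
# Hypothetical sketch of the meta-selection idea described in the abstract:
# compute characterization measures, then pick an instance selection method.
# All measures and thresholds here are illustrative assumptions.
from collections import Counter

def characterize(X, y):
    """Toy characterization measures: dataset size and class imbalance."""
    n = len(X)
    counts = Counter(y)
    imbalance = max(counts.values()) / n  # fraction held by the majority class
    return {"n_instances": n, "imbalance": imbalance}

def edit_noise(X, y, k=3):
    """ENN-style editing (Wilson): drop points misclassified by their k-NN."""
    keep = []
    for i, xi in enumerate(X):
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(xi, xj)), yj)
            for j, (xj, yj) in enumerate(zip(X, y)) if j != i
        )
        votes = Counter(lbl for _, lbl in dists[:k])
        if votes.most_common(1)[0][0] == y[i]:
            keep.append(i)
    return keep

def select_instances(X, y):
    """Choose a selection strategy from the measures (illustrative rule)."""
    m = characterize(X, y)
    if m["n_instances"] < 5:      # tiny dataset: keep everything
        return list(range(len(X)))
    return edit_noise(X, y)       # otherwise edit out likely label noise

# Two clusters plus one mislabeled point near the "a" cluster.
X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (1.0, 1.0), (1.1, 1.0), (0.05, 0.05)]
y = ["a", "a", "a", "b", "b", "b"]  # last point is noise
kept = select_instances(X, y)       # the noisy point (index 5) is removed
```

The dispatch function is the piece the paper fills in with principled measures; any concrete editing or condensation method (ENN, CNN, etc.) can be plugged in as a branch.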
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
Cite this paper
Caises, Y., González, A., Leyva, E., Pérez, R. (2009). SCIS: Combining Instance Selection Methods to Increase Their Effectiveness over a Wide Range of Domains. In: Corchado, E., Yin, H. (eds) Intelligent Data Engineering and Automated Learning - IDEAL 2009. IDEAL 2009. Lecture Notes in Computer Science, vol 5788. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04394-9_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04393-2
Online ISBN: 978-3-642-04394-9