An Approach to Instance Reduction in Supervised Learning

  • Ireneusz Czarnowski
  • Piotr Jędrzejowicz

Abstract

The paper proposes a set of simple heuristic algorithms for the instance reduction problem. The proposed algorithms can be used to increase the efficiency of supervised learning: a reduced training set consisting of selected instances is used as the input to the machine learning algorithm, which may reduce the time needed for learning, improve learning quality, or both. The paper presents four algorithms for reducing the size of a training set. They are based on calculating, for each instance in the original training set, the value of its similarity coefficient. The values of the coefficient are used to group instances into clusters, and only a limited number of instances from each cluster is selected to form the reduced training set. One of the proposed algorithms uses a population learning algorithm to select instances. The approach has been validated by means of a computational experiment.
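The abstract outlines a three-step pipeline (a similarity coefficient per instance, clustering by coefficient value, and a capped selection from each cluster) but does not define the coefficient. The Python sketch below is a hypothetical illustration of that pipeline only. The coefficient formula (binarising each attribute at the midpoint of its range and weighting the bits by column totals), grouping instances that share both class label and coefficient value, the names `similarity_coefficients` and `reduce_training_set`, and random selection of at most `k` instances per cluster are all assumptions, not the authors' definitions. The population learning variant mentioned in the abstract would presumably replace the random draw with a search over candidate subsets; that part is not sketched here.

```python
import random
from collections import defaultdict


def similarity_coefficients(X):
    """Hypothetical similarity coefficient (an assumption, not the paper's
    definition): binarise every attribute at the midpoint of its min-max
    range, then sum each instance's bits weighted by the column totals."""
    if not X:
        return []
    n_attr = len(X[0])
    lo = [min(row[j] for row in X) for j in range(n_attr)]
    hi = [max(row[j] for row in X) for j in range(n_attr)]
    binary = []
    for row in X:
        bits = []
        for j, value in enumerate(row):
            span = hi[j] - lo[j]
            norm = (value - lo[j]) / span if span else 0.0  # scale to [0, 1]
            bits.append(1 if norm >= 0.5 else 0)  # round to 0 or 1
        binary.append(bits)
    # Column totals: how many instances have a 1 in each attribute.
    col_sums = [sum(bits[j] for bits in binary) for j in range(n_attr)]
    return [sum(b * s for b, s in zip(bits, col_sums)) for bits in binary]


def reduce_training_set(X, y, k, seed=0):
    """Return the sorted indices of the retained instances.

    Assumption: a cluster is the set of instances sharing the same class
    label and the same coefficient value. At most k instances are kept per
    cluster, drawn at random as a placeholder for a smarter selection rule."""
    rng = random.Random(seed)
    clusters = defaultdict(list)
    for idx, (label, coeff) in enumerate(zip(y, similarity_coefficients(X))):
        clusters[(label, coeff)].append(idx)
    selected = []
    for members in clusters.values():
        if len(members) > k:
            members = rng.sample(members, k)  # keep a limited number
        selected.extend(members)
    return sorted(selected)


X = [[1.0, 2.0], [1.1, 2.1], [0.9, 1.9], [5.0, 8.0], [5.2, 8.1]]
y = [0, 0, 0, 1, 1]
print(similarity_coefficients(X))  # [0, 0, 0, 4, 4]
print(reduce_training_set(X, y, k=1))  # one index from {0, 1, 2}, one from {3, 4}
```

With `k=1`, the five toy instances fall into two clusters and the reduced set keeps one representative of each. Raising `k` trades a smaller reduction for better coverage of each cluster.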

Copyright information

© Springer-Verlag London 2004

Authors and Affiliations

  • Ireneusz Czarnowski (1)
  • Piotr Jędrzejowicz (1)

  1. Department of Information Systems, Gdynia Maritime University, Gdynia, Poland
