Efficient Distributed Data Condensation for Nearest Neighbor Classification

  • Fabrizio Angiulli
  • Gianluigi Folino
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4641)

Abstract

In this work we present PFCNN, a distributed method for computing a consistent subset of a very large data set for the nearest neighbor decision rule. In order to cope with the communication overhead typical of distributed environments and to reduce memory requirements, different variants of the basic PFCNN method are introduced. Experimental results on a class of synthetic data sets show that these methods can be profitably applied to enormous collections of data: they scale up well, are efficient in memory consumption, and achieve noticeable data reduction together with good classification accuracy. To the best of our knowledge, this is the first distributed algorithm for computing a training-set-consistent subset for the nearest neighbor rule.
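The notion of a training-set-consistent subset used above can be illustrated with Hart's classic condensed nearest neighbor (CNN) rule, the sequential baseline that the FCNN family, and hence PFCNN, accelerates. The sketch below is a minimal single-machine illustration, not the authors' distributed PFCNN algorithm; function and variable names are illustrative.

```python
from math import dist

def condense(points, labels):
    """Hart-style condensation: grow a subset S of the training set until
    every training point is correctly classified by its nearest neighbor
    in S. Such an S is a training-set-consistent subset for the 1-NN rule."""
    # Seed S with the first example of each class.
    seen, S = set(), []
    for i, c in enumerate(labels):
        if c not in seen:
            seen.add(c)
            S.append(i)
    changed = True
    while changed:
        changed = False
        for i in range(len(points)):
            if i in S:
                continue
            # 1-NN prediction of point i using only the subset S.
            nearest = min(S, key=lambda j: dist(points[i], points[j]))
            if labels[nearest] != labels[i]:
                S.append(i)      # misclassified: absorb into the subset
                changed = True
    return sorted(S)
```

On well-separated classes the returned subset can be far smaller than the training set while classifying every training point correctly; the distributed setting studied in the paper additionally partitions the data across nodes and must keep the per-iteration exchange of candidate points small.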

Keywords

Execution Time, Communication Overhead, Memory Usage, Voronoi Cell, Memory Consumption



Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Fabrizio Angiulli (1)
  • Gianluigi Folino (2)
  1. DEIS, Università della Calabria, Via P. Bucci 41C, 87036 Rende (CS), Italy
  2. Institute of High Performance Computing and Networking (ICAR-CNR), Via P. Bucci 41C, 87036 Rende (CS), Italy
