Implementation Issues in the Design of I/O Intensive Data Mining Applications on Clusters of Workstations

  • R. Baraglia
  • D. Laforenza
  • Salvatore Orlando
  • P. Palmerini
  • Raffaele Perego
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1800)


This paper investigates scalable implementations of out-of-core, I/O-intensive Data Mining (DM) algorithms on affordable parallel architectures, such as clusters of workstations. To validate our approach, the K-means algorithm, a well-known DM clustering algorithm, was used as a test case.


Main Memory; Implementation Issue; Physical Memory; Load Imbalance; Data Mining
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2000

Authors and Affiliations

  • R. Baraglia (1)
  • D. Laforenza (1)
  • Salvatore Orlando (2)
  • P. Palmerini (1)
  • Raffaele Perego (1)

  1. Istituto CNUCE, Consiglio Nazionale delle Ricerche (CNR), Pisa, Italy
  2. Dipartimento di Informatica, Università Ca’ Foscari di Venezia, Italy
