Parallel Data Mining on ATM-Connected PC Cluster and Optimization of its Execution Environments

  • Masato Oguchi
  • Masaru Kitsuregawa
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1800)

Abstract

In this paper, we have constructed a large-scale ATM-connected PC cluster consisting of 100 PCs, implemented a data mining application, and optimized its execution environment. The default parameters of the TCP retransmission mechanism cannot provide good performance for the data mining application, since many collisions occur during all-to-all multicasting in the large-scale PC cluster. Using TCP retransmission parameters set according to the proposed parameter optimization, a reasonably good performance improvement is achieved for parallel data mining on 100 PCs.

Association rule mining, one of the best-known problems in data mining, differs from conventional scientific calculations in its usage of main memory. We have investigated the feasibility of using available memory on remote nodes as a swap area when working nodes need to swap out their real memory contents. According to the experimental results on our PC cluster, the proposed method is expected to be considerably better than using hard disks as a swapping device.
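
To make the memory-usage point concrete, the following is a minimal single-node sketch of support counting, the core step of association rule mining. It is not the authors' parallel implementation; the function name `frequent_itemsets`, the sample transactions, and the 0.5 minimum support are all hypothetical choices for illustration. The `counts` table holds one entry per candidate itemset seen, which is one plausible reading of why main-memory usage differs from that of conventional scientific calculations.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, size):
    """Count every itemset of the given size and keep those meeting min_support.

    Naive illustration only: every candidate itemset gets its own counter,
    so the table grows with the number of distinct candidates.
    """
    counts = Counter()
    for items in transactions:
        # Each transaction contributes all of its size-k subsets as candidates.
        for candidate in combinations(sorted(set(items)), size):
            counts[candidate] += 1
    # min_support is a fraction of the total number of transactions.
    threshold = min_support * len(transactions)
    return {itemset: n for itemset, n in counts.items() if n >= threshold}


# Hypothetical toy data, not taken from the paper.
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
]
pairs = frequent_itemsets(transactions, min_support=0.5, size=2)
```

With this toy data each of the three possible pairs appears in two of the four transactions, so all of them meet the 0.5 threshold; raising `min_support` to 0.75 leaves none.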

Keywords

Execution Time, Association Rule, Hard Disk, Minimum Support, Association Rule Mining
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2000

Authors and Affiliations

  • Masato Oguchi (1, 2)
  • Masaru Kitsuregawa (1)
  1. Institute of Industrial Science, The University of Tokyo, Tokyo, Japan
  2. Informatik 4, Aachen University of Technology, Aachen, Germany
