
Parallel Generalized Association Rule Mining on Large Scale PC Cluster

  • Takahiko Shintani
  • Masaru Kitsuregawa
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1759)

Abstract

One of the most important problems in data mining is the discovery of association rules in large databases. In our previous study, we proposed parallel algorithms and candidate-duplication-based load balancing strategies for mining generalized association rules, and showed that our algorithms could attain good performance on a 16-node parallel computer system. However, as the number of nodes increases, it becomes difficult to achieve a flat workload distribution.

In this paper, we present a candidate-partition-based load balancing strategy for parallel algorithms that mine generalized association rules. This strategy partitions the candidate itemsets so that the number of candidate probes is equalized across nodes, using support counts estimated from the information gathered in the previous pass. Moreover, we implement the parallel algorithms and load balancing strategies for mining generalized association rules on a cluster of 100 PCs interconnected with an ATM network, and analyze their performance on a large transaction dataset. Through several experiments, we show that the load balancing strategy which partitions the candidate itemsets while considering the distribution of candidate probes, and duplicates the frequently occurring candidate itemsets, can attain high performance and achieve good workload distribution on the 100-PC cluster system.
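The partition-plus-duplication strategy can be illustrated with a small sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: we assume that the probes a candidate receives are approximated by its support count, that a k-candidate's support is estimated by the smallest support count among its (k-1)-subsets from the previous pass (an upper bound, since support is anti-monotone), and that the non-duplicated candidates are assigned with a greedy longest-processing-time heuristic. The function names and the `dup_fraction` parameter, which sets how many of the most frequently occurring candidates are copied to every node, are hypothetical.

```python
import heapq
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Itemset = FrozenSet[str]


def estimate_probes(candidates: List[Itemset],
                    prev_counts: Dict[Itemset, int]) -> Dict[Itemset, int]:
    """Estimate each k-candidate's probe count from the (k-1) pass.

    Assumption (hypothetical): probes ~ support count, and support is
    estimated by the minimum support among the candidate's (k-1)-subsets.
    Intended for k >= 2."""
    est = {}
    for c in candidates:
        subsets = [frozenset(s) for s in combinations(sorted(c), len(c) - 1)]
        est[c] = min(prev_counts.get(s, 0) for s in subsets)
    return est


def partition_candidates(est: Dict[Itemset, int], num_nodes: int,
                         dup_fraction: float = 0.0
                         ) -> Tuple[List[Itemset], List[List[Itemset]], List[float]]:
    """Duplicate the most frequent candidates on every node, then spread the
    rest so that estimated probe counts per node are roughly equal."""
    # Rank by estimated probes (descending); tie-break on items for determinism.
    ranked = sorted(est, key=lambda c: (-est[c], sorted(c)))
    n_dup = int(len(ranked) * dup_fraction)
    duplicated, rest = ranked[:n_dup], ranked[n_dup:]

    # A duplicated candidate is counted locally by every node over its own
    # transaction partition, so (assuming evenly split transactions) each
    # node carries about 1/num_nodes of its probes.
    base = sum(est[c] for c in duplicated) / num_nodes
    loads = [base] * num_nodes
    parts: List[List[Itemset]] = [[] for _ in range(num_nodes)]

    # Greedy longest-processing-time: heaviest remaining candidate goes to
    # the currently lightest node (min-heap keyed on load, then node id).
    heap = [(base, i) for i in range(num_nodes)]
    heapq.heapify(heap)
    for c in rest:
        load, i = heapq.heappop(heap)
        parts[i].append(c)
        loads[i] = load + est[c]
        heapq.heappush(heap, (loads[i], i))
    return duplicated, parts, loads
```

For example, with previous-pass counts a=90, b=80, c=40, d=30 and all six 2-itemsets as candidates on two nodes, the estimates sum to 250 and the greedy pass yields node loads of 140 and 110. The heuristic is cheap but not guaranteed optimal (a 120/130 split exists), which is one reason the choice of partitioning procedure matters as the node count grows.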

Keywords

Association Rule, Parallel Algorithm, Minimum Support, Transaction Data, Candidate Probe

Copyright information

© Springer-Verlag Berlin Heidelberg 2000

Authors and Affiliations

  • Takahiko Shintani (1)
  • Masaru Kitsuregawa (1)
  1. Institute of Industrial Science, The University of Tokyo, Tokyo
