Parallel Generalized Association Rule Mining on Large Scale PC Cluster

Shintani, Takahiko; Kitsuregawa, Masaru

doi:10.1007/3-540-46502-2_7

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1759)

Abstract

One of the most important problems in data mining is the discovery of association rules in large databases. In our previous study, we proposed parallel algorithms and candidate-duplication-based load balancing strategies for mining generalized association rules, and showed that our algorithms could attain good performance on a 16-node parallel computer system. However, as the number of nodes increases, it becomes difficult to achieve a flat workload distribution.

In this paper, we present a candidate-partition-based load balancing strategy for parallel generalized association rule mining. This strategy partitions the candidate itemsets so that the number of candidate probes is equalized across nodes, using support counts estimated from the information of the previous pass. Moreover, we implement the parallel algorithms and load balancing strategies for mining generalized association rules on a cluster of 100 PCs interconnected with an ATM network, and analyze their performance using a large transaction dataset. Through several experiments, we show that the load balancing strategy, which partitions the candidate itemsets while taking the distribution of candidate probes into account and duplicates the frequently occurring candidate itemsets, can attain high performance and achieve good workload distribution on a 100-PC cluster system.
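The partitioning idea can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not give the estimator or the assignment rule, so both are assumptions here. The hypothetical `estimate_probes` bounds a candidate's count by the smallest support among its subsets from the previous pass, and the hypothetical `partition_candidates` uses a simple greedy rule (heaviest candidate first, onto the currently lightest node). Duplication of frequently occurring candidates is not modeled.

```python
import heapq
from itertools import combinations


def estimate_probes(candidate, prev_support):
    """Assumed estimator (not from the paper): bound a k-itemset's count by the
    smallest support among its (k-1)-subsets counted in the previous pass."""
    k = len(candidate)
    return min(prev_support[frozenset(s)] for s in combinations(candidate, k - 1))


def partition_candidates(candidates, prev_support, num_nodes):
    """Assumed greedy rule: place the heaviest candidate first, always on the
    node whose estimated load is currently the smallest."""
    weighted = sorted(
        ((estimate_probes(c, prev_support), tuple(sorted(c))) for c in candidates),
        reverse=True,
    )
    heap = [(0, node) for node in range(num_nodes)]  # (estimated load, node id)
    heapq.heapify(heap)
    assignment = {node: [] for node in range(num_nodes)}
    loads = [0] * num_nodes
    for weight, cand in weighted:
        load, node = heapq.heappop(heap)
        assignment[node].append(cand)
        loads[node] = load + weight
        heapq.heappush(heap, (loads[node], node))
    return assignment, loads


# Illustrative data only: pass-1 supports of single items and the pass-2 candidates.
prev_support = {frozenset("a"): 90, frozenset("b"): 60,
                frozenset("c"): 30, frozenset("d"): 10}
candidates = [frozenset(p) for p in combinations("abcd", 2)]
assignment, loads = partition_candidates(candidates, prev_support, num_nodes=2)
print(assignment, loads)
```

The heap keeps the lightest node available in logarithmic time, which matters when the number of candidates is large; any other balancing heuristic could be substituted without changing the overall idea.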

Author information

Authors and Affiliations

Institute of Industrial Science, The University of Tokyo, Tokyo
Takahiko Shintani & Masaru Kitsuregawa

Editor information

Editors and Affiliations

Computer Science Department, Rensselaer Polytechnic Institute, Troy, NY, 12180, USA
Mohammed J. Zaki
K55/B1, IBM Almaden Research Center, 650 Harry Road, San Jose, CA, 95120, USA
Ching-Tien Ho

About this paper

Cite this paper

Shintani, T., Kitsuregawa, M. (2000). Parallel Generalized Association Rule Mining on Large Scale PC Cluster. In: Zaki, M.J., Ho, CT. (eds) Large-Scale Parallel Data Mining. Lecture Notes in Computer Science, vol 1759. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46502-2_7

DOI: https://doi.org/10.1007/3-540-46502-2_7
Published: 17 May 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67194-7
Online ISBN: 978-3-540-46502-7
eBook Packages: Springer Book Archive