Parallel Data Mining on Large Scale PC Cluster

Kitsuregawa, Masaru; Shintani, Takahiko; Tamura, Masahisa; Pramudiono, Iko

doi:10.1007/3-540-45151-X_2

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1846)

Included in the following conference series:

International Conference on Web-Age Information Management

Abstract

PC clusters have recently come to be regarded as one of the most promising platforms for heavily data-intensive applications such as decision support query processing and data mining. We proposed new parallel algorithms for mining association rules and generalized association rules with taxonomy, and showed that a PC cluster can handle large-scale mining with them. While developing a high-performance parallel mining system on a PC cluster, we found that heterogeneity is inevitable if the system is to take advantage of the rapid progress of PC hardware. However, existing parallel algorithms cannot be applied naively, since they assume homogeneity. We proposed new dynamic load balancing methods for association rule mining that work on heterogeneous systems. Two strategies are proposed: candidate migration and transaction migration. Candidate migration is invoked first. When it cannot resolve the load imbalance, transaction migration is employed, which is costlier but more effective under strong imbalance. The experimental results confirm that the proposed approach balances the workload among heterogeneous PCs very effectively.
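
The two-stage policy described in the abstract (try candidate migration first, fall back to transaction migration when imbalance persists) can be illustrated with a toy model. The sketch below is not the authors' algorithm: the `Node` class, the linear cost model (estimated time = work / speed), the `tolerance` threshold, and all numbers are hypothetical assumptions chosen only to show the control flow.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical cluster node; all fields are illustrative assumptions."""
    name: str
    speed: float       # relative processing speed
    cand_work: float   # work attributable to candidate itemsets held here
    trans_work: float  # work attributable to locally stored transactions

    def est_time(self) -> float:
        # Assumed linear cost model: total work divided by speed.
        return (self.cand_work + self.trans_work) / self.speed


def imbalance(nodes):
    """Ratio of the slowest to the fastest estimated finishing time."""
    times = [n.est_time() for n in nodes]
    return float("inf") if min(times) == 0 else max(times) / min(times)


def shift(src, dst, field, limit):
    """Move one kind of work from src to dst until their times meet (capped)."""
    ts = src.cand_work + src.trans_work
    td = dst.cand_work + dst.trans_work
    # x solves (ts - x) / src.speed == (td + x) / dst.speed.
    x = (ts * dst.speed - td * src.speed) / (src.speed + dst.speed)
    x = max(0.0, min(x, limit))
    setattr(src, field, getattr(src, field) - x)
    setattr(dst, field, getattr(dst, field) + x)
    return x


def rebalance(nodes, tolerance=1.1, max_rounds=100):
    """Candidate migration first; transaction migration only as a fallback."""
    log = []
    for _ in range(max_rounds):
        if imbalance(nodes) <= tolerance:
            break
        slow = max(nodes, key=Node.est_time)
        fast = min(nodes, key=Node.est_time)
        kind = "candidate"
        moved = shift(slow, fast, "cand_work", slow.cand_work)
        if moved == 0.0:
            # No candidates left to move: use the costlier strategy.
            kind = "transaction"
            moved = shift(slow, fast, "trans_work", slow.trans_work)
        if moved == 0.0:
            break
        log.append((kind, slow.name, fast.name, moved))
    return log


if __name__ == "__main__":
    cluster = [Node("old_pc", 1.0, 10.0, 90.0), Node("new_pc", 4.0, 10.0, 90.0)]
    for step in rebalance(cluster):
        print(step)
    print("imbalance after rebalancing:", imbalance(cluster))
```

In this toy run the slow node holds too little candidate work for the first strategy to close the gap, so the second strategy is triggered afterwards. A real system would also have to account for the cost of moving data over the network, which this sketch ignores.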

Author information

Authors and Affiliations

Institute of Industrial Science, The University of Tokyo, 7-22-1, Roppongi, Minato-ku, Tokyo, 106, Japan
Masaru Kitsuregawa, Takahiko Shintani, Masahisa Tamura & Iko Pramudiono

Editor information

Editors and Affiliations

Department of Computer Science, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China
Hongjun Lu
Department of Computer Science, Fudan University, 220 Handan Road, Shanghai, China
Aoying Zhou

About this paper

Cite this paper

Kitsuregawa, M., Shintani, T., Tamura, M., Pramudiono, I. (2000). Parallel Data Mining on Large Scale PC Cluster. In: Lu, H., Zhou, A. (eds) Web-Age Information Management. WAIM 2000. Lecture Notes in Computer Science, vol 1846. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45151-X_2

DOI: https://doi.org/10.1007/3-540-45151-X_2
Published: 07 November 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67627-0
Online ISBN: 978-3-540-45151-8
eBook Packages: Springer Book Archive
