Parallel Data Mining on Large Scale PC Cluster

  • Conference paper
Web-Age Information Management (WAIM 2000)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1846)

Abstract

PC clusters have recently come to be regarded as one of the most promising platforms for heavily data-intensive applications such as decision support query processing and data mining. We proposed new parallel algorithms for mining association rules and generalized association rules with taxonomy, and showed that a PC cluster can handle large-scale mining with them. While developing a high-performance parallel mining system on a PC cluster, we found that heterogeneity is inevitable if one is to take advantage of the rapid progress of PC hardware. However, existing parallel algorithms cannot be applied naively, since they assume homogeneity. We proposed new dynamic load balancing methods for association rule mining that work on heterogeneous systems. Two strategies, called candidate migration and transaction migration, are proposed. Candidate migration is invoked first. When it cannot resolve the load imbalance, transaction migration is employed, which is costly but more effective under strong imbalance. The experimental results confirm that the proposed approach balances the workload among heterogeneous PCs very effectively.
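
The two-stage policy described in the abstract (candidate migration first, transaction migration only if imbalance remains) can be illustrated with a toy model. The sketch below is not the authors' implementation: the `Node` class, the linear cost model in `est_time`, the `tolerance` threshold, and the pairwise slowest-to-fastest transfer rule are all assumptions introduced here for illustration. The abstract specifies only the order in which the two strategies are tried.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    speed: float        # relative processing power (hypothetical unit)
    candidates: int     # candidate itemsets assigned to this node
    transactions: int   # transactions assigned to this node

    def est_time(self) -> float:
        # Assumed toy cost model: time grows linearly with both loads.
        return (self.candidates + self.transactions) / self.speed


def imbalance(nodes):
    # Ratio of the slowest node's estimated time to the mean (1.0 = balanced).
    times = [n.est_time() for n in nodes]
    return max(times) / (sum(times) / len(times))


def _migrate(nodes, attr, tolerance, max_rounds=100):
    # Repeatedly move `attr` units from the slowest node to the fastest one.
    moved = 0
    for _ in range(max_rounds):
        if imbalance(nodes) <= tolerance:
            break
        slow = max(nodes, key=Node.est_time)
        fast = min(nodes, key=Node.est_time)
        load_slow = slow.candidates + slow.transactions
        load_fast = fast.candidates + fast.transactions
        # Amount that would equalize the two nodes' estimated times.
        ideal = (load_slow * fast.speed - load_fast * slow.speed) / (
            slow.speed + fast.speed
        )
        amount = min(int(ideal), getattr(slow, attr))
        if amount <= 0:
            break  # nothing of this kind left to move from the slowest node
        setattr(slow, attr, getattr(slow, attr) - amount)
        setattr(fast, attr, getattr(fast, attr) + amount)
        moved += amount
    return moved


def balance(nodes, tolerance=1.05):
    # Stage 1: candidate migration (tried first).
    moved_candidates = _migrate(nodes, "candidates", tolerance)
    # Stage 2: transaction migration, only if stage 1 was not enough.
    moved_transactions = 0
    if imbalance(nodes) > tolerance:
        moved_transactions = _migrate(nodes, "transactions", tolerance)
    return moved_candidates, moved_transactions
```

For example, with one node of speed 1.0 and one of speed 4.0, each holding 100 candidates and 1000 transactions, this model moves all 100 candidates off the slow node, still finds the cluster imbalanced, and then falls back to moving transactions. When the imbalance is mild, the second stage is never reached.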

Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kitsuregawa, M., Shintani, T., Tamura, M., Pramudiono, I. (2000). Parallel Data Mining on Large Scale PC Cluster. In: Lu, H., Zhou, A. (eds) Web-Age Information Management. WAIM 2000. Lecture Notes in Computer Science, vol 1846. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45151-X_2

  • DOI: https://doi.org/10.1007/3-540-45151-X_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-67627-0

  • Online ISBN: 978-3-540-45151-8

  • eBook Packages: Springer Book Archive
