Abstract
Parallel association rule mining is a high-performance mining method. Many parallel algorithms for mining association rules already exist; this paper analyses the implementation techniques and shortcomings of these algorithms. On that basis, a new data structure called FP-Forest is designed, which stores data in a multi-tree structure. A new parallel mining model is then proposed, based on the properties of FP-Forest, that combines the advantages of the data-parallel and task-parallel methods. First, the core processor divides the database appropriately among the data processing nodes, and each data processing node builds an FP-Forest for its sub-database. Second, the core node merges the FP-Forests in a single synchronization step, and every MFP-Tree in the FP-Forest is dynamically assigned to a corresponding mining node as a sub-task using the task-parallel technique. Finally, a fast parallel mining algorithm, F-FDPM, is presented to mine association rules under this model; its mining process uses a frequent pattern growth method based on a depth-first search strategy. Experiments on real data sets show that the algorithm greatly improves the efficiency of association rule mining.
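The three-phase model described in the abstract (partition and build, one-time merge, per-tree mining tasks) can be illustrated with a minimal Python sketch. The abstract does not define the internals of FP-Forest or MFP-Tree, so everything here is an assumption made for illustration: each "tree" is approximated by a Counter of prefix paths for one item, items are ordered lexicographically rather than by frequency, threads stand in for processing nodes, and the function names (`build_forest`, `merge_forests`, `grow`, `mine_item`, `parallel_mine`) are hypothetical. The sketch stops at frequent itemsets and does not generate rules, and it is not the F-FDPM algorithm itself.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def build_forest(transactions):
    """Data-parallel phase: one partition -> {item: Counter(prefix path -> count)}."""
    forest = {}
    for t in transactions:
        items = sorted(set(t))  # assumed fixed item order (lexicographic)
        for k, item in enumerate(items):
            forest.setdefault(item, Counter())[tuple(items[:k])] += 1
    return forest


def merge_forests(forests):
    """One-time merge: add up the prefix-path counts of every partition."""
    merged = {}
    for forest in forests:
        for item, paths in forest.items():
            merged.setdefault(item, Counter()).update(paths)
    return merged


def grow(suffix, paths, min_support, out):
    """Depth-first pattern growth over the prefix paths conditioned on `suffix`."""
    cond = {}
    for path, count in paths.items():
        for k, item in enumerate(path):
            cond.setdefault(item, Counter())[path[:k]] += count
    for item, sub in cond.items():
        support = sum(sub.values())
        if support >= min_support:
            pattern = (item,) + suffix
            out[pattern] = support
            grow(pattern, sub, min_support, out)


def mine_item(item, paths, min_support):
    """Task-parallel unit: mine every frequent itemset whose last item is `item`."""
    out = {}
    support = sum(paths.values())
    if support >= min_support:
        out[(item,)] = support
        grow((item,), paths, min_support, out)
    return out


def parallel_mine(partitions, min_support, workers=2):
    """Build per-partition forests, merge once, then mine each tree as its own task."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        merged = merge_forests(pool.map(build_forest, partitions))
        tasks = [pool.submit(mine_item, item, paths, min_support)
                 for item, paths in merged.items()]
        result = {}
        for task in tasks:
            result.update(task.result())
    return result


# Toy database split into two partitions (made-up data for illustration).
partitions = [
    [["a", "b", "c"], ["a", "b"], ["a", "c"]],
    [["b", "c"], ["a", "b", "c"]],
]
frequent = parallel_mine(partitions, min_support=3)
print(sorted(frequent.items()))
```

Because each item's prefix paths are self-contained after the merge, the mining tasks share no state, which is what allows them to be handed to separate workers without further synchronization. In CPython, threads only illustrate the task structure; a process pool or separate machines would be needed for an actual speed-up.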
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hu, J., Yang-Li, X. (2008). A Fast Parallel Association Rules Mining Algorithm Based on FP-Forest. In: Sun, F., Zhang, J., Tan, Y., Cao, J., Yu, W. (eds) Advances in Neural Networks - ISNN 2008. ISNN 2008. Lecture Notes in Computer Science, vol 5264. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87734-9_5
DOI: https://doi.org/10.1007/978-3-540-87734-9_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-87733-2
Online ISBN: 978-3-540-87734-9
eBook Packages: Computer Science, Computer Science (R0)