Abstract
Frequent itemset mining (FIM) is one of the most deeply studied data mining tasks. A number of algorithms, employing different approaches and advanced data structures, have already been proposed to solve the task efficiently. Yet even the fastest serial FIM algorithms fail to scale with the rapid growth of database sizes. Hence, parallel FIM algorithms are the only viable solution in many domains, as serial solutions have almost reached their physical limits. To this end, parallel versions of a few serial FIM algorithms, including FP-Growth, have already been developed. In this study, we develop three different parallel FP-Growth implementations for cluster computers. All three are MPI-based: (i) Static Parallel FP-Growth, (ii) Dynamic Parallel FP-Growth, and (iii) (Tree-Sharing) Dynamic Parallel FP-Growth. All three variants are task-parallel, i.e., they are not based on horizontal or vertical partitioning of the database. The algorithms are experimentally evaluated on a 16-node cluster computer. Our results demonstrate the utility of the algorithms.
Supported by TUBITAK, grant number 108E016
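The difference between static and dynamic task assignment named in the abstract can be illustrated with a small scheduling simulation. This is only a sketch of the general idea, not the authors' MPI implementation: it assumes each task stands for one independent unit of mining work (for example, one frequent item's conditional pattern base), and the task costs, the round-robin rule, and the function names `static_schedule` and `dynamic_schedule` are all hypothetical.

```python
import heapq


def static_schedule(costs, n_workers):
    """Fix the assignment up front: task i goes to worker i % n_workers.

    Returns the total work each worker ends up with.
    """
    loads = [0] * n_workers
    for i, cost in enumerate(costs):
        loads[i % n_workers] += cost
    return loads


def dynamic_schedule(costs, n_workers):
    """Hand the next task to whichever worker becomes idle first.

    Returns the time at which each worker finishes its last task.
    """
    # Heap entries are (time the worker becomes free, worker id).
    heap = [(0, w) for w in range(n_workers)]
    heapq.heapify(heap)
    for cost in costs:
        free_at, worker = heapq.heappop(heap)
        heapq.heappush(heap, (free_at + cost, worker))
    loads = [0] * n_workers
    for free_at, worker in heap:
        loads[worker] = free_at
    return loads


# Hypothetical, deliberately skewed task costs: two expensive tasks, six cheap ones.
costs = [8, 1, 1, 1, 8, 1, 1, 1]
static_loads = static_schedule(costs, 4)
dynamic_loads = dynamic_schedule(costs, 4)
print("static makespan:", max(static_loads))
print("dynamic makespan:", max(dynamic_loads))
```

With skewed task costs, a fixed assignment can leave most workers idle while one finishes the expensive tasks, whereas on-demand assignment tends to even out the load. Whether this holds for a given dataset depends on how the mining work is actually distributed across tasks.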
Copyright information
© 2011 Springer Science+Business Media B.V.
Cite this paper
Özdogan, G.Ö., Abul, O. (2011). Task-Parallel FP-Growth on Cluster Computers. In: Gelenbe, E., Lent, R., Sakellari, G., Sacan, A., Toroslu, H., Yazici, A. (eds) Computer and Information Sciences. Lecture Notes in Electrical Engineering, vol 62. Springer, Dordrecht. https://doi.org/10.1007/978-90-481-9794-1_71
DOI: https://doi.org/10.1007/978-90-481-9794-1_71
Publisher Name: Springer, Dordrecht
Print ISBN: 978-90-481-9793-4
Online ISBN: 978-90-481-9794-1