Abstract
The search for frequent patterns in transactional databases is considered one of the most important data mining problems. Several parallel and sequential algorithms have been proposed in the literature to solve this problem. Almost all of these algorithms make repeated passes over the dataset to determine the set of frequent itemsets, which implies high I/O overhead. In the parallel case, most algorithms also perform a sum-reduction at the end of each pass to construct the global counts, which implies high synchronization cost. We present a novel algorithm that efficiently exploits the trade-offs between computation, communication, memory usage and synchronization. The algorithm was implemented on a cluster of SMP nodes, combining the distributed and shared memory paradigms. This paper reports the results of running our algorithm on datasets of different sizes and with different numbers of processors, and studies the effect of these variations on overall performance.
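As a rough illustration of the per-pass sum-reduction that the abstract attributes to most existing parallel algorithms (not the algorithm proposed in this paper), the following minimal Python sketch counts k-itemsets locally on each data partition and then sums the local counts into global counts. The function names (`local_counts`, `sum_reduce`, `frequent_itemsets`), the in-memory list-of-partitions layout, and the absolute `min_support` threshold are all assumptions made for this example; a real implementation would run each partition on a separate node and perform the reduction over the network.

```python
from collections import Counter
from itertools import combinations


def local_counts(partition, k):
    """Count every k-itemset occurring in one node's partition of the transactions."""
    counts = Counter()
    for transaction in partition:
        # Sort and deduplicate so each itemset has one canonical tuple form.
        for itemset in combinations(sorted(set(transaction)), k):
            counts[itemset] += 1
    return counts


def sum_reduce(per_node_counts):
    """Add the per-node counters together (the synchronization point of a pass)."""
    total = Counter()
    for counts in per_node_counts:
        total.update(counts)  # Counter.update adds counts instead of replacing them
    return total


def frequent_itemsets(partitions, k, min_support):
    """One pass: local counting on each partition, then a global sum-reduction."""
    global_counts = sum_reduce(local_counts(p, k) for p in partitions)
    return {itemset: n for itemset, n in global_counts.items() if n >= min_support}


# Hypothetical data: two "nodes", each holding two transactions.
partitions = [
    [["a", "b"], ["a", "c"]],
    [["a", "b"], ["b", "c"]],
]
pairs = frequent_itemsets(partitions, k=2, min_support=2)
```

In this sketch every pass over the data ends with one reduction, so finding itemsets up to size k costs k scans and k synchronization points, which is the overhead the abstract refers to.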
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Palancar, J.H., Tormo, O.F., Cárdenas, J.F., León, R.H. (2007). Distributed and Shared Memory Algorithm for Parallel Mining of Association Rules. In: Perner, P. (eds) Machine Learning and Data Mining in Pattern Recognition. MLDM 2007. Lecture Notes in Computer Science, vol 4571. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73499-4_27
DOI: https://doi.org/10.1007/978-3-540-73499-4_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73498-7
Online ISBN: 978-3-540-73499-4
eBook Packages: Computer Science, Computer Science (R0)