The volume of data, including transactional data, is growing rapidly every day. Partitioning this data and storing it in a distributed manner is an effective approach to storage and retrieval. Mining such distributed data with minimal dependence between sub-problems is a crucial task. Finding frequent itemsets and the corresponding association rules is particularly challenging when results must be aggregated across a distributed environment. To overcome these challenges, we propose a distributed frequent itemset generation and association rule mining algorithm based on the MapReduce programming model. The proposed scheme generates frequent itemsets and mines association rules using a synthesized distributed technique. The rules are mined in a distributed manner, and weights are then assigned to the subsets of data and to the association rules. The rules generated in this distributed manner are combined using a weighted approach. This paper presents a novel MapReduce-based synthesis approach that works well over distributed storage of large amounts of data.
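The general idea of mining each data partition independently and then combining the local results with weights can be illustrated with a minimal, single-process sketch. This is not the algorithm proposed in the paper: the function names (`map_phase`, `reduce_phase`, `local_supports`, `synthesize`) are hypothetical, the map and reduce steps are plain Python functions rather than Hadoop jobs, and the choice of weighting each partition by its share of the transactions is an assumption made only for illustration.

```python
from collections import Counter, defaultdict
from fractions import Fraction
from itertools import combinations


def map_phase(transactions, max_size=2):
    """Map step: emit (itemset, 1) for every itemset of up to max_size items."""
    for transaction in transactions:
        items = sorted(set(transaction))
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                yield itemset, 1


def reduce_phase(pairs):
    """Reduce step: sum the emitted counts per itemset."""
    counts = Counter()
    for itemset, count in pairs:
        counts[itemset] += count
    return counts


def local_supports(transactions, max_size=2):
    """Support of each itemset within one partition, as exact fractions."""
    counts = reduce_phase(map_phase(transactions, max_size))
    n = len(transactions)
    return {itemset: Fraction(c, n) for itemset, c in counts.items()}


def synthesize(partitions, min_support, max_size=2):
    """Combine local supports using per-partition weights.

    Assumption (not taken from the paper): a partition's weight is its
    share of all transactions. Under this choice the weighted sum equals
    the support over the whole data set.
    """
    total = sum(len(p) for p in partitions)
    combined = defaultdict(Fraction)
    for part in partitions:
        if not part:
            continue  # an empty partition contributes nothing
        weight = Fraction(len(part), total)
        for itemset, support in local_supports(part, max_size).items():
            combined[itemset] += weight * support
    return {i: s for i, s in combined.items() if s >= min_support}


# Toy data: two partitions of a small transactional data set.
partitions = [
    [["a", "b"], ["a", "c"]],
    [["a", "b"], ["b", "c"], ["a", "b", "c"], ["c"]],
]
frequent = synthesize(partitions, min_support=Fraction(1, 2))
# Confidence of the rule a -> b, computed from the synthesized supports.
confidence_a_b = frequent[("a", "b")] / frequent[("a",)]
```

Each partition is processed without reference to the others, so the only shared step is the final weighted sum. Other weighting schemes could be substituted in `synthesize` without changing the map and reduce steps.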
Cite this article
Pal, A., Kumar, M. Distributed synthesized association mining for big transactional data. Sādhanā 45, 169 (2020). https://doi.org/10.1007/s12046-020-01380-8
- Big Data
- frequent itemset
- association rule