Abstract
The amount of multimedia data has grown rapidly because of improvements in data collection and storage technologies. Association rule mining (ARM) is a data mining technique widely used to extract useful information from data warehouses. In real-world big data applications, fast and effective data mining algorithms are becoming increasingly valuable. In this paper, we propose DCE-Miner, a fast association rule mining algorithm with low memory requirements based on the MapReduce framework. In the precomputation phase, we split large datasets into equal-sized smaller ones using a data division method. In the frequent k-itemsets mining phase, the mappers read the small datasets and distribute the data to reducers based on the closed-set characteristics associated with each partition. The reducers use bitmaps to accelerate computation and store the potentially frequent 2-itemsets to reduce future computation. Extensive experimental results show that on large-scale datasets with up to 40 million transactions, DCE-Miner achieves better performance and is more robust to dataset size and support level than current algorithms.
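As a rough illustration of two ideas named in the abstract (equal-sized data division and bitmap-based support counting), the following minimal Python sketch counts frequent 2-itemsets over partitioned transactions. It is not the DCE-Miner implementation: it runs in a single process rather than on MapReduce, it omits the closed-set-based distribution of data to reducers, and all function names (`split_equal`, `item_bitmaps`, `pair_counts`, `frequent_pairs`) are hypothetical. The support threshold is assumed to be an absolute transaction count.

```python
from collections import defaultdict
from itertools import combinations


def split_equal(transactions, n_parts):
    """Split a list of transactions into n_parts chunks of near-equal size."""
    size, extra = divmod(len(transactions), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        # The first `extra` chunks take one additional transaction each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(transactions[start:end])
        start = end
    return chunks


def item_bitmaps(chunk):
    """Map each item to an int whose bit t is set if transaction t contains it."""
    bitmaps = defaultdict(int)
    for t, transaction in enumerate(chunk):
        for item in set(transaction):
            bitmaps[item] |= 1 << t
    return bitmaps


def pair_counts(chunk):
    """Count co-occurrences of each item pair in one chunk via bitwise AND."""
    bitmaps = item_bitmaps(chunk)
    counts = {}
    for a, b in combinations(sorted(bitmaps), 2):
        # The number of set bits in the AND is the pair's support in this chunk.
        support = bin(bitmaps[a] & bitmaps[b]).count("1")
        if support:
            counts[(a, b)] = support
    return counts


def frequent_pairs(transactions, n_parts, min_support):
    """Sum per-chunk pair counts and keep pairs that meet the threshold."""
    totals = defaultdict(int)
    for chunk in split_equal(transactions, n_parts):
        for pair, count in pair_counts(chunk).items():
            totals[pair] += count
    return {pair: c for pair, c in totals.items() if c >= min_support}


if __name__ == "__main__":
    data = [["a", "b", "c"], ["a", "b"], ["a", "c"],
            ["b", "c"], ["a", "b", "c"], ["a", "b"]]
    print(frequent_pairs(data, n_parts=2, min_support=4))
```

Representing each item as an integer bitmap turns the support of an itemset into a population count over a bitwise AND, which avoids rescanning the transactions for every candidate pair. In a distributed setting, the per-chunk counting would correspond to work done on separate partitions, with the summation as the aggregation step.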
Acknowledgements
This work was supported by the Fundamental Research Foundation for Universities of Heilongjiang Province (JMRH2018XM04) and the Natural Science Foundation of Heilongjiang Province of China (LC2018030).
About this article
Cite this article
Chengyan, L., Feng, S. & Sun, G. DCE-Miner: an association rule mining algorithm for multimedia based on the MapReduce framework. Multimed Tools Appl 79, 16771–16793 (2020). https://doi.org/10.1007/s11042-019-08361-y
DOI: https://doi.org/10.1007/s11042-019-08361-y