DCE-Miner: an association rule mining algorithm for multimedia based on the MapReduce framework

Abstract

The volume of multimedia data has grown rapidly because of improvements in data collection and storage technologies. Association rule mining (ARM) is a data mining technique widely used to extract useful information from data warehouses. In real-world big data applications, fast and effective data mining algorithms are becoming increasingly valuable. In this paper, we propose DCE-Miner, a fast association rule mining algorithm with low memory requirements based on the MapReduce framework. In the precomputation phase, we split large datasets into equal-sized smaller ones using a data division method. In the frequent k-itemset mining phase, the mappers read the small datasets and distribute the data to reducers based on the closed set characteristics associated with each partition. The reducers use bitmaps to accelerate computation and store the possible frequent 2-itemsets to reduce future computation. Extensive experimental results show that, on large-scale datasets with up to 40 million transactions, DCE-Miner achieves better performance and is more robust with respect to dataset size and support level than existing algorithms.
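The bitmap idea mentioned in the abstract can be illustrated with a small single-machine sketch. This is not the authors' implementation and it omits the MapReduce partitioning entirely. The function names `build_bitmaps` and `frequent_2_itemsets`, the toy image-tag transactions, and the use of Python integers as bitmaps are all assumptions made for illustration. Each item gets a bitmap whose bit i is set when transaction i contains the item, so the support of a 2-itemset is the number of set bits in the bitwise AND of two bitmaps.

```python
from itertools import combinations


def build_bitmaps(transactions):
    """Map each item to an int whose bit i is set when transaction i contains it."""
    bitmaps = {}
    for tid, items in enumerate(transactions):
        for item in set(items):
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps


def popcount(bitmap):
    """Count the set bits, i.e. the number of supporting transactions."""
    return bin(bitmap).count("1")


def frequent_2_itemsets(transactions, min_support):
    """Return {(a, b): support} for every item pair meeting min_support."""
    bitmaps = build_bitmaps(transactions)
    # Only frequent single items can appear in a frequent pair.
    frequent_items = sorted(
        item for item, bm in bitmaps.items() if popcount(bm) >= min_support
    )
    result = {}
    for a, b in combinations(frequent_items, 2):
        # The AND of two bitmaps marks the transactions containing both items.
        support = popcount(bitmaps[a] & bitmaps[b])
        if support >= min_support:
            result[(a, b)] = support
    return result


# Hypothetical image-tag transactions, one list of tags per image.
transactions = [
    ["sky", "cloud", "sea"],
    ["sky", "cloud"],
    ["sky", "sea"],
    ["cloud", "tree"],
]
pairs = frequent_2_itemsets(transactions, min_support=2)
print(pairs)
```

In a distributed setting, each reducer would presumably run a computation of this kind on its own partition. How DCE-Miner assigns data to reducers and reuses the stored 2-itemsets is described only in the full paper, so that part is not sketched here.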

Acknowledgements

This work was supported by the Fundamental Research Foundation for Universities of Heilongjiang Province (JMRH2018XM04) and the Natural Science Foundation of Heilongjiang Province of China (LC2018030).

Author information

Corresponding author

Correspondence to Guanglu SUN.

About this article

Cite this article

Chengyan, L., Feng, S. & Sun, G. DCE-Miner: an association rule mining algorithm for multimedia based on the MapReduce framework. Multimed Tools Appl 79, 16771–16793 (2020). https://doi.org/10.1007/s11042-019-08361-y

Keywords

  • Multimedia
  • Data mining
  • Association rule
  • Big data
  • MapReduce