RDD-Eclat: Approaches to Parallelize Eclat Algorithm on Spark RDD Framework

  • Pankaj Singh
  • Sudhakar Singh (email author)
  • P. K. Mishra
  • Rakhi Garg
Conference paper
Part of the Lecture Notes on Data Engineering and Communications Technologies book series (LNDECT, volume 44)

Abstract

Many frequent itemset mining (FIM) algorithms were initially designed on Hadoop MapReduce, a distributed big data processing framework. However, because of its heavy disk I/O, MapReduce proved inefficient for such highly iterative algorithms. Spark, a more efficient distributed data processing framework, was therefore developed with in-memory computation and the resilient distributed dataset (RDD) abstraction to support iterative algorithms. Apriori-based and FP-Growth-based FIM algorithms have been designed on the Spark RDD framework, but an Eclat-based algorithm has not yet been explored. This paper proposes RDD-Eclat, a parallel Eclat algorithm on the Spark RDD framework, together with its five variants. The proposed algorithms are evaluated on various benchmark datasets, and the results show that RDD-Eclat outperforms Spark-based Apriori by many times. The experimental results also show that the proposed algorithms scale as the number of cores and the size of the dataset increase.
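
For readers unfamiliar with Eclat, the following is a minimal single-machine sketch of the classic idea it is built on: store the data vertically (each item mapped to the set of transaction ids containing it) and grow itemsets depth-first by intersecting those sets. It is not the RDD-Eclat algorithm or any of its five variants, and it does not use Spark; the function name `eclat`, the parameter `min_support` (an absolute count), and the toy transactions are assumptions made purely for illustration.

```python
from collections import defaultdict


def eclat(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets.

    Illustrative sketch only: min_support is an absolute transaction count.
    """
    # Vertical layout: item -> set of transaction ids (its "tidset").
    tidsets = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets[item].add(tid)

    frequent = {}

    def extend(prefix, candidates):
        # candidates: (item, tidset) pairs that are frequent together with prefix.
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix | {item}
            frequent[frozenset(itemset)] = len(tids)
            # Intersect with the tidsets of the remaining items to form
            # the candidates that share the new, longer prefix.
            next_candidates = []
            for other, other_tids in candidates[i + 1:]:
                common = tids & other_tids
                if len(common) >= min_support:
                    next_candidates.append((other, common))
            if next_candidates:
                extend(itemset, next_candidates)

    # Start from the frequent single items, in a fixed order.
    initial = sorted(
        ((item, tids) for item, tids in tidsets.items()
         if len(tids) >= min_support),
        key=lambda pair: pair[0],
    )
    extend(frozenset(), initial)
    return frequent


# Hypothetical toy data, not taken from the paper's benchmark datasets.
transactions = [
    {"a", "b", "c"},
    {"a", "c"},
    {"a", "d"},
    {"b", "c"},
]
result = eclat(transactions, min_support=2)
```

Support counting here needs only set intersections, with no repeated scans of the original transactions, which is the property that makes the vertical approach attractive for iterative, in-memory processing.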

Keywords

Parallel and distributed algorithms; Frequent itemset mining; Eclat; Spark; Big data analytics

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Pankaj Singh (1)
  • Sudhakar Singh (2), email author
  • P. K. Mishra (1)
  • Rakhi Garg (3)

  1. Department of Computer Science, Banaras Hindu University, Varanasi, India
  2. Department of Electronics and Communication, University of Allahabad, Allahabad, India
  3. Mahila Maha Vidyalaya, Banaras Hindu University, Varanasi, India