RDD-Eclat: Approaches to Parallelize Eclat Algorithm on Spark RDD Framework

  • Conference paper
Second International Conference on Computer Networks and Communication Technologies (ICCNCT 2019)

Abstract

Many early frequent itemset mining (FIM) algorithms were designed for Hadoop MapReduce, a distributed big data processing framework. Because of its heavy disk I/O, however, MapReduce has proved inefficient for such highly iterative algorithms. Spark, a more efficient distributed data processing framework, was developed with in-memory computation and the resilient distributed dataset (RDD) abstraction to support iterative algorithms. Apriori-based and FP-Growth-based FIM algorithms have been designed on the Spark RDD framework, but an Eclat-based algorithm has not yet been explored. This paper proposes RDD-Eclat, a parallel Eclat algorithm on the Spark RDD framework, together with its five variants. The proposed algorithms are evaluated on various benchmark datasets, and the results show that RDD-Eclat outperforms Spark-based Apriori many times over. The experimental results also show that the proposed algorithms scale as the number of cores and the size of the dataset increase.
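For readers unfamiliar with Eclat, the following is a minimal sequential sketch of the underlying idea: transactions are converted to a vertical layout (item to set of transaction ids), and larger itemsets are found by intersecting those sets depth-first. It is an illustration only, not the RDD-Eclat algorithm or any of its five variants; the function name `eclat`, its parameters, and the toy data are assumptions made for this example, and no Spark API is used.

```python
from collections import defaultdict


def eclat(transactions, min_support):
    """Return {frozenset(itemset): support count} for all frequent itemsets.

    Illustrative, single-machine Eclat; `min_support` is an absolute count.
    """
    # Build the vertical layout: item -> set of transaction ids (tidset).
    tidsets = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets[item].add(tid)

    # Keep only frequent single items, in a fixed (sorted) order.
    frequent_items = sorted(
        ((item, tids) for item, tids in tidsets.items() if len(tids) >= min_support),
        key=lambda pair: pair[0],
    )

    result = {}

    def extend(prefix, candidates):
        # Each candidate is (item, tidset of prefix + item).
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix | {item}
            result[itemset] = len(tids)
            # Intersect with later candidates to build the next level.
            next_candidates = []
            for other, other_tids in candidates[i + 1:]:
                common = tids & other_tids
                if len(common) >= min_support:
                    next_candidates.append((other, common))
            if next_candidates:
                extend(itemset, next_candidates)

    extend(frozenset(), frequent_items)
    return result


if __name__ == "__main__":
    # Hypothetical toy dataset, not one of the paper's benchmarks.
    data = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
    for itemset, support in sorted(eclat(data, 3).items(), key=lambda kv: sorted(kv[0])):
        print(sorted(itemset), support)
```

Support counting here is just the size of a set intersection, so the database is scanned only once to build the vertical layout. A distributed version would additionally have to decide how to partition the tidsets or prefix classes across workers, which this sketch does not attempt.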

Author information

Corresponding author

Correspondence to Sudhakar Singh.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Singh, P., Singh, S., Mishra, P.K., Garg, R. (2020). RDD-Eclat: Approaches to Parallelize Eclat Algorithm on Spark RDD Framework. In: Smys, S., Senjyu, T., Lafata, P. (eds) Second International Conference on Computer Networks and Communication Technologies. ICCNCT 2019. Lecture Notes on Data Engineering and Communications Technologies, vol 44. Springer, Cham. https://doi.org/10.1007/978-3-030-37051-0_85

  • DOI: https://doi.org/10.1007/978-3-030-37051-0_85

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-37050-3

  • Online ISBN: 978-3-030-37051-0

  • eBook Packages: Engineering, Engineering (R0)
