Journal of Computer Science and Technology, Volume 32, Issue 2, pp 368–385

Parallel Incremental Frequent Itemset Mining for Large Data

Regular Paper

Abstract

Frequent itemset mining (FIM) is a popular data mining problem with applications in many fields, such as commodity recommendation in the retail industry, log analysis in web searching, and query recommendation (or related search). A large number of FIM algorithms have been proposed to obtain better performance, including parallelized algorithms for processing large data volumes. Incremental FIM algorithms have also been proposed to deal with incremental database updates. However, most of these incremental algorithms have low parallelism, which makes them inefficient on huge databases. This paper presents two parallel incremental FIM algorithms, IncMiningPFP and IncBuildingPFP, implemented on the MapReduce framework. IncMiningPFP preserves the FP-tree mining results of the original pass and reuses them in incremental calculations. In particular, we propose a method to generate a partial FP-tree in the incremental pass in order to avoid unnecessary mining work. Further, some of the incremental parallel tasks can be omitted when the inserted transactions include fewer items. IncBuildingPFP preserves the CanTrees built in the original pass and adds new transactions to them during the incremental passes. Our experimental results show that IncMiningPFP achieves significant speedup over PFP (Parallel FPGrowth) and a sequential incremental algorithm (CanTree) for most incremental input databases, and that IncBuildingPFP achieves such speedup in the remaining cases.
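
The idea of preserving a tree from the original pass and adding new transactions to it later can be illustrated with a small single-process sketch. It is not the authors' IncBuildingPFP implementation and involves no MapReduce. It assumes a CanTree-style prefix tree whose paths follow a fixed canonical item order (lexicographic order is chosen here as an assumption), so inserted transactions never force a restructuring of the existing tree. The names `CanTreeNode`, `CanTree`, `insert`, and `item_supports` are hypothetical and exist only for this illustration.

```python
from collections import defaultdict


class CanTreeNode:
    """One node of the prefix tree: an item and how many transactions pass through it."""

    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}


class CanTree:
    """Prefix tree whose paths follow a fixed canonical item order.

    Assumption for this sketch: the canonical order is lexicographic.
    Because the order never depends on item frequencies, new transactions
    can be added at any time without rebuilding existing paths.
    """

    def __init__(self):
        self.root = CanTreeNode()
        self.num_transactions = 0

    def insert(self, transaction):
        # Deduplicate and sort so every transaction follows the canonical order.
        node = self.root
        for item in sorted(set(transaction)):
            node = node.children.setdefault(item, CanTreeNode(item))
            node.count += 1
        self.num_transactions += 1

    def item_supports(self):
        # Support of an item is the sum of counts over all nodes carrying it.
        supports = defaultdict(int)
        stack = list(self.root.children.values())
        while stack:
            node = stack.pop()
            supports[node.item] += node.count
            stack.extend(node.children.values())
        return dict(supports)


# Original pass: build the tree once and keep it.
tree = CanTree()
for t in [["a", "b", "c"], ["a", "b"], ["b", "c"]]:
    tree.insert(t)
original = tree.item_supports()

# Incremental pass: add only the new transactions to the preserved tree.
for t in [["a", "c"], ["a", "b", "c"]]:
    tree.insert(t)
updated = tree.item_supports()
```

In this sketch the incremental pass touches only the paths of the inserted transactions; the counts from the original pass are reused rather than recomputed from the full database.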

Keywords

incremental parallel FPGrowth, data mining, frequent itemset mining, MapReduce

Supplementary material

11390_2017_1726_MOESM1_ESM.pdf (513 kb)
ESM 1 (PDF 512 kb)

Copyright information

© Springer Science+Business Media New York 2017

Authors and Affiliations

  1. State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
  2. University of Chinese Academy of Sciences, Beijing, China
