ETARM: an efficient top-k association rule mining algorithm
Mining association rules plays an important role in data mining and knowledge discovery since it can reveal strong associations between items in databases. Nevertheless, an important problem with traditional association rule mining methods is that they can generate a huge number of association rules depending on how parameters are set. Users, however, are often only interested in finding the strongest rules, and do not want to go through a large number of rules or wait for these rules to be generated. To address these needs, algorithms have been proposed to mine the top-k association rules in databases, where users can directly set a parameter k to obtain the k most frequent rules. A major issue with these techniques is that they remain very costly in terms of execution time and memory. To address this issue, this paper presents a novel algorithm named ETARM (Efficient Top-k Association Rule Miner) to efficiently find the complete set of top-k association rules. The proposed algorithm integrates two novel candidate pruning properties to reduce the search space more effectively. These properties are applied during the candidate selection process to identify, based on a rule's confidence, items that should not be used to expand it, which reduces the number of candidates. An extensive experimental evaluation on six standard benchmark datasets shows that the proposed approach outperforms the state-of-the-art TopKRules algorithm in terms of both runtime and memory usage.
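To make the problem statement concrete, the following is a minimal brute-force sketch of top-k association rule mining in Python. It is not ETARM and uses none of its pruning properties; it simply enumerates every rule, which is only feasible for tiny databases. The function names (`support_count`, `naive_top_k_rules`) are hypothetical, and the `minconf` threshold is an assumption based on the usual formulation of the task (keep the k rules with the highest support among those meeting a minimum confidence), which the abstract does not spell out.

```python
from itertools import combinations
import heapq


def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def naive_top_k_rules(transactions, k, minconf):
    """Brute-force baseline (hypothetical helper, not the ETARM algorithm).

    Returns up to k tuples (support, confidence, antecedent, consequent),
    ordered by support and then confidence, both descending.
    Assumption: a rule qualifies only if its confidence is >= minconf.
    """
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))
    rules = []
    # Enumerate every itemset with at least two items.
    for size in range(2, len(items) + 1):
        for combo in combinations(items, size):
            sup = support_count(frozenset(combo), transactions)
            if sup == 0:
                continue  # the itemset never occurs, so no rule can be built
            # Split the itemset into every antecedent / consequent pair.
            for r in range(1, size):
                for antecedent in combinations(combo, r):
                    ante_set = frozenset(antecedent)
                    conf = sup / support_count(ante_set, transactions)
                    if conf >= minconf:
                        consequent = tuple(i for i in combo if i not in ante_set)
                        rules.append((sup, conf, antecedent, consequent))
    # Keep only the k rules with the highest support (ties broken by confidence).
    return heapq.nlargest(k, rules, key=lambda rule: (rule[0], rule[1]))


if __name__ == "__main__":
    # Toy database made up for illustration only.
    db = [{"a", "b"}, {"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"b"}]
    for sup, conf, ante, cons in naive_top_k_rules(db, k=2, minconf=0.8):
        print(f"{ante} -> {cons}  support={sup}  confidence={conf:.2f}")
```

The number of candidate rules grows exponentially with the number of items in this baseline, which is the cost that search-space pruning in algorithms such as TopKRules and ETARM is meant to reduce.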
Keywords: Data mining, Association rule mining, Top-k association rules, Rule expansion
This work was carried out during the tenure of an ERCIM ‘Alain Bensoussan’ Fellowship Programme.