Enhancing the Apriori Algorithm for Frequent Set Counting
In this paper we propose DCP, a new algorithm for solving the Frequent Set Counting problem that enhances Apriori. Our goal was to optimize the initial iterations of Apriori, which are the most time-consuming ones when the dataset is characterized by short or medium-length frequent patterns. The main improvements are an innovative method for storing candidate sets of items and counting their support, and effective pruning techniques that significantly reduce the size of the dataset as execution progresses.
Keywords: Association Rule, Frequent Itemsets, Mining Association Rule, Support Threshold, Pruning Technique
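To make the idea of shrinking the dataset between Apriori iterations concrete, the following is a minimal sketch of a level-wise Apriori loop with a simple, generic form of dataset pruning. It is not the DCP algorithm: the candidate storage and counting method described in the abstract is not reproduced here, and the function name `apriori_with_pruning` and its parameters are assumptions made for illustration. The pruning rule used is the standard one: after iteration k, an item that occurs in no frequent k-itemset cannot occur in any frequent (k+1)-itemset, so it can be dropped from every transaction, and transactions left with fewer than k+1 items can be discarded.

```python
from itertools import combinations


def apriori_with_pruning(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets.

    Illustrative sketch only: plain Apriori plus a generic dataset-pruning
    step between iterations. min_support is an absolute count.
    """
    data = [frozenset(t) for t in transactions]

    # Iteration 1: count the support of every single item.
    counts = {}
    for t in data:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 1
    while frequent:
        # Dataset pruning: keep only items that belong to some frequent
        # k-itemset, then drop transactions too short to hold a (k+1)-itemset.
        live_items = set().union(*frequent)
        data = [t & live_items for t in data]
        data = [t for t in data if len(t) >= k + 1]

        # Candidate generation: join pairs of frequent k-itemsets and keep a
        # (k+1)-itemset only if all of its k-subsets are frequent.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k + 1 and all(
                frozenset(sub) in frequent for sub in combinations(union, k)
            ):
                candidates.add(union)

        # Support counting over the pruned dataset (naive subset test).
        counts = {c: 0 for c in candidates}
        for t in data:
            for c in candidates:
                if c <= t:
                    counts[c] += 1

        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1

    return result
```

The naive subset test in the counting step is the part a real implementation would replace with a more efficient candidate structure; the sketch only shows where pruning fits in the iteration.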