Enhancing the Apriori Algorithm for Frequent Set Counting

  • Raffaele Perego
  • Salvatore Orlando
  • P. Palmerini
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2114)

Abstract

In this paper we propose DCP, a new algorithm for solving the Frequent Set Counting problem, which enhances Apriori. Our goal was to optimize the initial iterations of Apriori, i.e., the most time-consuming ones when datasets characterized by short or medium-length frequent patterns are considered. The main improvements concern the use of an innovative method for storing candidate sets of items and counting their support, and the exploitation of effective pruning techniques that significantly reduce the size of the dataset as execution progresses.
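The abstract does not describe DCP's actual candidate storage or pruning rules, so the following is only a minimal sketch of the baseline it improves on: plain level-wise Apriori, plus one simple, safe dataset-pruning rule of the general kind the abstract mentions. The function name `apriori`, the use of an absolute `min_support` count, the pairwise join, and the specific pruning rule are all illustrative assumptions, not the authors' DCP method.

```python
from collections import Counter
from itertools import combinations


def apriori(transactions, min_support):
    """Level-wise frequent itemset mining (plain Apriori, illustrative sketch)
    with a simple dataset-pruning step. min_support is an absolute count
    (an assumption of this sketch). Returns {frozenset(itemset): support}."""
    data = [frozenset(t) for t in transactions]

    # Iteration 1: count single items directly.
    counts = Counter(item for t in data for item in t)
    frequent = {frozenset([i]): c for i, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Candidate generation: join pairs of frequent (k-1)-itemsets, then keep
        # only candidates whose every (k-1)-subset is frequent (Apriori property).
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in frequent for s in combinations(union, k - 1)
            ):
                candidates.add(union)
        if not candidates:
            break

        # Hypothetical pruning rule: an item outside every candidate cannot occur
        # in any frequent itemset of size >= k, and a transaction left with fewer
        # than k items cannot contain any candidate, so both are dropped.
        useful = frozenset().union(*candidates)
        data = [t & useful for t in data]
        data = [t for t in data if len(t) >= k]

        # Support counting: enumerate each transaction's k-subsets and look
        # them up in the candidate set.
        counts = Counter()
        for t in data:
            for subset in combinations(t, k):
                fs = frozenset(subset)
                if fs in candidates:
                    counts[fs] += 1

        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
    return result


transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c", "d"}]
result = apriori(transactions, min_support=3)
# result: a, b, c each with support 4; ab, ac, bc each with support 3.
# {a, b, c} has support 2, so it is not frequent at this threshold.
```

The pruning step shrinks the dataset on every pass: items that belong to no size-k candidate can never reappear, because every later candidate is built only from items of frequent k-itemsets. Enumerating all k-subsets of each transaction keeps the sketch short but grows quickly for long transactions, which is one reason the early, candidate-heavy iterations are expensive in practice.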

Keywords

Association Rule, Frequent Itemsets, Mining Association Rule, Support Threshold, Pruning Technique
These keywords were added by machine and not by the authors.

Copyright information

© Springer-Verlag Berlin Heidelberg 2001

Authors and Affiliations

  • Raffaele Perego (1)
  • Salvatore Orlando (2)
  • P. Palmerini (1, 2)
  1. Istituto CNUCE, Pisa, Italy
  2. Dipartimento di Informatica, Università Ca’ Foscari di Venezia, Italy