Skip to main content

Pincer-search: A new algorithm for discovering the maximum frequent set

  • Conference paper
  • First Online:
Book cover Advances in Database Technology — EDBT'98 (EDBT 1998)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 1377))

Included in the following conference series:

Abstract

Discovering frequent itemsets is a key problem in important data mining applications, such as the discovery of association rules, strong rules, episodes, and minimal keys. Typical algorithms for solving this problem operate in a bottom-up breadth-first search direction. The computation starts from frequent 1-itemsets (minimal length frequent itemsets) and continues until all maximal (length) frequent item-sets are found. During the execution, every frequent itemset is explicitly considered. Such algorithms perform reasonably well when all maximal frequent itemsets are short. However, performance drastically decreases when some of the maximal frequent itemsets are relatively long. We present a new algorithm which combines both the bottom-up and top-down searches. The primary search direction is still bottom-up, but a restricted search is also conducted in the top-down direction. This search is used only for maintaining and updating a new data structure we designed, the maximum frequent candidate set. It is used to prune candidates in the bottom-up search. A very important characteristic of the algorithm is that it does not require explicite examination of every frequent itemset. Therefore the algorithm performs well even when some maximal frequent itemsets are long. As its output, the algorithm produces the maximum frequent set, i.e., the set containing all maximal frequent itemsets, which therefore specifies immediately all frequent itemsets. We evaluate the performance of the algorithm using a well-known benchmark database. The improvements can be up to several orders of magnitude, compared to the best current algorithms.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. In Proc. SIGMOD, May 1993.

    Google Scholar 

  2. R. Agrawal and R. Srikant. Fast algorithms for mining association rules in large databases. In Proc. 20th VLDB, Sept. 1994.

    Google Scholar 

  3. R. Agrawal and J. Shafer. Parallel mining of association rules. IEEE Trans. on Knowledge and Data Engineering, Jan. 1996.

    Google Scholar 

  4. S. Brin, R. Motwani, J. Ullman, and S. Tsur. Dynamic itemset counting and implication rules for market basket data. In Proc. SIGMOD, May 1997.

    Google Scholar 

  5. U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthrusamy (Eds.). Advances in Knowledge Discovery and Data Mining. AAAI Press, Menlo Park, CA, 1996.

    Google Scholar 

  6. D. Gunopulos, H. Mannila, and S. Saluja. Discovering all most specific sentences by randomized algorithm. In Proc. Intl. Conf. of Database Theory, Jan. 1997.

    Google Scholar 

  7. J. Han and Y. Fu. Discovery of multiple-level association rules from large databases. In Proc. 21st VLDB, Sept. 1995.

    Google Scholar 

  8. D. Lin and Z. Kedem. Pincer-Search: A new algorithm for discovering the maximum frequent set. Technical Report TR1997-742, Dept. of Computer Science, New York University, Sept. 1997.

    Google Scholar 

  9. H. Mannila and H. Toivonen. Discovering frequent episodes in sequences. In Proc. KDD'95, Aug. 1995.

    Google Scholar 

  10. H. Mannila and H. Toivonen. Levelwise search and borders of theories in knowledge discovery. Technical Report TR C-1997-8, Dept. of Computer Science, U. of Helsinki, Jan. 1997.

    Google Scholar 

  11. H. Mannila, H. Toivonen, and A. Verkamo. Improved methods for finding association rules. In Proc. AAAI Workshop on Knowledge Discovery, July 1994.

    Google Scholar 

  12. T. Mitchell. Generalization as search. Artificial Intelligence, Vol. 18, 1982.

    Google Scholar 

  13. B. özden, S. Ramaswamy. and A. Silberschatz. Cyclic Association Rules. In Proc. 14th Intl. Conf. on Data Engineering, 1998, to appear.

    Google Scholar 

  14. J. Park, M. Chen, and P. Yu. An effective hash-based algorithm for mining association rules. In Proc. ACM-SIGMOD, May 1995.

    Google Scholar 

  15. G. Piatetsky-Shapiro. Discovery, analysis, and presentation of strong rules. Knowledge Discovery in Databases, AAAI Press, 1991.

    Google Scholar 

  16. A. Sarasere, E. Omiecinsky, and S. Navathe. An efficient algorithm for mining association rules in large databases. In Proc. 21st VLDB, Sept. 1995.

    Google Scholar 

  17. R. Srikant and R. Agrawal. Mining generalized association rules. IBM Research Report RJ 9963, June 1995.

    Google Scholar 

  18. H. Toivonen. Sampling large databases for association rules. In Proc. 22nd VLDB, Sept. 1996.

    Google Scholar 

  19. M. J. Zaki, S. Parthasarathy, M. Ogihara, and W. Li. New algorithms for fast discovery of association rules. In Proc. KDD'97, Aug. 1997.

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Hans-Jörg Schek Gustavo Alonso Felix Saltor Isidro Ramos

Rights and permissions

Reprints and permissions

Copyright information

© 1998 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lin, DI., Kedem, Z.M. (1998). Pincer-search: A new algorithm for discovering the maximum frequent set. In: Schek, HJ., Alonso, G., Saltor, F., Ramos, I. (eds) Advances in Database Technology — EDBT'98. EDBT 1998. Lecture Notes in Computer Science, vol 1377. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0100980

Download citation

  • DOI: https://doi.org/10.1007/BFb0100980

  • Published:

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-64264-0

  • Online ISBN: 978-3-540-69709-1

  • eBook Packages: Springer Book Archive

Publish with us

Policies and ethics