Skip to main content
Log in

Maintaining frequent closed itemsets over a sliding window

  • Published:
Journal of Intelligent Information Systems Aims and scope Submit manuscript

Abstract

In this paper, we study the incremental update of Frequent Closed Itemsets (FCIs) over a sliding window in a high-speed data stream. We propose the notion of semi-FCIs, which is to progressively increase the minimum support threshold for an itemset as it is retained longer in the window, thereby drastically reducing the number of itemsets that need to be maintained and processed. We explore the properties of semi-FCIs and observe that a majority of the subsets of a semi-FCI are not semi-FCIs and need not be updated. This finding allows us to devise an efficient algorithm, IncMine, that incrementally updates the set of semi-FCIs over a sliding window. We also develop an inverted index to facilitate the update process. Our empirical results show that IncMine achieves significantly higher throughput and consumes less memory than the state-of-the-art streaming algorithms for mining FCIs and FIs. IncMine also attains high accuracy of 100% precision and over 93% recall.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  • Agrawal, R., Imielinski, T., & Swami, A. N. (1993). Mining association rules between sets of items in large databases. In SIGMOD, (pp. 207–216).

  • Agrawal, R., & Srikant, R. (1994). Fast algorithms for mining association rules in large databases. In VLDB, (pp. 487–499).

  • Chang, J. H., & Lee, W. S. (2003). Finding recent frequent itemsets adaptively over online data streams. In KDD, (pp. 487–492).

  • Chang, J. H., & Lee, W. S. (2004). A sliding window method for finding recently frequent itemsets over online data streams. Journal of Information Science and Engineering, 20(4), 753–762.

    Google Scholar 

  • Chen, Y., Dong, G., Han, J., Wah, B. W., & Wang, J. (2002). Multi-dimensional regression analysis of time-series data streams. In VLDB, (pp. 323–334).

  • Chi, Y., Wang, H., Yu, P. S., & Muntz, R. R. (2004). Moment: Maintaining closed frequent itemsets over a stream sliding window. In ICDM, (pp. 59–66).

  • FIMI Dataset Repository (2003). Frequent itemset mining dataset repository. http://fimi.cs.helsinki.fi/data/.

  • Garofalakis, M. N., Gehrke, J., & Rastogi, R. (2002). Querying and mining data streams: You only get one look a tutorial. In SIGMOD, (p. 63).

  • Giannella, C., Han, J., Pei, J., Yan, X., & Yu, P. S. (2004). Mining frequent patterns in data streams at multiple time granularities. Cambridge, MA: MIT Press.

    Google Scholar 

  • Han, J., Pei, J., & Yin, Y. (2000). Mining frequent patterns without candidate generation. In SIGMOD, (pp. 1–12).

  • IBM Quest (1996). Ibm quest data mining project. frequent itemset mining dataset repository. http://www.almaden.ibm.com/software/quest/.

  • Jiang, N. & Gruenwald, L. (2006). CFI-Stream: Mining closed frequent itemsets in data streams. In KDD, (pp. 592–597).

  • Kifer, D., Ben-David, S., & Gehrke, J. (2004). Detecting change in data streams. In VLDB, (pp. 180–191).

  • Lee, C.-H., Lin, C.-R., & Chen, M.-S. (2001). Sliding-window filtering: An efficient algorithm for incremental mining. In CIKM, (pp. 263–270).

  • Li, H., Lee, S., & Shan, M. (2004). Algorithm for mining frequent itemsets over the entire history of data streams. In Proc. of First International Workshop on Knowledge Discovery in Data Streams.

  • Manku, G. S. & Motwani, R. (2002). Approximate frequency counts over data streams. In VLDB, pages 346–357.

  • Pasquier, N., Bastide, Y., Taouil, R., & Lakhal, L. (1999). Discovering frequent closed itemsets for association rules. In ICDT, (pp. 398–416).

  • Pei, J., Han, J., & Mao, R. (2000). CLOSET: An efficient algorithm for mining frequent closed itemsets. In ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, (pp. 21–30).

  • Wang, J., Han, J., & Pei, J. (2003). CLOSET+: Searching for the best strategies for mining frequent closed itemsets. In KDD, (pp. 236–245).

  • Yu, J. X., Chong, Z., Lu, H., & Zhou, A. (2004). False positive or false negative: Mining frequent itemsets from high speed transactional data streams. In VLDB, (pp. 204–215).

  • Zaki, M. J. (2000). Generating non-redundant association rules. In KDD, (pp. 34–43).

  • Zaki, M. J., & Hsiao, C.-J. (2002). Charm: An efficient algorithm for closed itemset mining. In SDM.

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to James Cheng.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Cheng, J., Ke, Y. & Ng, W. Maintaining frequent closed itemsets over a sliding window. J Intell Inf Syst 31, 191–215 (2008). https://doi.org/10.1007/s10844-007-0042-3

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10844-007-0042-3

Keywords

Navigation