Mining top-k frequent patterns over data streams sliding window

Journal of Intelligent Information Systems

Abstract

Frequent pattern mining in data streams is an important research topic in the data mining community. Previous studies assumed that a minimum support threshold was available for mining frequent patterns. However, setting such a threshold is typically difficult, so it is more reasonable to ask users to set a bound on the result size. The present study considers mining the top-k frequent patterns from data streams using a sliding window technique. A single-pass algorithm, called MSWTP, is developed to generate the top-k frequent patterns without a threshold. In this method, the content of the transactions in the sliding window is incrementally maintained in a summary data structure, named SWTP-tree, by scanning the stream only once. To make mining efficient, insignificant patterns are distinguished from the others by applying the Chernoff bound. Two kinds of obsolete patterns and one kind of insignificant pattern are periodically pruned from the pattern tree. Whenever necessary, the k most frequent patterns can be selected from the SWTP-tree in descending order of frequency. The performance of the proposed technique is evaluated via simulation experiments. The results show that the proposed method is both efficient and scalable, and that it outperforms comparable algorithms.
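
To make the problem setting concrete, the following is a minimal Python sketch of top-k pattern counting over a sliding window. It is not the MSWTP algorithm and does not implement the SWTP-tree or its pruning rules, which this abstract does not specify. It simply enumerates every itemset up to a small length and keeps exact counts for the current window. The class name SlidingWindowTopK, the max_len parameter, and the helper chernoff_epsilon are hypothetical names introduced for illustration. The error bound shown is the standard additive Chernoff-Hoeffding form, which is an assumption and may differ from the bound used in the paper.

```python
import math
from collections import Counter, deque
from itertools import combinations


def chernoff_epsilon(n, delta):
    """Additive Chernoff-Hoeffding bound (assumed form, not taken from the paper).

    With probability at least 1 - delta, an observed support over n
    transactions is within epsilon of its true value.
    """
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))


class SlidingWindowTopK:
    """Brute-force baseline: exact itemset counts over the last window_size transactions."""

    def __init__(self, window_size, max_len=2):
        self.window_size = window_size
        self.max_len = max_len      # longest itemset enumerated (hypothetical limit)
        self.window = deque()       # transactions currently in the window
        self.counts = Counter()     # itemset (sorted tuple) -> count in the window

    def _patterns(self, transaction):
        items = sorted(set(transaction))
        for size in range(1, self.max_len + 1):
            yield from combinations(items, size)

    def add(self, transaction):
        """Insert one transaction and expire the oldest if the window is full."""
        self.window.append(transaction)
        self.counts.update(self._patterns(transaction))
        if len(self.window) > self.window_size:
            expired = self.window.popleft()
            for pattern in self._patterns(expired):
                self.counts[pattern] -= 1
                if self.counts[pattern] == 0:
                    del self.counts[pattern]

    def top_k(self, k):
        """Return the k most frequent itemsets; ties are broken by itemset order."""
        ranked = sorted(self.counts.items(), key=lambda kv: (-kv[1], kv[0]))
        return ranked[:k]


miner = SlidingWindowTopK(window_size=3)
for t in (["a", "b"], ["a", "c"], ["a", "b", "c"], ["b", "c"]):
    miner.add(t)
print(miner.top_k(2))
```

Exact enumeration grows exponentially with transaction length, which is why a compact summary structure and periodic pruning of insignificant patterns, as described in the abstract, matter in practice.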

Author information

Corresponding author

Correspondence to Hui Chen.

Additional information

This work was partly supported by the National Science Foundation of China under grants No. 61262033 and 61262009; the Natural Science Foundation of Jiangxi Province, China under grant No. 20122BAB201032; the Science Foundation of Jiangxi Provincial Department of Education, China under grants No. GJJ13303 and GJJ12259; and the Middle Age and Young Teachers Development Program of Undergraduate Universities in Jiangxi Province, China.

About this article

Cite this article

Chen, H. Mining top-k frequent patterns over data streams sliding window. J Intell Inf Syst 42, 111–131 (2014). https://doi.org/10.1007/s10844-013-0265-4

  • DOI: https://doi.org/10.1007/s10844-013-0265-4
