Mining top-k frequent patterns over data streams sliding window

Chen, Hui

doi:10.1007/s10844-013-0265-4


Published: 18 July 2013

Volume 42, pages 111–131, (2014)

Journal of Intelligent Information Systems

Hui Chen¹


Abstract

Frequent pattern mining in data streams is an important research topic in the data mining community. Previous studies assumed that a minimum support threshold was available for mining frequent patterns. However, setting such a threshold is typically difficult, so it is more reasonable to ask users to set a bound on the result size. The present study considers mining top-k frequent patterns from data streams using a sliding window technique. A single-pass algorithm, called MSWTP, is developed to generate top-k frequent patterns without a threshold. In this method, the content of the transactions in the sliding window is incrementally maintained in a summary data structure, named the SWTP-tree, by scanning the stream only once. To make the mining operation efficient, insignificant patterns are distinguished from others by applying the Chernoff bound. Two kinds of obsolete patterns and one kind of insignificant pattern are periodically pruned from the pattern tree. Whenever necessary, the k most frequent patterns can be selected from the SWTP-tree in descending order of frequency. The performance of the proposed technique is evaluated via simulation experiments. The results show that the proposed method is both efficient and scalable, and that it outperforms comparable algorithms.
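
The general idea of threshold-free top-k mining over a sliding window can be illustrated with a minimal Python sketch. This is not MSWTP and does not build an SWTP-tree; the abstract does not give those details. The class name `SlidingWindowTopK`, the `max_len` cap on pattern length, and the use of exact counts are all assumptions made for illustration. The helper `hoeffding_epsilon` shows one standard Chernoff-type (Hoeffding) error bound; the exact bound used in the paper is not stated in the abstract, so it may differ.

```python
from collections import Counter, deque
from itertools import combinations
import math


def hoeffding_epsilon(n, delta):
    """Error margin eps such that an observed frequency over n transactions
    deviates from its expectation by more than eps with probability <= delta
    (two-sided Hoeffding inequality; an assumed stand-in for the paper's bound)."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))


class SlidingWindowTopK:
    """Exact itemset counts over the most recent `window_size` transactions."""

    def __init__(self, window_size, max_len=2):
        self.window_size = window_size
        self.max_len = max_len      # hypothetical cap on itemset length
        self.window = deque()       # transactions currently in the window
        self.counts = Counter()     # itemset (sorted tuple) -> support count

    def _patterns(self, transaction):
        # Enumerate every itemset of size 1..max_len in the transaction.
        items = sorted(set(transaction))
        for size in range(1, min(self.max_len, len(items)) + 1):
            yield from combinations(items, size)

    def add(self, transaction):
        # Single pass: count the new transaction, then expire the oldest one.
        self.window.append(transaction)
        self.counts.update(self._patterns(transaction))
        if len(self.window) > self.window_size:
            expired = self.window.popleft()
            for pattern in self._patterns(expired):
                self.counts[pattern] -= 1
                if self.counts[pattern] <= 0:
                    del self.counts[pattern]   # obsolete: no longer in window

    def top_k(self, k):
        # Highest count first; ties broken by itemset for a stable result.
        ranked = sorted(self.counts.items(), key=lambda kv: (-kv[1], kv[0]))
        return ranked[:k]


miner = SlidingWindowTopK(window_size=3)
for t in (["a", "b"], ["a", "b", "c"], ["a", "b"], ["b", "c"]):
    miner.add(t)
print(miner.top_k(3))
print(hoeffding_epsilon(n=1000, delta=0.05))
```

The sketch enumerates itemsets eagerly, which grows quickly with `max_len`; a compact summary structure with periodic pruning, as the abstract describes, is what would keep such a miner practical on real streams.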



Author information

Authors and Affiliations

School of Software and Communication Engineering, Jiangxi University of Finance and Economics, West Yuping Road, Changbei District, Nanchang City, Jiangxi Province, 330012, People’s Republic of China
Hui Chen


Corresponding author

Correspondence to Hui Chen.

Additional information

This work was partly supported by the National Science Foundation of China under grants No. 61262033 and 61262009; the Natural Science Foundation of Jiangxi Province, China under grant No. 20122BAB201032; the Science Foundation of Jiangxi Provincial Department of Education, China under grants No. GJJ13303 and GJJ12259; and the Middle Age and Young Teachers Development Program of Undergraduate Universities in Jiangxi Province, China.


About this article

Cite this article

Chen, H. Mining top-k frequent patterns over data streams sliding window. J Intell Inf Syst 42, 111–131 (2014). https://doi.org/10.1007/s10844-013-0265-4


Received: 31 October 2012
Revised: 01 July 2013
Accepted: 04 July 2013
Published: 18 July 2013
Issue Date: February 2014
DOI: https://doi.org/10.1007/s10844-013-0265-4
