Abstract
Frequent pattern mining in data streams is an important research topic in the data mining community. Previous studies assumed that a minimum support threshold is available for mining frequent patterns. However, setting an appropriate threshold is typically difficult, so it is more reasonable to ask users to set a bound on the result size. The present study considers mining top-k frequent patterns from data streams using a sliding window technique. A single-pass algorithm, called MSWTP, is developed to generate top-k frequent patterns without a minimum support threshold. The content of the transactions in the sliding window is incrementally maintained in a summary data structure, named SWTP-tree, with a single scan of the stream. To make mining efficient, insignificant patterns are distinguished from others by applying the Chernoff bound. Two kinds of obsolete patterns and one kind of insignificant pattern are periodically pruned from the pattern tree. Whenever required, the k most frequent patterns can be retrieved from the SWTP-tree in descending order of frequency. The performance of the proposed technique is evaluated via simulation experiments. The results show that the proposed method is both efficient and scalable, and that it outperforms comparable algorithms.
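To make the query concrete, the following is a minimal sketch of what "top-k frequent patterns over a sliding window" returns. It is a brute-force baseline under stated assumptions, not MSWTP: it keeps exact counts of every sub-itemset (up to a hypothetical length cap `max_len`) in a plain dictionary instead of the SWTP-tree, and it uses a count-based window of the last `window_size` transactions. The class and parameter names are illustrative only.

```python
from collections import Counter, deque
from itertools import combinations


class SlidingWindowTopK:
    """Exact top-k frequent itemsets over a count-based sliding window.

    Hypothetical brute-force baseline: every sub-itemset (up to max_len
    items) of each transaction is counted, so memory grows combinatorially.
    This is NOT the SWTP-tree; it only illustrates the query being answered.
    """

    def __init__(self, window_size, max_len=3):
        self.window_size = window_size
        self.max_len = max_len
        self.window = deque()    # transactions currently inside the window
        self.counts = Counter()  # itemset (sorted tuple) -> frequency in window

    def _subsets(self, transaction):
        # Enumerate all non-empty itemsets of size <= max_len, each once.
        items = sorted(set(transaction))
        for r in range(1, min(self.max_len, len(items)) + 1):
            yield from combinations(items, r)

    def add(self, transaction):
        # Single pass: each arriving transaction is processed exactly once.
        self.window.append(transaction)
        self.counts.update(self._subsets(transaction))
        if len(self.window) > self.window_size:
            expired = self.window.popleft()
            self.counts.subtract(self._subsets(expired))
            # Drop itemsets whose count fell to zero (no longer in the window).
            for pattern in self._subsets(expired):
                if self.counts[pattern] <= 0:
                    del self.counts[pattern]

    def top_k(self, k):
        # Descending frequency; ties broken by itemset for determinism.
        ranked = sorted(self.counts.items(), key=lambda pc: (-pc[1], pc[0]))
        return ranked[:k]


sw = SlidingWindowTopK(window_size=3, max_len=2)
for t in [["a", "b"], ["a", "c"], ["a", "b", "c"], ["b", "c"]]:
    sw.add(t)
print(sw.top_k(3))  # [(('c',), 3), (('a',), 2), (('a', 'c'), 2)]
```

The first transaction has already slid out of the window, so `('a', 'b')` now has count 1. A summary structure such as the SWTP-tree exists precisely to avoid this baseline's exhaustive subset enumeration.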
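The abstract does not state which form of the Chernoff bound MSWTP applies. As an assumption for illustration only, the sketch below uses the additive Chernoff-Hoeffding bound: for n independent Bernoulli trials, P(|p̂ − p| ≥ ε) ≤ 2·exp(−2nε²), so ε = sqrt(ln(2/δ) / (2n)) holds with confidence 1 − δ. The pruning rule `is_insignificant` is hypothetical: it flags a pattern whose observed relative support, even raised by ε, stays below the current k-th largest support.

```python
import math


def hoeffding_epsilon(n, delta):
    """Half-width eps with P(|observed - true support| >= eps) <= delta.

    Additive Chernoff-Hoeffding bound for n independent Bernoulli trials:
    P(|p_hat - p| >= eps) <= 2 * exp(-2 * n * eps**2).
    """
    if n <= 0 or not 0 < delta < 1:
        raise ValueError("need n > 0 and 0 < delta < 1")
    return math.sqrt(math.log(2 / delta) / (2 * n))


def is_insignificant(support, kth_support, n, delta):
    """Hypothetical pruning rule (not taken from MSWTP): even allowing eps of
    estimation error upward, the pattern's relative support stays below the
    current k-th largest relative support."""
    return support + hoeffding_epsilon(n, delta) < kth_support


eps = hoeffding_epsilon(10_000, 0.01)
print(round(eps, 4))                                 # 0.0163
print(is_insignificant(0.02, 0.05, 10_000, 0.01))    # True
print(is_insignificant(0.04, 0.05, 10_000, 0.01))    # False
```

A larger window (larger n) shrinks ε, so fewer patterns sit in the uncertain band and more can be pruned safely.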
Additional information
This work was partly supported by the National Science Foundation of China under grant No. 61262033, 61262009; the Natural Science Foundation of Jiangxi Province, China under grant No. 20122BAB201032; the Science Foundation of Jiangxi Provincial Department of Education, China under grant No. GJJ13303, GJJ12259; and the Middle Age and Young Teachers Development Program of Undergraduate Universities in Jiangxi Province, China.
About this article
Cite this article
Chen, H. Mining top-k frequent patterns over data streams sliding window. J Intell Inf Syst 42, 111–131 (2014). https://doi.org/10.1007/s10844-013-0265-4