Mining sequential patterns from data streams: a centroid approach

Journal of Intelligent Information Systems

Abstract

In recent years, emerging applications have introduced new constraints for data mining methods. These constraints are typical of a new kind of data: data streams. In data stream processing, memory usage is restricted, new elements are generated continuously and must be processed in linear time, no blocking operator can be used, and the data can be examined only once. So far, only a few methods have been proposed for mining sequential patterns in data streams. We argue that the main reason is the combinatorial nature of sequential pattern mining. In this paper, we propose an algorithm based on sequence alignment for mining approximate sequential patterns in Web usage data streams. To meet the one-scan constraint, we propose a greedy clustering algorithm combined with an alignment method. We show that our proposal is able to extract relevant sequences with very low thresholds.
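As a rough illustration of the one-scan idea described in the abstract, the following Python sketch assigns each incoming sequence to the most similar existing cluster, or opens a new cluster when nothing is similar enough. It is not the authors' algorithm: the similarity measure (a normalized longest common subsequence), the `threshold` parameter, the function names, and the use of a cluster's first member as its representative (instead of a centroid built by alignment) are all assumptions made for this example.

```python
from typing import List, Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of a and b (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Assumed similarity: LCS length normalized to the range [0, 1]."""
    if not a or not b:
        return 0.0
    return 2.0 * lcs_length(a, b) / (len(a) + len(b))


def greedy_cluster(stream: List[List[str]], threshold: float = 0.5) -> List[List[List[str]]]:
    """Single pass over the stream: each sequence is read once and never revisited."""
    clusters: List[List[List[str]]] = []
    for seq in stream:
        best_index, best_score = -1, 0.0
        for i, cluster in enumerate(clusters):
            # Simplification: compare against the first member of the cluster,
            # standing in for a centroid obtained by alignment.
            score = similarity(seq, cluster[0])
            if score > best_score:
                best_index, best_score = i, score
        if best_index >= 0 and best_score >= threshold:
            clusters[best_index].append(seq)
        else:
            clusters.append([seq])  # no cluster is close enough: open a new one
    return clusters


if __name__ == "__main__":
    # Hypothetical navigation sequences (page identifiers are made up).
    stream = [["a", "b", "c"], ["a", "b", "d"], ["x", "y"], ["x", "y", "z"]]
    for cluster in greedy_cluster(stream, threshold=0.5):
        print(cluster)
```

With the made-up input above, the first two sequences end up in one cluster and the last two in another. Because each sequence is compared only with one representative per cluster, the cost per element depends on the number of clusters rather than on the number of sequences already seen, which is the property a one-scan method needs.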

Author information

Corresponding author

Correspondence to Alice Marascu.

About this article

Cite this article

Marascu, A., Masseglia, F. Mining sequential patterns from data streams: a centroid approach. J Intell Inf Syst 27, 291–307 (2006). https://doi.org/10.1007/s10844-006-9954-6

  • DOI: https://doi.org/10.1007/s10844-006-9954-6
