Mining sequential patterns from data streams: a centroid approach

Marascu, Alice; Masseglia, Florent

doi:10.1007/s10844-006-9954-6

Published: 21 November 2006

Journal of Intelligent Information Systems, volume 27, pages 291–307 (2006)

Abstract

In recent years, emerging applications have introduced new constraints for data mining methods. These constraints are typical of a new kind of data: data streams. In data stream processing, memory usage is restricted, new elements are generated continuously and must be processed in linear time, no blocking operator can be used, and the data can be examined only once. So far, only a few methods have been proposed for mining sequential patterns in data streams. We argue that the main reason is the combinatorial nature of sequential pattern mining. In this paper, we propose an algorithm based on sequence alignment for mining approximate sequential patterns in Web usage data streams. To meet the one-scan constraint, we propose a greedy clustering algorithm combined with an alignment method. We show that our proposal is able to extract relevant sequences with very low thresholds.
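
The one-scan constraint mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a similarity measure based on the longest common subsequence, uses the first member of each cluster as its representative, and picks an arbitrary threshold of 0.5. The names `lcs_length`, `similarity`, and `greedy_cluster` are hypothetical and chosen only for this illustration.

```python
from typing import List, Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """LCS length normalised by the longer sequence (an assumed measure)."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))


def greedy_cluster(stream, threshold: float = 0.5) -> List[List[Sequence[str]]]:
    """Assign each incoming sequence to a cluster in a single pass.

    Each sequence is read once and compared only with the first member of
    every existing cluster (an assumed stand-in for a cluster representative).
    """
    clusters: List[List[Sequence[str]]] = []
    for seq in stream:
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = similarity(seq, cluster[0])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best.append(seq)
        else:
            clusters.append([seq])  # no cluster is close enough: open a new one
    return clusters


# Hypothetical navigation sequences (page names are made up).
sessions = [
    ["home", "news", "sports"],
    ["home", "news", "weather"],
    ["login", "mail"],
    ["login", "mail", "logout"],
]
clusters = greedy_cluster(sessions)
```

In this sketch each sequence is examined exactly once and is never revisited after assignment, which is the property the one-scan constraint requires. An alignment step over the members of each cluster would then be needed to summarise it as an approximate pattern; that step is not shown here.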

Author information

Authors and Affiliations

INRIA Sophia Antipolis, 2004 route des Lucioles - BP 93, 06902, Sophia Antipolis, France
Alice Marascu & Florent Masseglia

Corresponding author

Correspondence to Alice Marascu.

About this article

Cite this article

Marascu, A., Masseglia, F. Mining sequential patterns from data streams: a centroid approach. J Intell Inf Syst 27, 291–307 (2006). https://doi.org/10.1007/s10844-006-9954-6

Issue Date: November 2006