Abstract
Given a set of (possibly infinite) sequences, we consider the problem of detecting events where a subset of the sequences is correlated for a short period. In other words, we want to find cases where a number of the sequences output exactly the same substring at the same time. Such substrings, together with the sequences in which they are contained, form a local correlation pattern. In practice we only want to find patterns that are longer than γ and appear in at least σ sequences.
Our main contribution is an algorithm for mining such patterns in the online setting, where the sequences are read in parallel one symbol at a time (with no random access) and each pattern must be reported as soon as it occurs.
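To make the problem statement concrete, the following is a minimal brute-force sketch of the online setting. It is not the algorithm proposed in the paper, which the abstract does not describe. The function name `naive_online_patterns` and its parameters are hypothetical, symbols are assumed to be single-character strings, and "longer than γ" is read as a length of at least γ + 1 symbols. At each time step the sketch reads one symbol per sequence, keeps only the last γ + 1 symbols of each, and reports every group of at least σ sequences whose windows are identical.

```python
from collections import defaultdict, deque


def naive_online_patterns(streams, gamma, sigma):
    """Yield (end_time, substring, sequence_ids) for aligned exact matches.

    Illustrative baseline only (an assumption, not the paper's method):
    it reports windows of length gamma + 1 rather than maximal patterns.
    """
    width = gamma + 1  # shortest length that is "longer than gamma"
    # One bounded buffer per sequence; older symbols are discarded.
    windows = [deque(maxlen=width) for _ in streams]

    # zip() is lazy, so the sequences are read in parallel, one symbol
    # at a time, without random access (they may be infinite iterators).
    for t, symbols in enumerate(zip(*streams)):
        groups = defaultdict(list)
        for seq_id, (window, symbol) in enumerate(zip(windows, symbols)):
            window.append(symbol)
            if len(window) == width:
                groups[tuple(window)].append(seq_id)

        # Report as soon as enough sequences share the same window.
        for content, members in groups.items():
            if len(members) >= sigma:
                yield t, "".join(content), members


if __name__ == "__main__":
    demo = ["abcabx", "xbcaby", "zbcabz"]
    for end_time, substring, members in naive_online_patterns(demo, 2, 2):
        print(end_time, substring, members)
```

Every pattern longer than γ contains such a window, so no qualifying event is missed, but a long pattern is reported once per time step instead of once as a whole. The sketch also rebuilds and hashes every window at every step, so each step costs on the order of (γ + 1) operations per sequence.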
We conduct experiments on both artificial and real data. The results show that the proposed algorithm scales well as the number of sequences increases. We also conduct a case study using a public EEG dataset. We show that the local correlation patterns capture essential features that can be used to automatically distinguish subjects diagnosed with a genetic predisposition to alcoholism from a control group.
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ukkonen, A. (2009). Mining Local Correlation Patterns in Sets of Sequences. In: Gama, J., Costa, V.S., Jorge, A.M., Brazdil, P.B. (eds) Discovery Science. DS 2009. Lecture Notes in Computer Science, vol 5808. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04747-3_27
DOI: https://doi.org/10.1007/978-3-642-04747-3_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04746-6
Online ISBN: 978-3-642-04747-3
eBook Packages: Computer Science, Computer Science (R0)