Abstract
Clustering time series is usually limited by the fact that runtime grows significantly with the length of the time series. On the other hand, approximate clustering applied to existing compressed representations of time series (e.g. obtained through dimensionality reduction) usually suffers from low accuracy. We propose a method for the compression of time series based on mathematical models that explore dependencies between different time series. In particular, each time series is represented by a combination of a set of specific reference time series. The cost of this representation depends only on the number of reference time series rather than on the length of the time series. We show that using only a small number of reference time series yields a rather accurate representation while significantly reducing the storage cost and the runtime of clustering algorithms. Our experiments illustrate that these representations can be used to produce an approximate clustering with high accuracy and considerably reduced runtime.
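The idea of describing a time series by a combination of reference time series can be illustrated with a minimal sketch. The abstract does not specify the model, so the sketch assumes a plain linear combination fitted by ordinary least squares; the function names `fit_coefficients` and `reconstruct`, the synthetic reference series, and all parameter values are hypothetical and chosen only for illustration, not taken from the paper.

```python
import numpy as np


def fit_coefficients(series, references):
    """Fit least-squares weights so that weights @ references approximates series.

    Assumption: a linear combination model; the paper's actual model may differ.
    series:     array of shape (length,)
    references: array of shape (n_refs, length), one reference series per row
    """
    coeffs, *_ = np.linalg.lstsq(references.T, series, rcond=None)
    return coeffs


def reconstruct(coeffs, references):
    """Rebuild the approximate series from its compact description."""
    return coeffs @ references


# Synthetic data (hypothetical): 3 reference series of length 1000.
rng = np.random.default_rng(0)
length = 1000
t = np.linspace(0.0, 1.0, length)
references = np.vstack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t), t])

# A series that is a noisy combination of the references.
true_coeffs = np.array([0.5, -1.2, 2.0])
series = true_coeffs @ references + 0.01 * rng.standard_normal(length)

# Compact description: 3 numbers instead of 1000 samples.
coeffs = fit_coefficients(series, references)
approx = reconstruct(coeffs, references)
rel_error = np.linalg.norm(series - approx) / np.linalg.norm(series)
print(coeffs.shape, rel_error)
```

Under these assumptions, a clustering algorithm would compare the short coefficient vectors instead of the full-length series, so the cost of each comparison depends on the number of reference series rather than on the series length.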
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kriegel, HP., Kröger, P., Pryakhin, A., Renz, M., Zherdin, A. (2008). Approximate Clustering of Time Series Using Compact Model-Based Descriptions. In: Haritsa, J.R., Kotagiri, R., Pudi, V. (eds) Database Systems for Advanced Applications. DASFAA 2008. Lecture Notes in Computer Science, vol 4947. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78568-2_27
DOI: https://doi.org/10.1007/978-3-540-78568-2_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78567-5
Online ISBN: 978-3-540-78568-2
eBook Packages: Computer Science, Computer Science (R0)