Abstract
The problem of efficiently finding patterns in massive time series databases has attracted great interest and, at least for the Euclidean distance measure, may now be regarded as solved. In recent years, however, there has been increasing awareness that Euclidean distance is inappropriate for many real-world applications. Its limitations stem from the fact that it is very sensitive to distortions in the time axis. A partial solution to this problem, Dynamic Time Warping (DTW), aligns the time axis before calculating the Euclidean distance. However, DTW can only address the problem of local scaling. As we show in this work, uniform scaling may be just as important in many domains, including applications as diverse as bioinformatics, space telemetry monitoring, and motion editing for computer animation. We introduce a novel technique to speed up similarity search under uniform scaling. The technique is simple and intuitive, and can achieve a speedup of 2 to 3 orders of magnitude under realistic settings.
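To make the notion of uniform scaling concrete, the following is a minimal sketch of a brute-force uniform scaling distance. It is not the speedup technique of the paper, which the abstract does not describe. The function names, the nearest-index resampling, and the default scaling range of 0.5 to 2.0 are all assumptions made for illustration.

```python
import math


def rescale(series, new_len):
    """Resample `series` to `new_len` points by nearest-index lookup (an assumed scheme)."""
    n = len(series)
    return [series[j * n // new_len] for j in range(new_len)]


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def uniform_scaling_distance(query, candidate, min_scale=0.5, max_scale=2.0):
    """Brute force: try every candidate prefix length in the allowed scaling
    range, shrink or stretch that prefix to the query length, and keep the
    smallest Euclidean distance. Returns (distance, scaling factor)."""
    m = len(query)
    shortest = max(1, math.ceil(min_scale * m))
    longest = min(len(candidate), math.floor(max_scale * m))
    best_dist, best_scale = math.inf, None
    for p in range(shortest, longest + 1):
        dist = euclidean(query, rescale(candidate[:p], m))
        if dist < best_dist:
            best_dist, best_scale = dist, p / m
    return best_dist, best_scale


# Hypothetical data: the candidate is the query stretched to twice its length.
query = [0, 1, 2, 3]
candidate = [0, 0, 1, 1, 2, 2, 3, 3]

plain = euclidean(query, candidate[:len(query)])  # no rescaling
best_dist, best_scale = uniform_scaling_distance(query, candidate)
print(plain, best_dist, best_scale)
```

In this toy case the plain Euclidean distance on the unscaled prefix is large even though the two sequences have the same shape, while the scaled comparison finds an exact match at a scaling factor of 2. The brute-force loop performs one full distance computation per scaling length tried, which is the kind of cost a speedup technique for uniform scaling would aim to avoid.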
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Keogh, E. (2003). Efficiently Finding Arbitrarily Scaled Patterns in Massive Time Series Databases. In: Lavrač, N., Gamberger, D., Todorovski, L., Blockeel, H. (eds) Knowledge Discovery in Databases: PKDD 2003. Lecture Notes in Computer Science, vol 2838. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39804-2_24
DOI: https://doi.org/10.1007/978-3-540-39804-2_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20085-7
Online ISBN: 978-3-540-39804-2
eBook Packages: Springer Book Archive