Efficient Similarity Search for Time Series Data Based on the Minimum Distance

  • Sangjun Lee
  • Dongseop Kwon
  • Sukho Lee
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2348)

Abstract

We address the problem of efficient similarity search based on the minimum distance in large time series databases. Most previous work focuses on similarity matching and retrieval of time series based on the Euclidean distance. However, as we demonstrate in this paper, the Euclidean distance has limitations as a similarity measure. It is sensitive to the absolute offsets of time sequences, so two time sequences that have similar shapes but different vertical positions may be classified as dissimilar. The minimum distance is a more suitable similarity measure than the Euclidean distance in many applications where the shape of the time series is the major consideration. To support minimum distance queries, most previous work adds a preprocessing step of vertical shifting that normalizes each time sequence by its mean before indexing. In this paper, we propose a novel and fast indexing scheme called segmented mean variation indexing (SMV-indexing). Our indexing scheme can match time series of similar shapes without vertical shifting and guarantees no false dismissals. Several experiments are performed on real data (stock price movements) to measure the performance of our indexing scheme. The experiments show that SMV-indexing is more efficient than sequential scanning.
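The contrast between the two measures can be illustrated with a small sketch. It assumes that the minimum distance is the Euclidean distance minimized over a constant vertical offset applied to one sequence; under that assumption the best offset is the mean of the pointwise differences. The function names and sample data are hypothetical, and the sketch does not show SMV-indexing itself.

```python
import math


def euclidean_distance(x, y):
    """Plain Euclidean distance between two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def minimum_distance(x, y):
    """Euclidean distance minimized over a constant vertical shift.

    Assumption for this sketch: the shift v that minimizes
    sum((a - b - v)^2) is the mean of the pointwise differences,
    so the result equals the distance between the mean-centered sequences.
    """
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    diffs = [a - b for a, b in zip(x, y)]
    shift = sum(diffs) / len(diffs)
    return math.sqrt(sum((d - shift) ** 2 for d in diffs))


# Two made-up sequences with the same shape but different vertical positions.
low = [1.0, 2.0, 3.0, 2.0]
high = [11.0, 12.0, 13.0, 12.0]

print(euclidean_distance(low, high))  # large: dominated by the offset
print(minimum_distance(low, high))    # zero: the shapes are identical
```

In this example the Euclidean distance reflects only the offset of 10 between the sequences, while the offset-minimized distance treats them as identical in shape.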

Keywords

Minimum Distance; Singular Value Decomposition; Discrete Wavelet Transform; Time Series Data; Discrete Fourier Transform
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Sangjun Lee (1)
  • Dongseop Kwon (1)
  • Sukho Lee (1)
  1. School of Electrical Engineering and Computer Science, Seoul National University, Seoul, Korea
