# Similarity-Based Queries for Time Series Databases

## Abstract

We consider the similarity problem in time series databases. Given a set of sequences, we are interested in finding those sequences whose behaviors are similar. Several methods have been proposed to solve this problem. Among them, [AFS93] mapped a time sequence to the frequency domain using the Discrete Fourier Transform (DFT) and kept only the first few coefficients. The similarity of two sequences is then determined by the Euclidean distance between their coefficients. [ALS+95] (hereafter the ALS approach) introduced a new model, in which two time series are said to be similar if they have enough non-overlapping, time-ordered pairs of subsequences that are similar. Unlike the ALS approach, [DGM97] used a method based on finding the longest common subsequence. Given a set of transformation functions, the idea is to find a linear function *y = ax + b* under which two subsequences *X* and *Y* are ε-similar. However, to our knowledge, no paper has yet compared these algorithms on the same benchmark. This paper discusses and compares these three popular methods and proposes a new algorithm that overcomes their drawbacks. Experiments are conducted on data from the Nasdaq-100 market.
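To make two of the similarity measures summarized above concrete, the following Python sketch is offered as an illustration only, not as the implementation of [AFS93] or [DGM97]. The function names `dft_prefix`, `feature_distance` and `lcss_under_linear` are hypothetical. The DFT part uses the orthonormal (1/√n) scaling, so by Parseval's theorem the distance over the first *k* coefficients never exceeds the true Euclidean distance. The LCSS part assumes that *a* and *b* are already fixed, whereas [DGM97] searches over a set of transformation functions. Indexing (e.g. with an R*-tree) is omitted.

```python
import cmath
import math
from typing import List, Sequence


def dft_prefix(x: Sequence[float], k: int) -> List[complex]:
    """First k coefficients of the orthonormal DFT of x (hypothetical helper).

    With 1/sqrt(n) scaling the full DFT preserves Euclidean distance
    (Parseval), so truncating to k coefficients yields a lower bound.
    """
    n = len(x)
    scale = 1.0 / math.sqrt(n)
    return [
        scale * sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
        for f in range(k)
    ]


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def feature_distance(x: Sequence[float], y: Sequence[float], k: int) -> float:
    """Euclidean distance between the first k DFT coefficients of x and y."""
    fx, fy = dft_prefix(x, k), dft_prefix(y, k)
    return math.sqrt(sum(abs(a - b) ** 2 for a, b in zip(fx, fy)))


def lcss_under_linear(x: Sequence[float], y: Sequence[float],
                      a: float, b: float, eps: float) -> int:
    """Length of the longest common subsequence of (a*x + b) and y.

    Two elements match when they differ by at most eps. The linear map
    (a, b) is assumed to be given; no search over transformations is done.
    """
    n, m = len(x), len(y)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if abs(a * x[i - 1] + b - y[j - 1]) <= eps:
                dp[i][j] = dp[i - 1][j - 1] + 1  # extend a matching pair
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])  # skip one element
    return dp[n][m]
```

For example, with `x = [1.0, 2.0, 3.0, 4.0]` and `y = [2 * v + 1 for v in x]`, `lcss_under_linear(x, y, 2.0, 1.0, 0.1)` matches all four points. With the identity map (`a = 1.0`, `b = 0.0`) it matches only one point. The coefficient-based filter is useful because a lower-bounding distance can prune candidates in a range query without false dismissals.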

## Keywords

Window Size, Discrete Fourier Transform, Range Query, Similarity Query, Longest Common Subsequence

## References

1. Rakesh Agrawal, Christos Faloutsos and Arun N. Swami. *Efficient Similarity Search in Sequence Databases*. In Proceedings of the 4th International Conference on Foundations of Data Organization and Algorithms (FODO '93), Chicago, Illinois, USA, October 13–15, 1993, pp. 69–84.
2. Rakesh Agrawal, King-Ip Lin, Harpreet S. Sawhney and Kyuseok Shim. *Fast Similarity Search in the Presence of Noise, Scaling, and Translation in Time-Series Databases*. In Proceedings of the 21st International Conference on Very Large Data Bases, Zurich, Switzerland, September 11–15, 1995, pp. 490–501.
3. Béla Bollobás, Gautam Das, Dimitrios Gunopulos and Heikki Mannila. *Time-Series Similarity Problems and Well-Separated Geometric Sets*. In Proceedings of the Thirteenth Annual Symposium on Computational Geometry, Nice, France, June 4–6, 1997, pp. 454–456.
4. Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider and Bernhard Seeger. *The R\*-tree: An Efficient and Robust Access Method for Points and Rectangles*. In Proceedings of the 1990 ACM SIGMOD International Conference on Management of Data, Atlantic City, NJ, USA, May 23–25, 1990, pp. 322–331.
5. Gautam Das, Dimitrios Gunopulos and Heikki Mannila. *Finding Similar Time Series*. In Principles of Data Mining and Knowledge Discovery, First European Symposium (PKDD '97), Trondheim, Norway, June 24–27, 1997, pp. 88–100.
6. Gautam Das, King-Ip Lin, Heikki Mannila, Gopal Renganathan and Padhraic Smyth. *Rule Discovery from Time Series*. In Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining (KDD-98), New York City, New York, USA, August 27–31, 1998, pp. 16–22.
7. R. D. Edwards and J. Magee. *Technical Analysis of Stock Trends*. John Magee, Springfield, Massachusetts, 1969.
8. Christos Faloutsos, H. V. Jagadish, Alberto O. Mendelzon and Tova Milo. *A Signature Technique for Similarity-Based Queries*. In SEQUENCES '97, Positano-Salerno, Italy, June 11–13, 1997.
9. Christos Faloutsos, M. Ranganathan and Yannis Manolopoulos. *Fast Subsequence Matching in Time-Series Databases*. In Proceedings of the 1994 ACM SIGMOD International Conference on Management of Data, Minneapolis, Minnesota, USA, May 24–27, 1994, pp. 419–429.
10. Dina Q. Goldin and Paris C. Kanellakis. *On Similarity Queries for Time-Series Data: Constraint Specification and Implementation*. In Principles and Practice of Constraint Programming, First International Conference (CP '95), Cassis, France, September 19–22, 1995, pp. 137–153.
11. Jiawei Han, Guozhu Dong and Yiwen Yin. *Efficient Mining of Partial Periodic Patterns in Time Series Database*. In Proceedings of the 15th International Conference on Data Engineering, Sydney, Australia, March 23–26, 1999, pp. 106–115.
12. Flip Korn, H. V. Jagadish and Christos Faloutsos. *Efficiently Supporting Ad Hoc Queries in Large Datasets of Time Sequences*. In Proceedings of the 1997 ACM SIGMOD International Conference on Management of Data, Tucson, Arizona, USA, May 13–15, 1997, pp. 289–300.
13. Heikki Mannila. *Methods and Problems in Data Mining*. In Database Theory (ICDT '97), 6th International Conference, Delphi, Greece, January 8–10, 1997, pp. 41–55.
14. T. Sellis, N. Roussopoulos and C. Faloutsos. *The R+-tree: A Dynamic Index for Multi-Dimensional Objects*. In Proceedings of the 13th International Conference on Very Large Data Bases, Brighton, England, 1987, pp. 507–518.
15. Davood Rafiei. *On Similarity-Based Queries for Time Series Data*. In Proceedings of the 15th International Conference on Data Engineering, Sydney, Australia, March 23–26, 1999, pp. 410–417.
16. Byoung-Kee Yi, H. V. Jagadish and Christos Faloutsos. *Efficient Retrieval of Similar Time-Sequences under Time Warping*. In Proceedings of the Fourteenth International Conference on Data Engineering, Orlando, Florida, USA, February 23–27, 1998, pp. 201–208.