Using Multiple Indexes for Efficient Subsequence Matching in Time-Series Databases

Lim, Seung-Hwan; Park, Hee-Jin; Kim, Sang-Wook

doi:10.1007/11733836_7

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3882)

Included in the following conference series:

International Conference on Database Systems for Advanced Applications

Abstract

Time-series subsequence matching searches a time-series database for data subsequences whose changing patterns are similar to a given query sequence. This paper addresses a performance issue in time-series subsequence matching. First, we quantitatively examine the performance degradation caused by the window size effect and show that subsequence matching with a single index does not perform satisfactorily in real applications. We claim that index interpolation is an effective tool for resolving this problem. Index interpolation performs subsequence matching by selecting the most appropriate index from multiple indexes, each built on windows of a distinct size. Index interpolation therefore requires deciding the window sizes of the indexes to be built. In this paper, we treat the selection of optimal window sizes as a physical database design problem. Given a set of 〈length, frequency〉 pairs for the query sequences to be performed in a target application and a set of window sizes for building multiple indexes, we devise a formula that estimates the overall cost of all the subsequence matchings. Using this formula, we propose an algorithm that determines the optimal window sizes for maximizing the overall performance of subsequence matching. We formally prove the optimality as well as the effectiveness of the algorithm. Finally, we perform a series of experiments with a real-life stock data set and a large synthetic data set to show the superiority of our approach.
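
The two decisions described in the abstract, choosing an index for a query and choosing which window sizes to index, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the rule "use the largest window size that does not exceed the query length", the `placeholder_cost` function (a stand-in for the paper's cost formula, which the abstract does not state), the exhaustive search over candidate subsets (the paper proposes its own algorithm), and all function names.

```python
from itertools import combinations


def pick_window(query_len, window_sizes):
    """Assumed selection rule: largest window size that fits in the query."""
    usable = [w for w in window_sizes if w <= query_len]
    return max(usable) if usable else None


def placeholder_cost(query_len, window):
    """Hypothetical per-query cost, NOT the paper's formula.

    It only mimics the window size effect: the cost grows as the
    query becomes longer relative to the window of the chosen index.
    """
    return query_len / window


def total_cost(workload, window_sizes):
    """Frequency-weighted cost of a workload of (length, frequency) pairs."""
    total = 0.0
    for length, freq in workload:
        window = pick_window(length, window_sizes)
        if window is None:
            # No index can serve this query length at all.
            return float("inf")
        total += freq * placeholder_cost(length, window)
    return total


def best_window_sizes(workload, candidates, k):
    """Brute-force search for the k candidate window sizes with least cost."""
    return min(
        combinations(sorted(candidates), k),
        key=lambda sizes: total_cost(workload, sizes),
    )


# Illustrative workload: (query length, frequency) pairs.
workload = [(64, 10), (128, 5), (512, 1)]
candidates = [32, 64, 128, 256, 512]
chosen = best_window_sizes(workload, candidates, k=2)
```

Under this placeholder cost, `chosen` is `(64, 128)`: the two most frequent query lengths each get an index whose window matches them exactly, and the rare length-512 query falls back to the window-128 index. A different cost model could change that choice, which is why the paper derives its own formula.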

Author information

Authors and Affiliations

College of Information and Communications, Hanyang University, Korea
Seung-Hwan Lim, Hee-Jin Park & Sang-Wook Kim

Editor information

Editors and Affiliations

Department of Computer Science, National University of Singapore, Singapore
Mong Li Lee
School of Computing, National University of Singapore, Singapore
Kian-Lee Tan
School of Engineering and Technology, Asian Institute of Technology, P.O. Box 4, 12120, Klong Luang, Pathum Thani, Thailand
Vilas Wuwongse

About this paper

Cite this paper

Lim, SH., Park, HJ., Kim, SW. (2006). Using Multiple Indexes for Efficient Subsequence Matching in Time-Series Databases. In: Lee, M.L., Tan, KL., Wuwongse, V. (eds) Database Systems for Advanced Applications. DASFAA 2006. Lecture Notes in Computer Science, vol 3882. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11733836_7

DOI: https://doi.org/10.1007/11733836_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33337-1
Online ISBN: 978-3-540-33338-8
eBook Packages: Computer Science, Computer Science (R0)
