Identifying Similar Subsequences in Data Streams

Toyoda, Machiko; Sakurai, Yasushi; Ichikawa, Toshikazu

doi:10.1007/978-3-540-85654-2_23

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5181)

Included in the following conference series:

International Conference on Database and Expert Systems Applications

Abstract

Similarity search has been studied in the domain of time series data mining, and it is an important technique in stream mining. Because streams are often sampled at different rates and their time periods vary in practice, a measure that handles time warping, such as Dynamic Time Warping (DTW), is well suited to measuring similarity. However, DTW is a method for detecting sequences that are similar to a given query sequence, so finding pairs of similar subsequences between co-evolving sequences is difficult because the complexity increases.
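As a point of reference for the DTW measure mentioned above, the following is a minimal sketch of the textbook DTW distance between two complete numeric sequences. It is not taken from the paper: the function name dtw_distance and the choice of absolute difference as the per-point cost are assumptions made for illustration.

```python
import math


def dtw_distance(a, b):
    """Textbook DTW distance between two non-empty numeric sequences.

    Illustrative assumption: the per-point cost is |a_i - b_j|.
    Only two rows of the warping matrix are kept in memory.
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")

    # prev[j] holds the best cumulative cost of aligning a[:i-1] with b[:j].
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0  # aligning two empty prefixes costs nothing

    for i in range(1, n + 1):
        cur = [math.inf] * (m + 1)
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping steps.
            cur[j] = cost + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur

    return prev[m]
```

For example, dtw_distance([1, 2, 3], [1, 2, 2, 3]) is 0.0, because warping absorbs the repeated value. Note that this form compares two whole sequences, which is why applying it to every pair of candidate subsequences becomes expensive.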

In this paper, we focus on the problem of finding pairs of similar subsequences, and periodicity, over data streams. We propose a method that detects similar subsequences in a streaming fashion. To measure similarity, our approach relies on a proposed scoring function that updates a score incrementally, which makes it suitable for data stream processing. We also present an efficient algorithm based on the scoring function. Our experiments on real and synthetic data demonstrate that our method detects the qualifying pairs of subsequences correctly and that it is dramatically faster than the existing method.
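The abstract does not give the scoring function itself, so the following is only a sketch of what incremental updating of a score can look like for two co-evolving streams. It swaps in a generic local-alignment-style recurrence (in the spirit of Smith-Waterman scoring) and is not the authors' function. The class name IncrementalPairScore, the parameter eps, and the recurrence are all hypothetical.

```python
class IncrementalPairScore:
    """Hypothetical warping-tolerant local score between two growing streams.

    Assumed recurrence (NOT the paper's scoring function):
        S(i, j) = max(0, eps - |x_i - y_j|
                         + max(S(i-1, j-1), S(i-1, j), S(i, j-1)))
    Each arriving pair extends the score matrix by one row and one
    column, so only the previous last row and last column are reused.
    """

    def __init__(self, eps):
        self.eps = eps   # hypothetical reward granted to a perfect match
        self.xs = []     # history of stream X
        self.ys = []     # history of stream Y
        self.row = []    # last row:    S(n-1, j) for j = 0..n-1
        self.col = []    # last column: S(i, n-1) for i = 0..n-1
        self.best = 0.0  # highest score seen so far

    def _cell(self, x, y, diag, up, left):
        return max(0.0, self.eps - abs(x - y) + max(diag, up, left))

    def update(self, x, y):
        """Consume one new value from each stream; return the best score."""
        n = len(self.xs)

        # New column: every old x_i against the new y.
        new_col = []
        for i in range(n):
            diag = self.col[i - 1] if i > 0 else 0.0
            up = new_col[i - 1] if i > 0 else 0.0
            new_col.append(self._cell(self.xs[i], y, diag, up, self.col[i]))

        # New row: the new x against every old y_j.
        new_row = []
        for j in range(n):
            diag = self.row[j - 1] if j > 0 else 0.0
            left = new_row[j - 1] if j > 0 else 0.0
            new_row.append(self._cell(x, self.ys[j], diag, self.row[j], left))

        # Corner cell: the new x against the new y.
        diag = self.row[n - 1] if n > 0 else 0.0
        up = new_col[n - 1] if n > 0 else 0.0
        left = new_row[n - 1] if n > 0 else 0.0
        corner = self._cell(x, y, diag, up, left)

        self.row = new_row + [corner]
        self.col = new_col + [corner]
        self.xs.append(x)
        self.ys.append(y)
        self.best = max(self.best, max(self.row), max(self.col))
        return self.best
```

Each call to update does work proportional to the number of values seen so far, rather than recomputing the whole matrix. This sketch keeps the full history of both streams, so its memory use is unbounded; it illustrates the incremental idea only and makes no claim about the efficiency of the algorithm in the paper.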

Author information

Authors and Affiliations

NTT Information Sharing Platform Laboratories, NTT Corporation, 9–11, Midori-cho 3-Chome Musashino-shi, Tokyo, 180–8585, Japan
Machiko Toyoda & Toshikazu Ichikawa
NTT Communication Science Laboratories, NTT Corporation, 2–4, Hikaridai, Seika-cho, Keihanna Science City, Kyoto, 619–0237, Japan
Yasushi Sakurai

Editor information

Sourav S. Bhowmick, Josef Küng, Roland Wagner

Cite this paper

Toyoda, M., Sakurai, Y., Ichikawa, T. (2008). Identifying Similar Subsequences in Data Streams. In: Bhowmick, S.S., Küng, J., Wagner, R. (eds) Database and Expert Systems Applications. DEXA 2008. Lecture Notes in Computer Science, vol 5181. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85654-2_23

DOI: https://doi.org/10.1007/978-3-540-85654-2_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85653-5
Online ISBN: 978-3-540-85654-2
eBook Packages: Computer Science, Computer Science (R0)
