Identifying Similar Subsequences in Data Streams

  • Conference paper
Database and Expert Systems Applications (DEXA 2008)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5181)

Abstract

Similarity search has been studied extensively in time series data mining, and it is an important technique in stream mining. Because streams are often sampled at different rates and their time periods vary in practice, a measure that handles time warping, such as Dynamic Time Warping (DTW), is well suited to measuring similarity. However, DTW is designed to detect sequences that are similar to a given query sequence, so finding pairs of similar subsequences between co-evolving sequences is difficult because of the increased complexity.
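To make the role of time warping concrete, the following is a minimal sketch of the classic whole-sequence DTW distance, not code from the paper. The function name `dtw_distance` and the choice of absolute difference as the local cost are assumptions made for illustration.

```python
def dtw_distance(x, y):
    """Classic O(len(x) * len(y)) DTW distance between two numeric sequences.

    Assumption: the local cost is the absolute difference of two values.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    # d[i][j] = cost of the best warping path aligning x[:i] with y[:j]
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # Extend the cheapest of the three admissible predecessor cells
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


# The same shape sampled at two different rates
slow = [1, 1, 2, 2, 3, 3]
fast = [1, 2, 3]
print(dtw_distance(slow, fast))
```

Because warping lets one element of `fast` align with two elements of `slow`, the two sequences are at distance zero even though their lengths differ. Note that this form compares two complete sequences, which is why it does not directly answer the pair-of-subsequences question raised above.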

In this paper, we focus on the problem of finding pairs of similar subsequences, and periodicity, over data streams. We propose a method that detects similar subsequences in a streaming fashion. Our similarity measure relies on a proposed scoring function that updates a score incrementally, which makes it suitable for data stream processing. We also present an efficient algorithm based on the scoring function. Our experiments on real and synthetic data demonstrate that our method detects the qualifying pairs of subsequences correctly and that it is dramatically faster than the existing method.
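The abstract does not define the proposed scoring function, so the sketch below is not the paper's method. It only illustrates what "incrementally updating a score" can look like in the simpler fixed-query setting: a subsequence DTW column is refreshed once per arriving stream value, using memory proportional to the query length. The class name `StreamingSubsequenceDTW`, the absolute-difference cost, and the sample data are all assumptions.

```python
class StreamingSubsequenceDTW:
    """Keeps one DTW column for a fixed query; updated per stream value.

    Illustrative only: a fixed-query simplification, not the pairwise
    scoring function proposed in the paper.
    """

    def __init__(self, query):
        self.query = list(query)
        self.prev = None  # column computed for the previous stream value

    def update(self, value):
        inf = float("inf")
        m = len(self.query)
        prev = self.prev if self.prev is not None else [inf] * m
        cur = [0.0] * m
        for i, q in enumerate(self.query):
            cost = abs(value - q)
            if i == 0:
                # A match may start at any stream position, so the first
                # query element carries no accumulated cost from the past.
                cur[i] = cost
            else:
                cur[i] = cost + min(prev[i], cur[i - 1], prev[i - 1])
        self.prev = cur
        # Cost of the best subsequence that ends at the current value
        return cur[-1]


monitor = StreamingSubsequenceDTW([1, 2, 3])
stream = [5, 5, 1, 2, 2, 3, 5]
scores = [monitor.update(v) for v in stream]
print(scores)
```

Each call to `update` does work proportional to the query length and never revisits earlier stream values. In this run the score drops to zero at the sixth value, where the warped pattern 1, 2, 2, 3 ends.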

Editor information

Sourav S. Bhowmick Josef Küng Roland Wagner

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Toyoda, M., Sakurai, Y., Ichikawa, T. (2008). Identifying Similar Subsequences in Data Streams. In: Bhowmick, S.S., Küng, J., Wagner, R. (eds) Database and Expert Systems Applications. DEXA 2008. Lecture Notes in Computer Science, vol 5181. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85654-2_23

  • DOI: https://doi.org/10.1007/978-3-540-85654-2_23

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-85653-5

  • Online ISBN: 978-3-540-85654-2

  • eBook Packages: Computer Science, Computer Science (R0)
