Sublinear Methods for Detecting Periodic Trends in Data Streams

  • Funda Ergun
  • S. Muthukrishnan
  • S. Cenk Sahinalp
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2976)

Abstract

We present sublinear algorithms (algorithms that use significantly fewer resources than needed to store or process the entire input stream) for discovering representative trends in data streams in the form of periodicities. Our algorithms involve sampling \(\tilde{O}(\sqrt{n})\) positions, and thus they scan not the entire data stream but merely a sublinear sample thereof. Alternatively, our algorithms may be thought of as working on streaming inputs where each data item is seen once, but we store only a sublinear sample, of size \(\tilde{O}(\sqrt{n})\), from which we can identify periodicities. In this work we present a variety of definitions of periodicities of a given stream, present sublinear sampling algorithms for discovering them, and prove that the algorithms meet our specifications and guarantees. No previously known results can provide such guarantees for finding any such periodic trends. We also investigate the relationships between these different definitions of periodicity.
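
To make the idea of testing periodicity from a sublinear sample concrete, the following is a minimal illustrative sketch and not the algorithm from the paper. It assumes the stream is available as an indexable sequence, takes a single candidate period as input, and estimates how often an item differs from the item one period later by probing roughly \(\sqrt{n} \log n\) random positions. The function names `estimate_mismatch` and `sample_budget`, and the exact sample budget, are assumptions made for illustration.

```python
import math
import random


def sample_budget(n):
    """Hypothetical budget: about sqrt(n) * log2(n) probes, sublinear in n."""
    return max(1, math.isqrt(n) * max(1, n.bit_length()))


def estimate_mismatch(stream, period, rng=None):
    """Estimate the fraction of positions i with stream[i] != stream[i + period].

    Only a random sample of positions is inspected, so the result is an
    estimate of how far the stream is from having the given period.
    """
    rng = rng or random.Random(0)
    n = len(stream)
    limit = n - period  # valid start positions are 0 .. limit - 1
    if period <= 0 or limit <= 0:
        raise ValueError("period must be positive and smaller than the stream length")
    probes = sample_budget(n)
    mismatches = 0
    for _ in range(probes):
        i = rng.randrange(limit)
        if stream[i] != stream[i + period]:
            mismatches += 1
    return mismatches / probes


if __name__ == "__main__":
    data = [0, 1, 2, 3] * 2500  # n = 10000, exactly periodic with period 4
    print(sample_budget(len(data)))    # number of probes used
    print(estimate_mismatch(data, 4))  # true period: no sampled mismatch
    print(estimate_mismatch(data, 3))  # wrong period: every sampled pair differs
```

An estimate near zero suggests the candidate period fits most of the stream, while a large estimate rules it out. The sketch checks one candidate period at a time and carries none of the guarantees proved in the paper.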

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Funda Ergun (1)
  • S. Muthukrishnan (2)
  • S. Cenk Sahinalp (1)

  1. Department of EECS, Case Western Reserve University
  2. Department of Computer Science, Rutgers University
