Fast Approximate Wavelet Tracking on Streams

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3896)

Abstract

Recent years have seen growing interest in effective algorithms for summarizing and querying massive, high-speed data streams. Randomized sketch synopses provide accurate approximations for general-purpose summaries of the streaming data distribution (e.g., wavelets). Existing work has typically focused on minimizing the space requirements of the maintained synopsis. However, to effectively support high-speed data-stream analysis, a crucial practical requirement is to also optimize: (1) the update time for incorporating a streaming data element in the sketch, and (2) the query time for producing an approximate summary (e.g., the top wavelet coefficients) from the sketch. Such time costs must be small enough to cope with rapid stream-arrival rates and the real-time querying requirements of typical streaming applications (e.g., ISP network monitoring). With cheap and plentiful memory, space is often only a secondary concern after query/update time costs.
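To make "top wavelet coefficients" and the per-element update cost concrete, the following is a minimal sketch, assuming the orthonormal Haar basis over a domain whose size is a power of two (the abstract does not name a basis). The function names `haar_transform`, `point_update`, and `top_coefficients` are hypothetical and chosen for illustration only. It shows that one arriving stream element changes only log2(N) + 1 coefficients of the exact transform, which is the kind of update a sketch-based synopsis has to absorb.

```python
import math

def haar_transform(a):
    """Orthonormal Haar transform of a list whose length is a power of two."""
    coeffs = [float(x) for x in a]
    n = len(coeffs)
    out = [0.0] * n
    while n > 1:
        half = n // 2
        for i in range(half):
            x, y = coeffs[2 * i], coeffs[2 * i + 1]
            out[i] = (x + y) / math.sqrt(2)         # scaled average
            out[half + i] = (x - y) / math.sqrt(2)  # scaled detail
        coeffs[:n] = out[:n]
        n = half
    return coeffs

def point_update(coeffs, i, u):
    """Apply the stream update a[i] += u directly in the coefficient domain."""
    n = len(coeffs)
    log_n = n.bit_length() - 1
    coeffs[0] += u / math.sqrt(n)  # overall scaled average
    for j in range(log_n):         # one detail coefficient per resolution level
        support = n >> j           # number of items the coefficient covers
        k = (1 << j) + (i >> (log_n - j))
        # Positive if item i lies in the left half of the support, else negative.
        sign = -1.0 if (i >> (log_n - j - 1)) & 1 else 1.0
        coeffs[k] += sign * u / math.sqrt(support)

def top_coefficients(coeffs, b):
    """Indices of the b coefficients with the largest absolute value."""
    return sorted(range(len(coeffs)), key=lambda k: -abs(coeffs[k]))[:b]
```

Maintaining all N coefficients exactly, as done here, is what a streaming synopsis avoids; the example only illustrates what is being approximated.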

In this paper, we propose the first fast solution to the problem of tracking wavelet representations of one-dimensional and multi-dimensional data streams, based on a novel stream synopsis, the Group-Count Sketch (GCS). By imposing a hierarchical structure of groups over the data and applying the GCS, our algorithms can quickly recover the most important wavelet coefficients with guaranteed accuracy. A tradeoff between query time and update time is established by varying the hierarchical structure of groups, allowing the right balance to be found for a specific data stream. Experimental analysis confirms this tradeoff, and shows that all our methods significantly outperform previously known methods in terms of both update time and query time, while maintaining a high level of accuracy.
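The abstract gives no construction details, so the following is only an illustrative sketch of the general idea: a randomized table that estimates the energy (sum of squares) of a group of items, combined with a hierarchy of groups that is searched top-down for high-energy items. Everything here is an assumption made for illustration, including the class names `GroupSketch` and `HierarchicalTracker`, the polynomial hash functions, the default parameters, and the thresholded search. It should not be read as the paper's GCS algorithm or its guarantees.

```python
import random
from statistics import median

P = (1 << 61) - 1  # Mersenne prime for modular hashing (an assumption)

class GroupSketch:
    """Toy randomized sketch estimating the energy (sum of squares) of a group."""

    def __init__(self, reps=5, buckets=64, subbuckets=16, seed=0):
        rng = random.Random(seed)
        self.buckets, self.subbuckets = buckets, subbuckets
        # One (group hash, item hash, sign hash) triple per repetition.
        self.params = [
            ([rng.randrange(1, P), rng.randrange(P)],
             [rng.randrange(1, P), rng.randrange(P)],
             [rng.randrange(P) for _ in range(4)])
            for _ in range(reps)
        ]
        self.table = [[[0.0] * subbuckets for _ in range(buckets)]
                      for _ in range(reps)]

    @staticmethod
    def _poly(coeffs, x):
        acc = 0
        for c in coeffs:  # Horner evaluation modulo P
            acc = (acc * x + c) % P
        return acc

    def update(self, group, item, u):
        for (h, f, xi), rows in zip(self.params, self.table):
            sign = 1.0 if self._poly(xi, item) & 1 else -1.0
            row = rows[self._poly(h, group) % self.buckets]
            row[self._poly(f, item) % self.subbuckets] += sign * u

    def estimate(self, group):
        # Median over repetitions of the squared mass in the group's bucket.
        return median(
            sum(v * v for v in rows[self._poly(h, group) % self.buckets])
            for (h, _, _), rows in zip(self.params, self.table)
        )

class HierarchicalTracker:
    """One sketch per hierarchy level; groups are prefixes of the item id."""

    def __init__(self, log_n, bits_per_level, seed=0, **sketch_args):
        self.log_n = log_n
        # Number of low-order bits dropped at each level to obtain the group id.
        self.shifts = list(range(log_n - bits_per_level, -1, -bits_per_level))
        if not self.shifts or self.shifts[-1] != 0:
            self.shifts.append(0)
        self.sketches = [GroupSketch(seed=seed + lvl, **sketch_args)
                         for lvl in range(len(self.shifts))]

    def update(self, item, u):
        for shift, sk in zip(self.shifts, self.sketches):
            sk.update(item >> shift, item, u)

    def heavy_items(self, threshold):
        candidates, prev_shift = [0], self.log_n  # start from the root group
        for shift, sk in zip(self.shifts, self.sketches):
            fanout = 1 << (prev_shift - shift)
            children = [g * fanout + j for g in candidates for j in range(fanout)]
            candidates = [g for g in children if sk.estimate(g) >= threshold]
            prev_shift = shift
        return candidates
```

In this toy version the tradeoff mentioned above is visible in `bits_per_level`: a larger value means fewer levels, so each update touches fewer sketches, but every search step has to probe more child groups.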



Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Cormode, G., Garofalakis, M., Sacharidis, D. (2006). Fast Approximate Wavelet Tracking on Streams. In: Ioannidis, Y., et al. Advances in Database Technology - EDBT 2006. EDBT 2006. Lecture Notes in Computer Science, vol 3896. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11687238_4

  • DOI: https://doi.org/10.1007/11687238_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-32960-2

  • Online ISBN: 978-3-540-32961-9

  • eBook Packages: Computer Science, Computer Science (R0)
