Estimating Aggregate Join Queries over Data Streams Using Discrete Cosine Transform

  • Conference paper
Database and Expert Systems Applications (DEXA 2006)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4080)

Abstract

Data stream processing must be online and one-pass, and it must be efficient in both time and space. In this paper, we develop a framework for estimating equi-join query sizes based on the discrete cosine transform (DCT). The DCT provides concise and accurate representations of data distributions, and it can be updated easily in the presence of insertions and deletions. We have performed analyses and experiments comparing the DCT with sketch-based methods. The experimental results show that, given the same amount of space, our method yields more accurate estimates than sketch-based methods most of the time. The experiments also confirm that the cosine series can be updated quickly enough to cope with a rapid flow of data.
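The core idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes an integer join-attribute domain [0, n), an orthonormal DCT-II basis, and a synopsis that simply keeps the first k coefficients. The names `DCTSynopsis` and `estimate_join_size` are hypothetical. Each arriving or expiring tuple adjusts every kept coefficient by ±φ_u(x), and because the basis is orthonormal, the equi-join size Σ_x f_R(x)·f_S(x) equals Σ_u F_R(u)·F_S(u) (Parseval), so the truncated inner product of the two coefficient vectors serves as the estimate.

```python
import math


class DCTSynopsis:
    """Hypothetical synopsis: the first k orthonormal DCT-II coefficients
    of a frequency distribution over the integer domain [0, n)."""

    def __init__(self, n: int, k: int) -> None:
        if not 0 < k <= n:
            raise ValueError("need 0 < k <= n")
        self.n = n
        self.k = k
        self.coeffs = [0.0] * k

    def _basis(self, u: int, x: int) -> float:
        # Orthonormal DCT-II basis function phi_u evaluated at domain value x.
        scale = math.sqrt(1.0 / self.n) if u == 0 else math.sqrt(2.0 / self.n)
        return scale * math.cos(math.pi * (2 * x + 1) * u / (2 * self.n))

    def update(self, x: int, delta: int = 1) -> None:
        # One stream tuple: delta=+1 for an insertion, -1 for a deletion.
        # Each kept coefficient changes by delta * phi_u(x): O(k) work per tuple.
        if not 0 <= x < self.n:
            raise ValueError("value outside domain")
        for u in range(self.k):
            self.coeffs[u] += delta * self._basis(u, x)

    def estimate_frequency(self, x: int) -> float:
        # Inverse transform using only the kept coefficients.
        return sum(c * self._basis(u, x) for u, c in enumerate(self.coeffs))


def estimate_join_size(r: DCTSynopsis, s: DCTSynopsis) -> float:
    # Equi-join size = sum_x f_R(x) * f_S(x). With an orthonormal basis this
    # equals sum_u F_R(u) * F_S(u); truncating to k terms gives the estimate.
    if (r.n, r.k) != (s.n, s.k):
        raise ValueError("synopses must share domain size and k")
    return sum(a * b for a, b in zip(r.coeffs, s.coeffs))


if __name__ == "__main__":
    r, s = DCTSynopsis(8, 8), DCTSynopsis(8, 8)
    for v in [1, 1, 2, 5]:
        r.update(v)
    for v in [1, 2, 2, 7]:
        s.update(v)
    # f_R = {1: 2, 2: 1, 5: 1}, f_S = {1: 1, 2: 2, 7: 1}: true size 2*1 + 1*2 = 4.
    print(round(estimate_join_size(r, s), 6))  # 4.0, exact because k == n
    r.update(1, delta=-1)  # delete one R tuple with value 1
    print(round(estimate_join_size(r, s), 6))  # 3.0
```

With k equal to n the estimate is exact up to floating-point rounding. With k < n it becomes an approximation whose quality depends on how much of each distribution's energy the low-order coefficients capture. The synopsis uses O(k) space and O(k) time per update, which is what makes this style of summary a candidate for one-pass stream processing.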

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jiang, Z., Luo, C., Hou, WC., Yan, F., Zhu, Q. (2006). Estimating Aggregate Join Queries over Data Streams Using Discrete Cosine Transform. In: Bressan, S., Küng, J., Wagner, R. (eds) Database and Expert Systems Applications. DEXA 2006. Lecture Notes in Computer Science, vol 4080. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11827405_18

  • DOI: https://doi.org/10.1007/11827405_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-37871-6

  • Online ISBN: 978-3-540-37872-3

  • eBook Packages: Computer Science, Computer Science (R0)
