Abstract
Data stream processing must be on-line and one-pass, and it must be efficient in both time and space. In this paper, we develop a framework for estimating equi-join query sizes based on the discrete cosine transform (DCT). The DCT provides concise and accurate representations of data distributions, and it can be updated easily in the presence of insertions and deletions. We have performed analyses and experiments to compare the DCT with sketch-based methods. The experimental results show that, given the same amount of space, our method yields more accurate estimates than sketch-based methods most of the time. The experimental results also confirm that the cosine series can be updated quickly to cope with the rapid flow of data.
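The idea can be illustrated with a minimal sketch. It is not the implementation evaluated in the paper: it assumes integer attribute values in a known domain [0, n), uses the orthonormal DCT-II basis, and keeps only the first m coefficients per stream. The names `CosineSynopsis`, `update`, and `estimate_join_size` are hypothetical. Under these assumptions, an insertion or deletion changes each retained coefficient by one basis value, and the equi-join size is approximated by the inner product of the two coefficient vectors.

```python
import math


class CosineSynopsis:
    """Truncated DCT-II summary of a stream's frequency distribution.

    Hypothetical helper for illustration: values are assumed to be
    integers in [0, domain_size); only the first num_coeffs
    coefficients are kept.
    """

    def __init__(self, domain_size, num_coeffs):
        self.n = domain_size
        self.m = min(num_coeffs, domain_size)
        self.coeffs = [0.0] * self.m

    def _basis(self, k, x):
        # Orthonormal DCT-II basis function k evaluated at value x.
        if k == 0:
            return math.sqrt(1.0 / self.n)
        angle = math.pi * k * (2 * x + 1) / (2 * self.n)
        return math.sqrt(2.0 / self.n) * math.cos(angle)

    def update(self, value, count=1):
        # count=+1 models an insertion, count=-1 a deletion.
        # Each update costs O(m) and needs no access to past tuples.
        for k in range(self.m):
            self.coeffs[k] += count * self._basis(k, value)


def estimate_join_size(left, right):
    # Because the basis is orthonormal, sum_x f(x) * g(x) equals the
    # inner product of the full coefficient vectors; truncating to the
    # first m terms gives the approximation.
    if left.n != right.n:
        raise ValueError("synopses must share the same value domain")
    m = min(left.m, right.m)
    return sum(left.coeffs[k] * right.coeffs[k] for k in range(m))


if __name__ == "__main__":
    r = CosineSynopsis(domain_size=1000, num_coeffs=50)
    s = CosineSynopsis(domain_size=1000, num_coeffs=50)
    for v in (10, 10, 11, 500):
        r.update(v)
    for v in (10, 11, 11, 499):
        s.update(v)
    s.update(499, count=-1)  # a deletion is the inverse of an insertion
    print(estimate_join_size(r, s))
```

When every coefficient is retained (m = n), the estimate equals the exact join size up to floating-point error; with m < n, accuracy depends on how smooth the two frequency distributions are.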
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jiang, Z., Luo, C., Hou, WC., Yan, F., Zhu, Q. (2006). Estimating Aggregate Join Queries over Data Streams Using Discrete Cosine Transform. In: Bressan, S., Küng, J., Wagner, R. (eds) Database and Expert Systems Applications. DEXA 2006. Lecture Notes in Computer Science, vol 4080. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11827405_18
DOI: https://doi.org/10.1007/11827405_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-37871-6
Online ISBN: 978-3-540-37872-3
eBook Packages: Computer Science, Computer Science (R0)