AMS sketches are randomized summaries of the data that can be used to estimate aggregates such as the second frequency moment (the self-join size) and the sizes of joins. An AMS sketch can be viewed as a random projection of the data, represented in the frequency domain, onto a pseudo-random vector of ±1 values. The key property of AMS sketches is that, for two relations, the product of the projections of their join-attribute frequency vectors onto the same random vector is an unbiased estimate of the size of the join of the relations. While a single AMS sketch is inaccurate, multiple such sketches can be computed and combined using averages and medians to obtain an estimate of any desired precision.
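The projection-and-product idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `AMSSketch` and `estimate_join_size`, the `averages` and `medians` parameters, and the degree-3 polynomial hash over the prime 2^61 − 1 (used here as an assumed stand-in for a 4-wise independent ±1 family) are all hypothetical choices made for illustration. Join-attribute values are assumed to be non-negative integers.

```python
import random
from statistics import mean, median

P = (1 << 61) - 1  # prime modulus for the illustrative hash family (an assumption)


class AMSSketch:
    """Hypothetical AMS sketch: averages * medians independent +/-1 projections."""

    def __init__(self, averages, medians, seed=0):
        rng = random.Random(seed)
        self.averages = averages
        self.medians = medians
        n = averages * medians
        # One random degree-3 polynomial per counter; its low bit is used as the
        # sign, which is approximately 4-wise independent across keys.
        self.coeffs = [tuple(rng.randrange(P) for _ in range(4)) for _ in range(n)]
        self.counters = [0] * n

    def _sign(self, i, key):
        a, b, c, d = self.coeffs[i]
        h = (((a * key + b) * key + c) * key + d) % P
        return 1 if h & 1 else -1

    def update(self, key, count=1):
        # Each counter is the dot product of the frequency vector with one
        # +/-1 vector, so an update adds (or subtracts) count to every counter.
        for i in range(len(self.counters)):
            self.counters[i] += count * self._sign(i, key)


def estimate_join_size(s1, s2):
    """Median of averages of per-counter products; pass s1 twice for self-join size."""
    # Both sketches must have been built with the same seed and shape, so that
    # they project onto the same random vectors.
    assert s1.coeffs == s2.coeffs
    products = [x * y for x, y in zip(s1.counters, s2.counters)]
    groups = [
        products[g * s1.averages:(g + 1) * s1.averages]
        for g in range(s1.medians)
    ]
    return median(mean(g) for g in groups)


# Usage: two relations whose join attribute takes the single value 5.
r = AMSSketch(averages=8, medians=5, seed=42)
s = AMSSketch(averages=8, medians=5, seed=42)
for _ in range(3):
    r.update(5)
for _ in range(4):
    s.update(5)
# With one shared key, every per-counter product equals 3 * 4 = 12.
join_estimate = estimate_join_size(r, s)
self_join_estimate = estimate_join_size(r, r)
```

In this layout, averaging within a group reduces the variance of the basic estimator, and taking the median across groups raises the confidence that the result is close to the true value. With many distinct keys the estimate is only approximate, and its accuracy depends on the number of counters chosen.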
AMS sketches were introduced in 1996 by Noga Alon, Yossi Matias, and Mario Szegedy as part of a suite of randomized algorithms for the approximate computation of frequency moments. The same authors, together with Phillip Gibbons, extended the second frequency...
- 2. Alon N., Matias Y., and Szegedy M. The space complexity of approximating the frequency moments. In: Proceedings of the 28th Annual ACM Symposium on Theory of Computing; 1996, p. 20–29.
- 3. Charikar M., Chen K., and Farach-Colton M. Finding frequent items in data streams. In: Proceedings of the 29th International Colloquium on Automata, Languages and Programming; 2002, p. 693–703.
- 4. Cormode G. and Garofalakis M. Sketching streams through the net: distributed approximate query tracking. In: Proceedings of the 31st International Conference on Very Large Data Bases; 2005, p. 13–24.
- 5. Das A., Gehrke J., and Riedewald M. Approximation techniques for spatial data. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 2004, p. 695–706.
- 6. Dobra A., Garofalakis M., Gehrke J., and Rastogi R. Processing complex aggregate queries over data streams. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 2002, p. 61–72.
- 8. Rusu F. and Dobra A. Statistical analysis of sketch estimators. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 2007, p. 187–198.