Discussion on Fast and Accurate Sketches for Skewed Data Streams: A Case Study
A sketch is a probabilistic data structure for estimating item frequencies in a multiset, and it is used extensively in data stream processing. The key metrics of sketches for data streams are accuracy, speed, and memory usage. Various sketches exist in the literature, but most of them cannot achieve high accuracy, high speed, and limited memory usage at the same time on skewed datasets. Recently, two new sketches, the Pyramid sketch and the OM sketch, have been proposed to tackle this problem. In this paper, we look closely at five different but important aspects of these two solutions and discuss the conditions and limits of their methods. Three of these aspects (memory utilization, isolation, and neutralization) relate to accuracy; the other two (memory access and hash calculation) relate to speed. We found that the newly proposed techniques, namely automatic enlargement and hierarchy for accuracy, and word acceleration and the hash bit technique for speed, play the central role in the improvement, but they also have limitations and side effects. Other properties of working sketches, such as deletion and generality, are also discussed. Our discussion is supported by extensive experimental results, and we believe it can help in the future development of better sketches.
Keywords: Sketch, Skewed data, Data structure
This work was supported by the Shenzhen Basic Research Program (JCYJ20160525154348175), the Shenzhen Municipal Development and Reform Commission (Disciplinary Development Program for Data Science and Intelligent Computing), and the Shenzhen Key Lab Project (ZDSYS20170303140513705).