Synonyms
CM Sketch
Definition
The Count-Min (CM) Sketch is a compact summary data structure that represents a high-dimensional vector and answers queries on that vector, in particular point queries and dot product queries, with strong accuracy guarantees. Such queries are at the core of many computations, so the structure can be used to answer a variety of other queries, such as frequent items (heavy hitters), quantile finding, and join size estimation. Because the data structure can easily process updates in the form of additions to or subtractions from dimensions of the vector (which may correspond to insertions, deletions, or other transactions), it can operate over streams of updates at high rates.
The data structure maintains the linear projection of the vector with a number of other random vectors. These vectors are defined implicitly by simple hash functions. Increasing the range of the hash functions increases the accuracy of the summary, and...
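As a rough illustration of the description above, the following Python sketch keeps one row of counters per hash function, so that each row is a linear projection of the underlying vector. It is a minimal sketch under stated assumptions, not the entry author's implementation: the class name `CountMinSketch`, the method names, the restriction to integer keys, and the hash family `((a*x + b) mod p) mod width` are all choices made here for illustration. The `from_error` sizing (width ⌈e/ε⌉, depth ⌈ln(1/δ)⌉) follows the standard analysis for vectors with non-negative entries, and taking the minimum across rows is only meaningful in that non-negative case.

```python
import math
import random


class CountMinSketch:
    """Illustrative Count-Min sketch over integer keys (hypothetical names)."""

    _PRIME = (1 << 61) - 1  # Mersenne prime; keys are assumed smaller than this

    def __init__(self, width, depth, seed=0):
        self.width = width
        self.depth = depth
        rng = random.Random(seed)
        # One (a, b) pair per row implicitly defines that row's hash function:
        # h(x) = ((a*x + b) mod p) mod width
        self._params = [
            (rng.randrange(1, self._PRIME), rng.randrange(self._PRIME))
            for _ in range(depth)
        ]
        self.table = [[0] * width for _ in range(depth)]

    @classmethod
    def from_error(cls, epsilon, delta, seed=0):
        # Assumed sizing from the standard analysis: a wider hash range gives
        # lower error, and more rows give a lower failure probability.
        width = math.ceil(math.e / epsilon)
        depth = math.ceil(math.log(1 / delta))
        return cls(width, depth, seed)

    def _bucket(self, row, key):
        a, b = self._params[row]
        return ((a * key + b) % self._PRIME) % self.width

    def update(self, key, delta=1):
        # Add delta (possibly negative) to one counter in every row.
        for row in range(self.depth):
            self.table[row][self._bucket(row, key)] += delta

    def point_query(self, key):
        # With non-negative entries, every row overestimates, so take the min.
        return min(
            self.table[row][self._bucket(row, key)] for row in range(self.depth)
        )

    def inner_product(self, other):
        # Both sketches must share dimensions and hash functions.
        if (self.width, self._params) != (other.width, other._params):
            raise ValueError("sketches must share width, depth, and seed")
        return min(
            sum(x * y for x, y in zip(mine, theirs))
            for mine, theirs in zip(self.table, other.table)
        )
```

For example, after `cms = CountMinSketch.from_error(0.01, 0.01)` and a series of `cms.update(key, count)` calls with non-negative totals, `cms.point_query(key)` never falls below the true total for `key`; calling `update` with a negative `delta` models a deletion.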
Copyright information
© 2018 Springer Science+Business Media, LLC, part of Springer Nature
About this entry
Cite this entry
Cormode, G. (2018). Count-Min Sketch. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_87
DOI: https://doi.org/10.1007/978-1-4614-8265-9_87
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8266-6
Online ISBN: 978-1-4614-8265-9
eBook Packages: Computer Science; Reference Module Computer Science and Engineering