Synonyms
Flajolet-Martin algorithm; Flajolet-Martin sketch; FM sketch
Definition
Given a multi-set S of values from a domain \( \mathcal{D} \), the distinct-values estimation problem is to estimate the number of distinct values in S, using only one pass over S and only small working space. The FM Synopsis algorithm, developed by Flajolet and Martin in the mid-1980s [7], provides provably accurate distinct-values estimation using only \( O(\log |\mathcal{D}|) \) space. The basic technique uses a hash function h() that maps each value in \( \mathcal{D} \) to one of \( m \approx \log |\mathcal{D}| \) bit positions according to a geometric distribution. Specifically, h() maps half the values in \( \mathcal{D} \) to position 0, one-quarter of the values in \( \mathcal{D} \) to position 1, one-eighth of the values in \( \mathcal{D} \) to position 2, and so on; that is, position i receives a \( 2^{-(i+1)} \) fraction of the values. The steps of the FM Synopsis algorithm are:
1. Initialize a bit vector M of m bits to all 0s.
2. For each item in S do: Set M[h(...
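The step list is cut off at this point, so the following Python sketch does not reproduce the entry's own steps. It illustrates the commonly described form of Flajolet-Martin probabilistic counting under stated assumptions: the names `geometric_position` and `FMSketch` are hypothetical, SHA-256 stands in for the hash function h(), and the estimate \( 2^{R}/\varphi \) with \( \varphi \approx 0.77351 \) (R being the index of the lowest 0 bit) is taken from the original 1985 paper [7] rather than from the text above.

```python
import hashlib


def geometric_position(value, m):
    """Map a value to a bit position in [0, m) so that position i is chosen
    with probability about 2^-(i+1). SHA-256 is an assumed stand-in for h()."""
    digest = hashlib.sha256(repr(value).encode("utf-8")).digest()
    x = int.from_bytes(digest[:8], "big")  # 64 pseudo-random bits
    if x == 0:
        return m - 1
    # Index of the least-significant 1 bit: 0 for half of all x, 1 for a quarter, ...
    pos = (x & -x).bit_length() - 1
    return min(pos, m - 1)


class FMSketch:
    """Single-bit-vector sketch; an illustration, not a tuned implementation."""

    PHI = 0.77351  # correction constant reported by Flajolet and Martin (1985)

    def __init__(self, m=32):
        self.m = m
        self.bits = [0] * m  # step 1: bit vector M of m bits, all 0s

    def add(self, value):
        # step 2: set the bit selected by the hash of the item
        self.bits[geometric_position(value, self.m)] = 1

    def estimate(self):
        # R = index of the lowest bit that is still 0 (m if every bit is set)
        r = 0
        while r < self.m and self.bits[r]:
            r += 1
        return (2 ** r) / self.PHI


if __name__ == "__main__":
    sketch = FMSketch(m=32)
    for item in [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]:
        sketch.add(item)
    print(sketch.bits[:8], sketch.estimate())
```

Because each item only sets a bit, duplicates leave the sketch unchanged, which is what makes the structure suitable for counting distinct values. A single bit vector gives a high-variance estimate; the original paper reduces the variance by averaging over many bit vectors.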
Recommended Reading
Alon N, Matias Y, Szegedy M. The space complexity of approximating the frequency moments. J Comput Syst Sci. 1999;58(1):137–47.
Bunge J, Fitzpatrick M. Estimating the number of species: a review. J Am Stat Assoc. 1993;88(421):364–73.
Charikar M, Chaudhuri S, Motwani R, Narasayya V. Towards estimation error guarantees for distinct values. In: Proceedings of the 19th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems; 2000. p. 268–79.
Considine J, Li F, Kollios G, Byers J. Approximate aggregation techniques for sensor databases. In: Proceedings of the 20th International Conference on Data Engineering; 2004. p. 449–60.
Durand M, Flajolet P. Loglog counting of large cardinalities. In: Proceedings of the 11th European Symposium on Algorithms; 2003. p. 604–17.
Estan C, Varghese G, Fisk M. Bitmap algorithms for counting active flows on high speed links. In: Proceedings of the 3rd ACM SIGCOMM Conference on Internet Measurement; 2003. p. 153–66.
Flajolet P, Martin GN. Probabilistic counting algorithms for data base applications. J Comput Syst Sci. 1985;31(2):182–209.
Gibbons PB. Distinct-values estimation over data streams. In: Garofalakis M, Gehrke J, Rastogi R, editors. Data stream management: processing high-speed data streams. Secaucus: Springer; 2009.
Nath S, Gibbons PB, Seshan S, Anderson Z. Synopsis diffusion for robust aggregation in sensor networks. ACM Trans Sensor Networks. 2008;4(2):1–40.
Palmer CR, Gibbons PB, Faloutsos C. ANF: a fast and scalable tool for data mining in massive graphs. In: Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2002. p. 81–90.
Copyright information
© 2018 Springer Science+Business Media, LLC, part of Springer Nature
About this entry
Cite this entry
Gibbons, P.B. (2018). FM Synopsis. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_164
DOI: https://doi.org/10.1007/978-1-4614-8265-9_164
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8266-6
Online ISBN: 978-1-4614-8265-9
eBook Packages: Computer Science, Reference Module Computer Science and Engineering