Approximate Counting with a Floating-Point Counter

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6196)

Abstract

When many objects are counted simultaneously in large data streams, as in network traffic monitoring or in Webgraph and molecular sequence analyses, memory becomes a limiting factor. Robert Morris [Communications of the ACM, 21:840–842, 1978] proposed a probabilistic technique for approximate counting that is extremely economical. The basic idea is to increment a counter containing the value X with probability \(2^{-X}\). As a result, the counter contains an approximation of \(\lg n\) after n probabilistic updates, stored in \(\lg\lg n\) bits. Here we revisit the original idea of Morris. We introduce a binary floating-point counter that combines a d-bit significand with a binary exponent, stored together in \(d+\lg\lg n\) bits. The counter yields a simple formula for an unbiased estimate of n with a standard deviation of about \(0.6\cdot n\cdot 2^{-d/2}\).
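A minimal Python sketch of such a counter follows. It is an illustration under stated assumptions, not the paper's code: it assumes the stored integer X keeps a d-bit significand u in its low bits and an exponent \(t=\lfloor X/2^d\rfloor\) in its high bits, and that each update increments X with probability \(2^{-t}\). Under that assumed rule, summing the reciprocal increment probabilities gives the unbiased estimate \((2^d+u)\cdot 2^t-2^d\), and setting d = 0 recovers Morris's counter with estimate \(2^X-1\). The names `FloatingPointCounter` and `average_estimate` are hypothetical.

```python
import random
from typing import Optional


class FloatingPointCounter:
    """Hypothetical sketch of a floating-point approximate counter.

    Assumed design: the stored integer x holds a d-bit significand u in its
    low bits and an exponent t in its high bits; an update increments x
    with probability 2**-t. With d = 0 this reduces to Morris's counter.
    """

    def __init__(self, d: int, rng: Optional[random.Random] = None) -> None:
        self.d = d
        self.x = 0
        self.rng = rng if rng is not None else random.Random()

    def update(self) -> None:
        t = self.x >> self.d
        # Probability 2**-t: succeed only if t fresh random bits are all zero.
        if t == 0 or self.rng.getrandbits(t) == 0:
            self.x += 1

    def estimate(self) -> int:
        m = 1 << self.d                      # M = 2**d
        t, u = self.x >> self.d, self.x & (m - 1)
        # Unbiased under the assumed update rule: (M + u) * 2**t - M.
        return ((m + u) << t) - m


def average_estimate(n: int, d: int, runs: int, seed: int = 0) -> float:
    """Average the estimate over independent counters, each updated n times."""
    rng = random.Random(seed)
    total = 0
    for _ in range(runs):
        counter = FloatingPointCounter(d, rng)
        for _ in range(n):
            counter.update()
        total += counter.estimate()
    return total / runs
```

With d = 4, for instance, the first 16 updates are counted exactly because the exponent is still 0; after that the counter advances roughly once per \(2^t\) updates, so only the small exponent field grows. Calling `average_estimate(1000, 4, 1000)` should return a value close to 1000, with individual runs scattered around it.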

We analyze the floating-point counter’s performance in a general framework that applies to any probabilistic counter. In that framework, we provide practical formulas to construct unbiased estimates, and to assess the asymptotic accuracy of any counter.
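The idea of an unbiased estimate for a general probabilistic counter can be checked exactly for small n. The sketch below assumes only that a counter in state k advances to k + 1 with some probability p(k) > 0 and otherwise stays put. It uses one standard construction, \(f(k)=\sum_{i<k} 1/p(i)\), which is unbiased because each update raises the conditional expectation of f by exactly \(p(k)\cdot 1/p(k)=1\). The function names are hypothetical, and the brute-force distribution computation is a verification device, not one of the paper's formulas.

```python
from fractions import Fraction
from typing import Callable, Dict, Tuple

# p(k): probability that a counter in state k advances to k + 1.
Prob = Callable[[int], Fraction]


def unbiased_estimate(p: Prob, k: int) -> Fraction:
    """f(k) = sum over i < k of 1/p(i); in particular f(0) = 0."""
    return sum((1 / p(i) for i in range(k)), Fraction(0))


def state_distribution(p: Prob, n: int) -> Dict[int, Fraction]:
    """Exact distribution of the counter state after n updates from state 0."""
    dist: Dict[int, Fraction] = {0: Fraction(1)}
    for _ in range(n):
        nxt: Dict[int, Fraction] = {}
        for k, q in dist.items():
            nxt[k + 1] = nxt.get(k + 1, Fraction(0)) + q * p(k)
            nxt[k] = nxt.get(k, Fraction(0)) + q * (1 - p(k))
        dist = nxt
    return dist


def moments(p: Prob, n: int) -> Tuple[Fraction, Fraction]:
    """Exact mean and variance of the estimate f(X_n)."""
    dist = state_distribution(p, n)
    mean = sum((q * unbiased_estimate(p, k) for k, q in dist.items()), Fraction(0))
    second = sum((q * unbiased_estimate(p, k) ** 2 for k, q in dist.items()), Fraction(0))
    return mean, second - mean ** 2


def floating_point_p(d: int) -> Prob:
    """Assumed floating-point rule: advance with probability 2**-(k >> d)."""
    return lambda k: Fraction(1, 2 ** (k >> d))
```

Because the arithmetic uses exact fractions, the computed mean equals n exactly for any increment rule. For d = 0 the computed variance matches the known value n(n−1)/2 for Morris's counter, and increasing d shrinks it.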


References

  1. Estan, C., Varghese, G.: New directions in traffic measurement and accounting: Focusing on the elephants, ignoring the mice. ACM TOCS 21, 270–313 (2003)
  2. Stanojević, R.: Small active counters. In: Proceedings INFOCOM, pp. 2153–2161 (2007)
  3. Donato, D., Laura, L., Leonardi, S., Millozzi, S.: Large-scale properties of the Webgraph. Eur. Phys. J. B 38, 239–243 (2004)
  4. Karlin, S.: Statistical signals in bioinformatics. PNAS 102, 13355–13362 (2005)
  5. Jones, N.C., Pevzner, P.A.: Comparative genomics reveals unusually long motifs in mammalian genomes. Bioinformatics 22, e236–e242 (2006)
  6. Rigoutsos, I., Huynh, T., Miranda, K., Tsirigos, A., McHardy, A., Platt, D.: Short blocks from the noncoding parts of the human genome have instances within nearly all known genes and relate to biological processes. PNAS 103, 6605–6610 (2006)
  7. Csűrös, M., Noé, L., Kucherov, G.: Reconsidering the significance of genomic word frequencies. Trends Genet. 23, 543–546 (2007)
  8. Sindi, S.S., Hunt, B.R., Yorke, J.A.: Duplication count distributions in DNA sequences. Phys. Rev. E 78, 61912 (2008)
  9. Ning, Z., Cox, A.J., Mullikin, J.C.: SSAHA: A fast search method for large DNA databases. Genome Res. 11(10), 1725–1729 (2001)
  10. Flajolet, P.: Approximate counting: A detailed analysis. BIT 25, 113–134 (1985)
  11. Morris, R.: Counting large numbers of events in small registers. CACM 21(10), 840–842 (1978)
  12. Flajolet, P., Fusy, É., Gandouet, O., Meunier, F.: HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm. In: Proceedings AofA, DMTCS Proceedings, pp. 127–146 (2007)
  13. Kirschenhofer, P., Prodinger, H.: Approximate counting: An alternative approach. RAIRO ITA 25, 43–48 (1991)
  14. Kruskal, J.B., Greenberg, A.G.: A flexible way of counting large numbers approximately in small registers. Algorithmica 6, 590–596 (1991)
  15. Karlin, S., Taylor, H.M.: A First Course in Stochastic Processes, 2nd edn. Academic Press, San Diego (1975)

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Csűrös, M. (2010). Approximate Counting with a Floating-Point Counter. In: Thai, M.T., Sahni, S. (eds) Computing and Combinatorics. COCOON 2010. Lecture Notes in Computer Science, vol 6196. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14031-0_39

  • DOI: https://doi.org/10.1007/978-3-642-14031-0_39

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-14030-3

  • Online ISBN: 978-3-642-14031-0

  • eBook Packages: Computer Science, Computer Science (R0)
