Abstract
This text is an informal review of several randomized algorithms that have appeared over the past two decades and have proved instrumental in efficiently extracting quantitative characteristics of very large data sets. The algorithms are probabilistic by nature and based on hashing. They exploit properties of simple discrete probabilistic models, and their design is tightly coupled with their analysis, which is itself often founded on methods from analytic combinatorics. Singularly efficient solutions have been found that defy information-theoretic lower bounds applicable to deterministic algorithms. Characteristics such as the total number of elements, the cardinality (the number of distinct elements), and frequency moments, as well as unbiased samples, can be gathered with little loss of information and only a small probability of failure. The algorithms are applicable to traffic monitoring in networks, to database query optimization, and to some of the basic tasks of data mining. They apply to massive data streams and in many cases require strictly minimal auxiliary storage.
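As one concrete illustration of counting the total number of elements with very little auxiliary storage, the following is a minimal sketch of approximate counting in the style usually attributed to Morris. It is not taken from the chapter itself: the function names, the choice of base 2, and the averaging over several independent registers are all assumptions made for illustration.

```python
import random


def morris_increment(x: int, rng: random.Random) -> int:
    """Process one event: advance register x with probability 2**-x."""
    if rng.random() < 2.0 ** -x:
        return x + 1
    return x


def morris_estimate(x: int) -> int:
    """Estimate of the number of events seen; E[2**x] = n + 1 for base 2."""
    return 2 ** x - 1


def count_events(n: int, copies: int, seed: int = 0) -> tuple[float, int]:
    """Feed n events to `copies` independent registers (hypothetical helper).

    Returns the averaged estimate and the largest register value, to show
    that each register stays of the order of log2(n).
    """
    rng = random.Random(seed)
    registers = [0] * copies
    for _ in range(n):
        registers = [morris_increment(x, rng) for x in registers]
    estimates = [morris_estimate(x) for x in registers]
    return sum(estimates) / copies, max(registers)


if __name__ == "__main__":
    estimate, largest = count_events(n=1000, copies=256, seed=1)
    print(f"averaged estimate: {estimate:.1f}, largest register: {largest}")
```

A single register is unbiased but has a large variance, which is why the sketch averages several independent copies; each register holds only a small integer, however many events it has seen.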
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Flajolet, P. (2004). Counting by Coin Tossings. In: Maher, M.J. (ed.) Advances in Computer Science - ASIAN 2004. Higher-Level Decision Making. ASIAN 2004. Lecture Notes in Computer Science, vol 3321. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30502-6_1
DOI: https://doi.org/10.1007/978-3-540-30502-6_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24087-7
Online ISBN: 978-3-540-30502-6
eBook Packages: Computer Science (R0)