Abstract
Per-flow cardinality measurement over big network data consisting of numerous flows is a fundamental problem with many practical applications. Traditionally, research on this problem has focused on using a small amount of memory to estimate each flow's cardinality over a large range (up to 10^9). However, although the memory needed for each flow has been greatly compressed, the overall memory demand can still be very high when there is an extremely large number of flows. It can exceed what is available in some important scenarios, such as implementing online measurement modules in network processors using only on-chip cache memory. In this chapter, instead of allocating a separate data structure (called an estimator) for each flow, we take a different path by viewing all the flows together as a whole: each flow is allocated a virtual estimator, and these virtual estimators share a common memory space. We show that sharing at the register (multi-bit) level is superior to sharing at the bit level. We present a framework of virtual estimators that allows us to apply the idea of sharing to an array of cardinality estimation solutions, achieving far better memory efficiency than the best existing work. Experimental results show that the new solution can work in a tight memory space of less than 1 bit per flow, or even one tenth of a bit per flow, a goal that has not been achieved before.
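The register-level sharing idea can be illustrated with a minimal Python sketch. It is not the chapter's construction: the array size `M`, the virtual estimator size `S`, the 5-bit register cap `REG_MAX`, the SHA-256-based hash, and all function names are assumptions chosen for illustration. Each flow's virtual estimator is a set of `S` logical registers that are mapped by hashing onto slots of one shared array, and each register keeps a LogLog-style maximum rank.

```python
import hashlib

M = 1024       # physical registers in the shared array (assumed size)
S = 16         # registers per virtual estimator (assumed size)
REG_MAX = 31   # largest value a 5-bit register can hold (assumed width)

shared = [0] * M  # one common register array used by every flow


def _h(*parts):
    """64-bit hash of the parts (truncated SHA-256, an illustrative choice)."""
    data = "|".join(str(p) for p in parts).encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def physical_index(flow, i):
    """Map the i-th virtual register of `flow` to a slot in the shared array."""
    return _h("map", flow, i) % M


def rank(x, bits=32):
    """1-based position of the leftmost 1-bit in the low `bits` bits of x."""
    x &= (1 << bits) - 1
    return bits - x.bit_length() + 1


def record(flow, element):
    """Record one element of `flow` in that flow's virtual estimator."""
    h = _h("elem", flow, element)
    i = (h >> 32) % S                 # choose one of the flow's S registers
    r = min(rank(h), REG_MAX)         # geometric value, capped to register width
    p = physical_index(flow, i)
    shared[p] = max(shared[p], r)     # registers only ever grow


def virtual_registers(flow):
    """Read back the S shared registers that make up the flow's estimator."""
    return [shared[physical_index(flow, i)] for i in range(S)]
```

Because a register stores only a maximum, recording the same element twice leaves the array unchanged, which is what makes the structure count distinct elements. Registers touched by other flows add noise to a flow's virtual estimator, so a real estimator has to correct for that noise; this sketch covers only the recording and lookup steps.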
Copyright information
© 2017 Springer International Publishing AG
About this chapter
Cite this chapter
Chen, S., Chen, M., Xiao, Q. (2017). Per-Flow Cardinality Measurement. In: Traffic Measurement for Big Network Data. Wireless Networks. Springer, Cham. https://doi.org/10.1007/978-3-319-47340-6_3
DOI: https://doi.org/10.1007/978-3-319-47340-6_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-47339-0
Online ISBN: 978-3-319-47340-6
eBook Packages: Engineering, Engineering (R0)