Abstract
In many applications, the elements of a large universe \(U\) differ in their membership likelihoods and query frequencies. The number of hash functions assigned to each element of \(U\), which the traditional Bloom filter fixes at the same value for every element, can therefore be optimized to minimize the average false positive rate. We propose an improved weighted Bloom filter (IWBF) that assigns an optimal number of hash functions to each element and achieves a lower average false positive rate than the weighted Bloom filter. We show a tight space lower bound for any approximated membership querying algorithm that represents a small subset \(S\) of \(U\) and answers membership queries with predefined false positive rates, when the query frequencies and membership likelihoods of the elements in \(U\) are known. We also provide an approximate space lower bound for approximated membership querying algorithms that achieve a given average false positive rate, and show that the number of bits used by IWBF is within a factor of \(1.44\) of this approximate lower bound.
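The idea of giving each element its own number of hash functions can be illustrated with a minimal Python sketch. This is not the IWBF construction itself: the abstract does not state the optimal assignment rule, so the callback `k_for` (element to hash count), the class name `VariableKBloomFilter`, and the use of SHA-256 with double hashing are all assumptions made for illustration.

```python
import hashlib


class VariableKBloomFilter:
    """Bloom filter in which each element may use its own number of hash functions.

    Illustrative sketch only; `k_for` is a hypothetical, caller-supplied rule.
    """

    def __init__(self, num_bits, k_for):
        self.num_bits = num_bits
        # k_for must be deterministic: the same element has to map to the
        # same hash count at insertion time and at query time.
        self.k_for = k_for
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item):
        # Derive two base hashes from one SHA-256 digest, then combine them
        # (double hashing) to obtain as many positions as this item needs.
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.k_for(item)):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # True if every position for this item is set (may be a false positive).
        return all(self.bits[pos] for pos in self._positions(item))


# Hypothetical rule: frequently queried keys get more hash functions.
frequently_queried = {"alice", "bob"}
bf = VariableKBloomFilter(1024, lambda x: 8 if x in frequently_queried else 2)
for name in ["alice", "carol"]:
    bf.add(name)
```

Because an element checks exactly the positions it would have set on insertion, inserted elements are always reported as members; only the false positive rate varies with the number of hash functions an element is given.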
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Wang, X., Ji, Y., Dang, Z., Zheng, X., Zhao, B. (2015). Improved Weighted Bloom Filter and Space Lower Bound Analysis of Algorithms for Approximated Membership Querying. In: Renz, M., Shahabi, C., Zhou, X., Cheema, M. (eds) Database Systems for Advanced Applications. DASFAA 2015. Lecture Notes in Computer Science, vol 9050. Springer, Cham. https://doi.org/10.1007/978-3-319-18123-3_21
DOI: https://doi.org/10.1007/978-3-319-18123-3_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18122-6
Online ISBN: 978-3-319-18123-3
eBook Packages: Computer Science (R0)