Improved Weighted Bloom Filter and Space Lower Bound Analysis of Algorithms for Approximated Membership Querying

  • Conference paper
Database Systems for Advanced Applications (DASFAA 2015)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9050)

Abstract

In many applications, the elements of a large universe \(U\) have different membership likelihoods and query frequencies. The traditional Bloom filter assigns the same number of hash functions to every element of \(U\); this number can instead be optimized per element to minimize the average false positive rate. We propose an improved weighted Bloom filter (IWBF) that assigns an optimal number of hash functions to each element and achieves a lower average false positive rate than the weighted Bloom filter. When the query frequencies and membership likelihoods of the elements in \(U\) are known, we show a tight space lower bound for any approximated membership querying algorithm that represents a small subset \(S\) of \(U\) and answers membership queries with predefined false positive rates. We also provide an approximate space lower bound for approximated membership querying algorithms that achieve a given average false positive rate, and show that the number of bits used by IWBF is within a factor of \(1.44\) of this approximate lower bound.
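The mechanism the abstract builds on, a Bloom filter in which the number of hash functions varies per element, can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' IWBF: the per-element hash count is a hypothetical callable `num_hashes` supplied by the caller (IWBF's optimal assignment is not reproduced), bit positions come from double hashing over one SHA-256 digest, the false positive estimate is the standard independent-probe approximation, and `average_fpr` assumes the average is weighted by query frequency over non-members. The helper `classical_overhead` only shows where the constant 1.44 arises in the classical analysis of a standard Bloom filter (\(1/\ln 2\)); it does not compute the paper's lower bound.

```python
import hashlib
import math


class WeightedBloomFilter:
    """Bloom filter in which each element x uses its own number of hash functions k(x).

    Hypothetical sketch: `num_hashes` is any callable returning k(x). The rule IWBF
    uses to choose k(x) is given in the paper and is NOT reproduced here.
    """

    def __init__(self, m, num_hashes):
        self.m = m                    # number of bits in the array
        self.bits = bytearray(m)      # one byte per bit, for readability
        self.num_hashes = num_hashes  # hypothetical callable: element -> k(x)

    def _positions(self, x):
        # Derive k(x) bit positions from one SHA-256 digest via double hashing.
        digest = hashlib.sha256(str(x).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.num_hashes(x))]

    def add(self, x):
        # Insert a member of S using its own k(x) probes.
        for p in self._positions(x):
            self.bits[p] = 1

    def __contains__(self, x):
        # Query with the same k(x): no false negatives, possible false positives.
        return all(self.bits[p] for p in self._positions(x))


def estimated_fpr(k_x, total_insert_hashes, m):
    """Usual approximation of the false positive rate for a non-member x.

    A bit stays 0 with probability about exp(-total_insert_hashes / m), where
    total_insert_hashes is the sum of k(y) over all inserted members y.
    """
    zero_fraction = math.exp(-total_insert_hashes / m)
    return (1.0 - zero_fraction) ** k_x


def average_fpr(query_freq, num_hashes, total_insert_hashes, m):
    """Query-frequency-weighted mean FPR over non-members (an assumed definition)."""
    total = sum(query_freq.values())
    return sum(f * estimated_fpr(num_hashes(x), total_insert_hashes, m)
               for x, f in query_freq.items()) / total


def classical_overhead():
    """Bits of an optimally tuned standard Bloom filter divided by n*log2(1/eps)."""
    return 1.0 / math.log(2)  # about 1.4427
```

Because a non-member's false positive probability is roughly \((1 - e^{-K/m})^{k(x)}\), where \(K\) is the total number of probes made by members, giving a frequently queried non-member more hash functions lowers its own false positive rate, while every extra hash function assigned to a member sets more bits for everyone. A per-element assignment has to balance these two effects.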

Author information

Corresponding author

Correspondence to Xiujun Wang.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Wang, X., Ji, Y., Dang, Z., Zheng, X., Zhao, B. (2015). Improved Weighted Bloom Filter and Space Lower Bound Analysis of Algorithms for Approximated Membership Querying. In: Renz, M., Shahabi, C., Zhou, X., Cheema, M. (eds) Database Systems for Advanced Applications. DASFAA 2015. Lecture Notes in Computer Science, vol 9050. Springer, Cham. https://doi.org/10.1007/978-3-319-18123-3_21

  • DOI: https://doi.org/10.1007/978-3-319-18123-3_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-18122-6

  • Online ISBN: 978-3-319-18123-3

  • eBook Packages: Computer Science, Computer Science (R0)