Locality-Sensitive Bloom Filter for Approximate Membership Query

  • Yu Hua
  • Xue Liu
Chapter

Abstract

In many network applications, Bloom filters support exact-matching membership queries because they are randomized, space-efficient data structures with a small probability of false answers. We extend the standard Bloom filter to the Locality-Sensitive Bloom Filter (LSBF) to provide an Approximate Membership Query (AMQ) service. We achieve this by replacing the uniform and independent hash functions with locality-sensitive hash functions, which makes storage in LSBF locality sensitive. Because it retains the Bloom filter design, LSBF remains space efficient and responds quickly to queries. In the design of the LSBF structure, we propose a bit vector to reduce False Positives (FP); the bit vector verifies that multiple attributes belong to one member. We also use an active overflowed scheme to significantly decrease False Negatives (FN). Rigorous theoretical analysis (e.g., of FP, FN, and space overhead) shows that LSBF is space compact and can respond accurately to approximate membership queries. We have implemented LSBF in a real distributed system and performed extensive experiments using real-world traces. Experimental results show that LSBF, compared with a baseline approach and other state-of-the-art work in the literature (SmartStore and LSB-tree), takes less time to respond to AMQ and consumes much less storage space (© 2012 IEEE. Reprinted, with permission, from Ref. [1]).
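
The core idea described above, a Bloom filter whose bit positions come from locality-sensitive hash functions, can be illustrated with a minimal sketch. This is not the implementation from the chapter: the class name LSBFSketch, its parameters (num_bits, num_hashes, width, seed), and the choice of a Gaussian-projection (2-stable) hash family are assumptions made for illustration, and the verification bit vector and the active overflowed scheme are omitted.

```python
import math
import random


class LSBFSketch:
    """Illustrative Bloom-filter bit array indexed by locality-sensitive hashes.

    Hypothetical sketch: names and parameters are assumptions, not the
    chapter's design. FP verification and FN overflow handling are omitted.
    """

    def __init__(self, dim, num_bits=1024, num_hashes=4, width=4.0, seed=0):
        rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
        self.num_bits = num_bits
        self.width = width
        self.bits = [False] * num_bits
        # Each hash is h(v) = floor((a . v + b) / width), where a has
        # Gaussian entries and b is uniform in [0, width).
        self.funcs = [
            ([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, width))
            for _ in range(num_hashes)
        ]

    def _positions(self, vec):
        """Yield one bit position per hash function for the given vector."""
        for a, b in self.funcs:
            projection = sum(ai * vi for ai, vi in zip(a, vec)) + b
            yield math.floor(projection / self.width) % self.num_bits

    def insert(self, vec):
        """Set the bit selected by every hash function."""
        for pos in self._positions(vec):
            self.bits[pos] = True

    def query(self, vec):
        """Return True only if every selected bit is set."""
        return all(self.bits[pos] for pos in self._positions(vec))
```

Under these assumptions, a vector close to an inserted one tends to fall into the same buckets and is therefore likely, though not guaranteed, to be reported as an approximate member. A nearby vector that crosses a bucket boundary in any one hash is missed, which is the kind of false negative the active overflowed scheme is meant to reduce.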

References

  1. Y. Hua, B. Xiao, B. Veeravalli, D. Feng, Locality-sensitive bloom filter for approximate membership query. IEEE Trans. Comput. (TC) 61(6), 817–830 (2012)
  2. L. Carter, R. Floyd, J. Gill, G. Markowsky, M. Wegman, Exact and approximate membership testers, in Proceedings of STOC (1978), pp. 59–65
  3. Q. Lv, W. Josephson, Z. Wang, M. Charikar, K. Li, Multi-probe LSH: efficient indexing for high-dimensional similarity search, in Proceedings of VLDB (2007), pp. 950–961
  4. F. Bonomi, M. Mitzenmacher, R. Panigrahy, S. Singh, G. Varghese, Beyond bloom filters: from approximate membership checks to approximate state machines, in Proceedings of ACM SIGCOMM (2006)
  5. Y. Zhu, H. Jiang, False rate analysis of Bloom filter replicas in distributed systems, in Proceedings of ICPP (2006), pp. 255–262
  6. W. Feng, D.D. Kandlur, D. Saha, K.G. Shin, Stochastic fair blue: a queue management algorithm for enforcing fairness, in Proceedings of INFOCOM (2001)
  7. F.M. Cuenca-Acuna, C. Peery, R.P. Martin, T.D. Nguyen, PlanetP: using gossiping to build content addressable peer-to-peer information sharing communities, in IEEE HPDC (2003)
  8. A. Pagh, R. Pagh, S. Rao, An optimal bloom filter replacement, in Proceedings of SODA (2005), pp. 823–829
  9. S. Dharmapurikar, P. Krishnamurthy, D.E. Taylor, Longest prefix matching using bloom filters, in Proceedings of ACM SIGCOMM (2003), pp. 201–212
  10. A. Broder, M. Mitzenmacher, Using multiple hash functions to improve IP lookups, in Proceedings of INFOCOM (2001), pp. 1454–1463
  11. F. Baboescu, G. Varghese, Scalable packet classification. IEEE/ACM Trans. Netw. 13(1), 2–14 (2005)
  12. P. Indyk, R. Motwani, Approximate nearest neighbors: towards removing the curse of dimensionality, in Proceedings of STOC (1998), pp. 604–613
  13. A. Kirsch, M. Mitzenmacher, Distance-sensitive bloom filters, in Proceedings of Algorithm Engineering and Experiments (ALENEX) (2006)
  14. A. Andoni, P. Indyk, Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. Commun. ACM 1, 117–122 (2008)
  15. L. Fan, P. Cao, J. Almeida, A. Broder, Summary cache: a scalable wide-area web cache sharing protocol. IEEE/ACM Trans. Netw. 8(3), 281–293 (2000)
  16. M. Mitzenmacher, Compressed bloom filters. IEEE/ACM Trans. Netw. 10(5), 604–612 (2002)
  17. Y. Hua, Y. Zhu, H. Jiang, D. Feng, L. Tian, Scalable and adaptive metadata management in ultra large-scale file systems, in Proceedings of ICDCS (2008), pp. 403–410
  18. A. Kumar, J.J. Xu, J. Wang, O. Spatschek, L.E. Li, Space-code bloom filter for efficient per-flow traffic measurement, in Proceedings of INFOCOM (2004), pp. 1762–1773
  19. S. Cohen, Y. Matias, Spectral bloom filters, in Proceedings of ACM SIGMOD (2003), pp. 241–252
  20. D. Guo, J. Wu, H. Chen, X. Luo, Theory and network application of dynamic bloom filters, in Proceedings of INFOCOM (2006)
  21. B. Xiao, Y. Hua, Using parallel bloom filters for multi-attribute representation on network services. IEEE Trans. Parallel Distrib. Syst. 1, 20–32 (2010)
  22. H. Song, F. Hao, M. Kodialam, T.V. Lakshman, IPv6 lookups using distributed and load balanced bloom filters for 100Gbps core router line cards, in INFOCOM (2009)
  23. F. Hao, M. Kodialam, T.V. Lakshman, H. Song, Fast multiset membership testing using combinatorial bloom filters, in Proceedings of INFOCOM (2009)
  24. F. Hao, M. Kodialam, T.V. Lakshman, Incremental bloom filters, in Proceedings of INFOCOM (2008), pp. 1741–1749
  25. A. Broder, M. Mitzenmacher, Network applications of bloom filters: a survey. Internet Math. 1, 485–509 (2005)
  26. A. Joly, O. Buisson, A posteriori multi-probe locality sensitive hashing, in Proceedings of ACM Multimedia (2008)
  27. Y. Hua, B. Xiao, D. Feng, B. Yu, Bounded LSH for similarity search in peer-to-peer file systems, in Proceedings of ICPP (2008), pp. 644–651
  28. M. Datar, N. Immorlica, P. Indyk, V. Mirrokni, Locality-sensitive hashing scheme based on p-stable distributions, in Proceedings of the Annual Symposium on Computational Geometry (2004), pp. 253–262
  29. A. Andoni, M. Datar, N. Immorlica, P. Indyk, V. Mirrokni, Locality-sensitive hashing using stable distributions, in Nearest Neighbor Methods in Learning and Vision: Theory and Practice, ed. by T. Darrell, P. Indyk, G. Shakhnarovich (MIT Press, 2006)
  30. M. Charikar, Similarity estimation techniques from rounding algorithms, in Proceedings of STOC (2002), pp. 380–388
  31. N. Agrawal, W. Bolosky, J. Douceur, J. Lorch, A five-year study of file-system metadata, in Proceedings of FAST (2007)
  32. The Forest CoverType dataset, UCI machine learning repository, http://archive.ics.uci.edu/ml/datasets/Covertype
  33. Y. Hua, H. Jiang, Y. Zhu, D. Feng, L. Tian, SmartStore: a new metadata organization paradigm with semantic-awareness for next-generation file systems, in Proceedings of ACM/IEEE Supercomputing Conference (SC) (2009)
  34. Y. Tao, K. Yi, C. Sheng, P. Kalnis, Quality and efficiency in high-dimensional nearest neighbor search, in Proceedings of SIGMOD (2009)
  35. A. Guttman, R-trees: a dynamic index structure for spatial searching, in Proceedings of ACM SIGMOD (1984), pp. 47–57
  36. A. Gionis, P. Indyk, R. Motwani, Similarity search in high dimensions via hashing, in VLDB (1999), pp. 518–529
  37. A. Leung, I. Adams, E.L. Miller, Magellan: a searchable metadata architecture for large-scale file systems, in University of California, Santa Cruz, UCSC-SSRC-09-07 (2009)
  38. V. Athitsos, M. Potamias, P. Papapetrou, G. Kollios, Nearest neighbor retrieval using distance-based hashing, in Proceedings of ICDE (2008)
  39. Y. Hua, Y. Zhu, H. Jiang, D. Feng, L. Tian, Supporting scalable and adaptive metadata management in ultra large-scale file systems. IEEE Trans. Parallel Distrib. Syst. (TPDS) 22(4), 580–593 (2011)
  40. J. Bruck, J. Gao, A. Jiang, Weighted bloom filter, in Proceedings of the 2006 IEEE International Symposium on Information Theory (ISIT 2006) (2006), pp. 2304–2308
  41. M. Zhong, P. Lu, K. Shen, J. Seiferas, Optimizing data popularity conscious bloom filters, in PODC (2008)
  42. F. Hao, M. Kodialam, T. Lakshman, Building high accuracy Bloom filters using partitioned hashing, in Proceedings of SIGMETRICS (2007), pp. 277–288
  43. B. Donnet, B. Baynat, T. Friedman, Retouched bloom filters: allowing networked applications to trade off selected false positives against false negatives, in Proceedings of ACM CoNEXT (2006)

Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  1. Huazhong University of Science and Technology, Wuhan, China
  2. McGill University, Montreal, Canada
