Flash Assisted Segmented Bloom Filter for Deduplication

  • Conference paper
Afro-European Conference for Industrial Advancement

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 334)

Abstract

Deduplication for cloud storage generates a large number of unique fingerprints. Bloom filters are often used to handle this volume. They are usually implemented in Random Access Memory (RAM), which is limited and expensive. The resulting size constraint leads to a higher false positive probability, which decreases deduplication efficiency. In this paper, a Flash Assisted Segmented Bloom Filter for Deduplication (FASBF) is proposed, which stores the full bloom filter (BF) on a solid state drive and keeps only part of it in RAM. This improves duplicate lookup in three ways. First, the bloom filter can be made sufficiently large. Second, a larger number of hash functions can be used. Third, more RAM is left for the fingerprint cache. The approach is evaluated using a prototype implemented on a study-oriented network backup system. The results show that it saves a considerable amount of memory while sustaining an underlying backup throughput of 100 MB/s.
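
The idea of keeping the whole filter on flash and only part of it in RAM can be illustrated with a minimal Python sketch. This is not the authors' implementation; the abstract does not describe how FASBF lays out or selects segments. The sketch assumes that one hash value picks a segment and that all bit positions for a fingerprint fall inside that segment, so a lookup touches a single segment. An ordinary file stands in for the solid state drive, and the names `SegmentedBloomFilter`, `false_positive_rate`, and `demo`, as well as every parameter value, are hypothetical. The helper `false_positive_rate` uses the standard approximation for a bloom filter with `m` bits, `n` inserted items, and `k` hash functions.

```python
import hashlib
import math
import os
import tempfile
from collections import OrderedDict


def false_positive_rate(n, m, k):
    """Standard approximation (1 - e^(-k*n/m))^k for n items, m bits, k hashes."""
    return (1 - math.exp(-k * n / m)) ** k


class SegmentedBloomFilter:
    """Toy filter: full bit array in a file, a few segments cached in RAM (assumed design)."""

    def __init__(self, path, num_segments=64, segment_bytes=4096,
                 num_hashes=7, cached_segments=4):
        self.path = path
        self.num_segments = num_segments
        self.segment_bytes = segment_bytes
        self.num_hashes = num_hashes
        self.cached_segments = cached_segments
        self.cache = OrderedDict()  # segment index -> bytearray, least recently used first
        with open(path, "wb") as f:  # pre-allocate the zeroed on-disk bit array
            f.write(bytes(num_segments * segment_bytes))

    def _positions(self, fingerprint):
        # Assumption: one hash picks the segment, double hashing picks bits inside it.
        digest = hashlib.sha256(fingerprint).digest()
        segment = int.from_bytes(digest[:8], "big") % self.num_segments
        h1 = int.from_bytes(digest[8:16], "big")
        h2 = int.from_bytes(digest[16:24], "big") | 1
        bits = self.segment_bytes * 8
        return segment, [(h1 + i * h2) % bits for i in range(self.num_hashes)]

    def _load(self, segment):
        if segment in self.cache:
            self.cache.move_to_end(segment)
            return self.cache[segment]
        if len(self.cache) >= self.cached_segments:
            # Evict the least recently used segment and write it back to the file.
            old, old_data = self.cache.popitem(last=False)
            with open(self.path, "r+b") as f:
                f.seek(old * self.segment_bytes)
                f.write(old_data)
        with open(self.path, "rb") as f:
            f.seek(segment * self.segment_bytes)
            data = bytearray(f.read(self.segment_bytes))
        self.cache[segment] = data
        return data

    def add(self, fingerprint):
        segment, positions = self._positions(fingerprint)
        data = self._load(segment)
        for p in positions:
            data[p // 8] |= 1 << (p % 8)

    def might_contain(self, fingerprint):
        segment, positions = self._positions(fingerprint)
        data = self._load(segment)
        return all(data[p // 8] & (1 << (p % 8)) for p in positions)


def demo():
    """Insert 200 made-up fingerprints, then look all of them up again."""
    with tempfile.TemporaryDirectory() as tmp:
        bf = SegmentedBloomFilter(os.path.join(tmp, "filter.bin"))
        fingerprints = [hashlib.sha1(str(i).encode()).digest() for i in range(200)]
        for fp in fingerprints:
            bf.add(fp)
        return all(bf.might_contain(fp) for fp in fingerprints), len(bf.cache)
```

In this sketch the RAM cost is bounded by `cached_segments * segment_bytes` regardless of the total filter size, which is the kind of trade-off the abstract describes: the on-disk filter can grow (lowering `false_positive_rate` for a fixed `n` and `k`) while the memory left for a fingerprint cache stays roughly constant. A real system would also need to flush cached segments on shutdown and account for flash write behaviour, neither of which is modelled here.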

Author information

Corresponding author

Correspondence to Girum Dagnaw.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Dagnaw, G., Teferi, A., Berhan, E. (2015). Flash Assisted Segmented Bloom Filter for Deduplication. In: Abraham, A., Krömer, P., Snasel, V. (eds) Afro-European Conference for Industrial Advancement. Advances in Intelligent Systems and Computing, vol 334. Springer, Cham. https://doi.org/10.1007/978-3-319-13572-4_7

  • DOI: https://doi.org/10.1007/978-3-319-13572-4_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-13571-7

  • Online ISBN: 978-3-319-13572-4

  • eBook Packages: Engineering, Engineering (R0)
