SAFE: Structure-Aware File and Email Deduplication for Cloud-Based Storage Systems

Data Deduplication for Data Optimization for Storage and Network Systems

Abstract

In this chapter, we introduce Structure-Aware File and Email Deduplication for Cloud-based Storage Systems (SAFE), a fast client-based deduplication technique for cloud-based storage that achieves the same space savings as variable-size block deduplication by using structure-based granularity rather than physical chunk granularity. Cloud-based storage services, including Dropbox (http://www.dropbox.com), JustCloud (http://www.justcloud.com/), and Mozy (http://mozy.com/), have become popular because people can access their data at any time, from anywhere, using various types of devices such as laptops, tablets, and smartphones. These services use deduplication to avoid sending and storing duplicate files (or blocks), which reduces network bandwidth and storage space and in turn speeds up data uploads. The deduplication techniques that cloud-based storage currently uses (file-level and fixed-size block deduplication) are fast and have low index overhead, but they find fewer redundancies than variable-size block deduplication. Variable-size block deduplication, however, cannot be used for cloud-based storage because of the excessive CPU and memory overhead caused by chunking, indexing, and fragmentation. We therefore developed SAFE, which achieves both high speed and good space savings on the client side by using structure-based granularity. Evaluation results show that SAFE saves as much storage space as existing variable-size block deduplication while being as fast as file-level or large fixed-size block deduplication.
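
The contrast between physical and structure-based granularity can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a ZIP container as a stand-in for a structured document, SHA-1 fingerprints, and an in-memory set as the index, and the helper names (`fixed_block_fingerprints`, `structure_fingerprints`, `make_container`, `new_objects`) are hypothetical.

```python
import hashlib
import io
import zipfile
from typing import Dict, List, Set


def fixed_block_fingerprints(data: bytes, block_size: int = 4096) -> List[str]:
    """Physical granularity: fingerprint every fixed-size block of the raw bytes."""
    return [
        hashlib.sha1(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def structure_fingerprints(container: bytes) -> Dict[str, str]:
    """Structure-based granularity: fingerprint each member of a ZIP container.

    Assumption: the ZIP archive stands in for any structured file whose
    objects can be parsed out individually.
    """
    with zipfile.ZipFile(io.BytesIO(container)) as zf:
        return {name: hashlib.sha1(zf.read(name)).hexdigest() for name in zf.namelist()}


def make_container(members: Dict[str, bytes]) -> bytes:
    """Build an in-memory ZIP container from a name-to-payload mapping."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, payload in members.items():
            zf.writestr(name, payload)
    return buf.getvalue()


def new_objects(index: Set[str], fingerprints: Dict[str, str]) -> List[str]:
    """Return the names of objects not yet in the index, and record them."""
    fresh = []
    for name, fingerprint in fingerprints.items():
        if fingerprint not in index:
            index.add(fingerprint)
            fresh.append(name)
    return fresh


# Two versions of a document that share an embedded object but differ in text.
image = bytes(range(256)) * 64
v1 = make_container({"text.xml": b"<p>draft one</p>", "image.bin": image})
v2 = make_container({"text.xml": b"<p>draft two, edited</p>", "image.bin": image})

index: Set[str] = set()
first = new_objects(index, structure_fingerprints(v1))   # both objects are new
second = new_objects(index, structure_fingerprints(v2))  # only the edited text is new
```

In this sketch the shared embedded object is recognized as a duplicate in the second version because it is fingerprinted as a whole object, independently of where its compressed bytes happen to fall within the container.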

Copyright information

© 2017 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Kim, D., Song, S., Choi, BY. (2017). SAFE: Structure-Aware File and Email Deduplication for Cloud-Based Storage Systems. In: Data Deduplication for Data Optimization for Storage and Network Systems. Springer, Cham. https://doi.org/10.1007/978-3-319-42280-0_4

  • DOI: https://doi.org/10.1007/978-3-319-42280-0_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-42278-7

  • Online ISBN: 978-3-319-42280-0

  • eBook Packages: Engineering, Engineering (R0)
