Encyclopedia of Big Data Technologies

Living Edition
Editors: Sherif Sakr, Albert Zomaya

Delta Compression Techniques

  • Torsten Suel
Living reference work entry
DOI: https://doi.org/10.1007/978-3-319-63962-8_63-1

Synonyms

Definition

Delta compression techniques encode a target file with respect to one or more reference files, such that a decoder that has access to the same reference files can recreate the target file from the compressed data. Delta compression is usually applied in cases where there is a high degree of redundancy between the target and reference files, leading to a much smaller compressed size than could be achieved by compressing the target file by itself. Typical application scenarios include revision control systems and versioned file systems that store many versions of a file, as well as software or content updates over networks where the recipient already has an older version of the data. Most work on delta compression techniques has focused on the case of textual and binary files, but the concept can also be applied to multimedia and structured data.
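As an illustration of the definition above, the following minimal Python sketch encodes a target as a list of copy and insert instructions relative to a single reference, and a decoder holding the same reference rebuilds the target. The names delta_encode and delta_decode, the BLOCK seed length, and the instruction format are assumptions made for this example; they are not taken from this entry or from any particular tool. Practical delta compressors would typically also compress the instruction stream itself, which this sketch does not attempt.

```python
BLOCK = 8  # hypothetical seed length used to find candidate matches


def delta_encode(reference: bytes, target: bytes) -> list:
    """Greedy copy/insert encoding of target relative to reference."""
    # Index the first occurrence of every BLOCK-byte substring of the reference.
    index = {}
    for i in range(len(reference) - BLOCK + 1):
        index.setdefault(reference[i:i + BLOCK], i)

    ops = []               # ("copy", offset, length) or ("insert", data)
    literal = bytearray()  # bytes not found in the reference so far
    pos = 0
    while pos < len(target):
        start = index.get(target[pos:pos + BLOCK])
        if start is None:
            # No seed match here: emit this byte as a literal.
            literal.append(target[pos])
            pos += 1
            continue
        # Extend the seed match for as long as both inputs agree.
        length = BLOCK
        while (start + length < len(reference)
               and pos + length < len(target)
               and reference[start + length] == target[pos + length]):
            length += 1
        if literal:
            ops.append(("insert", bytes(literal)))
            literal.clear()
        ops.append(("copy", start, length))
        pos += length
    if literal:
        ops.append(("insert", bytes(literal)))
    return ops


def delta_decode(reference: bytes, ops: list) -> bytes:
    """Rebuild the target from the reference and the instruction list."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += reference[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)


# Usage: the target differs from the reference in one word.
reference = b"The quick brown fox jumps over the lazy dog.\n" * 3
target = reference.replace(b"lazy", b"sleepy", 1)
ops = delta_encode(reference, target)
assert delta_decode(reference, ops) == target
```

When the target is highly redundant with the reference, as in this usage example, most of it is covered by a few copy instructions and only the changed bytes are stored as literals. This is the reason the encoded form can be much smaller than the target compressed by itself.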

Delta compression should not be confused with Elias delta codes, a...

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Computer Science and Engineering, Tandon School of Engineering, New York University, Brooklyn, USA

Section editors and affiliations

  • Paolo Ferragina (1)
  1. Department of Computer Science, University of Pisa, Pisa, Italy