A Deduplication Algorithm Based on Data Similarity and Delta Encoding
Satellite applications such as remote sensing generate vast quantities of data, yet on-board storage resources are so limited that they must be used more efficiently. Remote sensing data are highly similar to one another, but the dissimilar parts are distributed irregularly. When a traditional deduplication algorithm splits a file into chunks, a large number of chunks are highly similar but not identical, so exact-match deduplication performs poorly. We propose a deduplication algorithm based on data similarity and delta encoding to reduce storage usage. Data similarity analysis identifies the similar data, and delta encoding then reduces the storage that the similar data occupy. In experiments on remote sensing application data, we achieved deduplication ratios of up to 30:1 and analyzed how the chunk size affects the results.
Keywords: Deduplication, Similarity, Delta encoding, Satellite
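The idea summarized in the abstract (detect chunks that are similar but not identical, then store only a delta against a base chunk) can be illustrated with a minimal Python sketch. It is not the authors' implementation: fixed-size chunking, Jaccard similarity over 8-byte windows, the 0.5 threshold, and the use of `difflib.SequenceMatcher` for the delta are all assumptions made for illustration, and the names `Store`, `features`, `make_delta`, and `apply_delta` are hypothetical.

```python
import hashlib
from difflib import SequenceMatcher


def features(chunk: bytes, width: int = 8) -> frozenset:
    """Assumed similarity sketch: the set of all width-byte windows of a chunk."""
    last = max(1, len(chunk) - width + 1)
    return frozenset(chunk[i:i + width] for i in range(last))


def make_delta(base: bytes, target: bytes) -> list:
    """Encode target as copy/insert operations against base."""
    ops = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))           # reuse bytes from base
        elif tag in ("replace", "insert"):
            ops.append(("insert", target[j1:j2]))  # literal new bytes
        # "delete" needs no operation: the base bytes are simply skipped
    return ops


def apply_delta(base: bytes, ops: list) -> bytes:
    """Rebuild the target chunk from base and the delta operations."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += base[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


class Store:
    """Hypothetical chunk store: exact dedup first, then similarity plus delta."""

    def __init__(self, chunk_size: int = 64, threshold: float = 0.5):
        self.chunk_size = chunk_size
        self.threshold = threshold
        self.full = {}      # digest -> raw chunk stored in full
        self.deltas = {}    # digest -> (base digest, delta operations)
        self.sketches = {}  # digest of a full chunk -> its feature set

    def _most_similar(self, sketch: frozenset):
        best, best_score = None, self.threshold
        for digest, other in self.sketches.items():
            score = len(sketch & other) / len(sketch | other)  # Jaccard
            if score >= best_score:
                best, best_score = digest, score
        return best

    def put(self, data: bytes) -> list:
        """Store data and return its recipe (ordered list of chunk digests)."""
        recipe = []
        for off in range(0, len(data), self.chunk_size):
            chunk = data[off:off + self.chunk_size]
            digest = hashlib.sha1(chunk).hexdigest()
            recipe.append(digest)
            if digest in self.full or digest in self.deltas:
                continue  # exact duplicate: nothing new to store
            sketch = features(chunk)
            base = self._most_similar(sketch)
            if base is not None:
                self.deltas[digest] = (base, make_delta(self.full[base], chunk))
            else:
                self.full[digest] = chunk
                self.sketches[digest] = sketch
        return recipe

    def get(self, recipe: list) -> bytes:
        """Reassemble the original data from a recipe."""
        out = bytearray()
        for digest in recipe:
            if digest in self.full:
                out += self.full[digest]
            else:
                base, ops = self.deltas[digest]
                out += apply_delta(self.full[base], ops)
        return bytes(out)
```

Calling `put` returns a recipe of chunk digests, and `get` rebuilds the data from it. In this sketch only chunks stored in full serve as delta bases, which keeps reconstruction to a single delta step.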
This work was supported by the National Natural Science Foundation of China under Grant No. 61370059, the National Natural Science Foundation of China under Grant No. 61232009, Beijing Natural Science Foundation under Grant No. 4152030, the fund of the State Key Laboratory of Software Development Environment under Grant No. SKLSDE-2016ZX-13, the Open Research Fund of The Academy of Satellite Application under Grant No. Y20A-E03 and the Open Project Program of National Engineering Research Center for Science & Technology Resources Sharing Service (Beihang University).