A Deduplication Algorithm Based on Data Similarity and Delta Encoding

  • Bin Song
  • Limin Xiao (corresponding author)
  • Guangjun Qin
  • Li Ruan
  • Shida Qiu
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 699)

Abstract

Satellite applications such as remote sensing must handle vast quantities of data, yet the storage resources on board a satellite are limited and must be used efficiently. Remote sensing data are highly similar to one another, but their dissimilar parts are distributed irregularly. As a result, when a traditional deduplication algorithm splits a file into chunks, many chunks are highly similar but not identical, so exact-match deduplication eliminates little redundancy. We propose a deduplication algorithm based on data similarity and delta encoding to reduce storage consumption: similarity analysis identifies similar data, and delta encoding stores only the differences between similar data instead of complete copies. In experiments on remote sensing application data, the algorithm achieves deduplication ratios of up to 30:1, and we analyze how the chunk size affects the results.
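The pipeline the abstract describes (exact-match deduplication first, then similarity detection and delta encoding for chunks that are similar but not identical) can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the fixed-size chunking, the byte-position similarity measure, the 0.5 threshold, the run-based delta format, the assumed 8-byte per-run overhead, and every function name (`split_fixed`, `similarity`, `delta_encode`, `delta_decode`, `deduplicate`, `restore`) are hypothetical choices made for illustration. SHA-1 fingerprints detect exact duplicates; a new chunk that is similar enough to a stored base chunk is kept only as the bytes that differ from it. The deduplication ratio is assumed to be logical size divided by physically stored size, ignoring index and recipe metadata.

```python
import hashlib

CHUNK_SIZE = 64              # hypothetical chunk size in bytes
SIMILARITY_THRESHOLD = 0.5   # hypothetical: minimum fraction of equal bytes to use a delta
RUN_HEADER = 8               # assumed per-run overhead (4-byte offset + 4-byte length)


def split_fixed(data: bytes, size: int) -> list[bytes]:
    """Split data into fixed-size chunks; the last chunk may be shorter."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def similarity(a: bytes, b: bytes) -> float:
    """Fraction of positions holding equal bytes; 0.0 if the lengths differ."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def delta_encode(ref: bytes, target: bytes) -> list[tuple[int, bytes]]:
    """Encode target (same length as ref) as (offset, new_bytes) runs where it differs."""
    runs, i = [], 0
    while i < len(target):
        if target[i] != ref[i]:
            start = i
            while i < len(target) and target[i] != ref[i]:
                i += 1
            runs.append((start, target[start:i]))
        else:
            i += 1
    return runs


def delta_decode(ref: bytes, runs: list[tuple[int, bytes]]) -> bytes:
    """Rebuild a chunk by patching a copy of its reference chunk."""
    out = bytearray(ref)
    for offset, patch in runs:
        out[offset:offset + len(patch)] = patch
    return bytes(out)


def deduplicate(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Return (recipe, store, stored_bytes) for exact dedup plus similarity-based deltas."""
    store: dict[str, bytes] = {}   # fingerprint -> base chunk kept in full
    seen: dict[str, tuple] = {}    # fingerprint -> recipe entry of its first occurrence
    recipe: list[tuple] = []       # one entry per logical chunk
    stored = 0                     # bytes physically stored (bases + deltas)
    for chunk in split_fixed(data, chunk_size):
        fp = hashlib.sha1(chunk).hexdigest()
        if fp in seen:             # exact duplicate: reuse the earlier entry, store nothing
            recipe.append(seen[fp])
            continue
        # Similarity detection: linear scan for the closest base chunk (illustration only).
        best_fp, best_sim = None, 0.0
        for cand_fp, cand in store.items():
            s = similarity(cand, chunk)
            if s > best_sim:
                best_fp, best_sim = cand_fp, s
        if best_fp is not None and best_sim >= SIMILARITY_THRESHOLD:
            runs = delta_encode(store[best_fp], chunk)
            stored += sum(RUN_HEADER + len(p) for _, p in runs)
            entry = ("delta", best_fp, runs)
        else:                      # nothing similar enough: store the chunk in full
            store[fp] = chunk
            stored += len(chunk)
            entry = ("base", fp)
        seen[fp] = entry
        recipe.append(entry)
    return recipe, store, stored


def restore(recipe: list[tuple], store: dict[str, bytes]) -> bytes:
    """Reassemble the original data from the recipe."""
    parts = []
    for entry in recipe:
        if entry[0] == "delta":
            parts.append(delta_decode(store[entry[1]], entry[2]))
        else:
            parts.append(store[entry[1]])
    return b"".join(parts)


# Synthetic stand-in for similar remote sensing scenes: ten copies of one buffer,
# each with a single byte flipped at a different position.
scene = bytes(range(256)) * 4
variants = []
for k in range(10):
    v = bytearray(scene)
    v[(k * 37) % len(v)] ^= 0xFF
    variants.append(bytes(v))
data = b"".join(variants)

recipe, store, stored = deduplicate(data)
assert restore(recipe, store) == data
print(f"logical {len(data)} B, stored {stored} B, ratio {len(data) / stored:.1f}:1")
```

With exact fingerprint matching alone, each variant's modified 64-byte chunk would have to be stored in full; the delta path stores only the changed bytes plus a small header, which is why similar-but-not-identical chunks still yield savings. Changing `chunk_size` shifts this trade-off, since larger chunks are less likely to repeat exactly. The linear scan over stored chunks grows quadratically and is only suitable for illustration; a practical system would need an index for similarity lookup.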

Keywords

Deduplication · Similarity · Delta encoding · Satellite

Notes

Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grant No. 61370059, the National Natural Science Foundation of China under Grant No. 61232009, Beijing Natural Science Foundation under Grant No. 4152030, the fund of the State Key Laboratory of Software Development Environment under Grant No. SKLSDE-2016ZX-13, the Open Research Fund of The Academy of Satellite Application under Grant No. Y20A-E03 and the Open Project Program of National Engineering Research Center for Science & Technology Resources Sharing Service (Beihang University).


Copyright information

© Springer Nature Singapore Pte Ltd. 2017

Authors and Affiliations

  • Bin Song (1)
  • Limin Xiao (1, 2), corresponding author
  • Guangjun Qin (1, 2)
  • Li Ruan (1, 3)
  • Shida Qiu (1, 3)

  1. State Key Laboratory of Software Development Environment, School of Computer Science and Engineering, Beihang University, Beijing, China
  2. National Engineering Research Center for Science and Technology Resources Sharing Service, Beijing, China
  3. Space Star Technology Co., Ltd., Beijing, China
