Encyclopedia of Database Systems

2018 Edition
Editors: Ling Liu, M. Tamer Özsu

Deduplication

  • Kazuo Goda
Reference work entry
DOI: https://doi.org/10.1007/978-1-4614-8265-9_1323

Synonyms

Dedup; Single instancing

Definition

The term Deduplication refers to the task of eliminating redundant data in data storage so as to reduce the required capacity. The benefits of Deduplication include savings in the rack space and power consumption of the data storage. Deduplication is often implemented in archival storage systems such as content-addressable storage (CAS) systems and virtual tape libraries (VTLs). The term Deduplication is sometimes shortened to Dedup.

Key Points

An address mapping table and a hash index are often used for implementing Deduplication. The address mapping table converts a logical address to a physical location for each block, and the hash index converts a hash value to a physical location for each block. When a block X is to be written to the data storage, a hash value is calculated from the content of X, and then the hash index is searched. If the same hash value is not found in the hash index, a new block is allocated in the storage space, X is...
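
The write path described above can be illustrated with a minimal Python sketch. It is not the entry's own implementation. The class name DedupStore, its method names, the use of SHA-256, and the in-memory list standing in for physical storage are all assumptions made for illustration. The source text is truncated after the "not found" case, so the handling of a new block and of a duplicate block shown here is one plausible reading. The sketch also assumes that equal hash values imply equal content, and it omits reclaiming blocks that are no longer referenced.

```python
import hashlib


class DedupStore:
    """Toy block store (hypothetical) with a hash index and an address mapping table."""

    def __init__(self):
        self.blocks = []       # stand-in for physical storage: location -> content
        self.hash_index = {}   # hash value -> physical location
        self.address_map = {}  # logical address -> physical location

    def write(self, logical_addr, data):
        # Calculate a hash value from the content of the block.
        digest = hashlib.sha256(data).hexdigest()
        # Search the hash index for the same hash value.
        location = self.hash_index.get(digest)
        if location is None:
            # Assumed behavior: allocate a new block, store the content,
            # and register its hash value in the hash index.
            location = len(self.blocks)
            self.blocks.append(data)
            self.hash_index[digest] = location
        # Assumed behavior: in both cases, point the logical address
        # at the physical location that holds this content.
        self.address_map[logical_addr] = location
        return location

    def read(self, logical_addr):
        # Convert the logical address to a physical location, then fetch.
        return self.blocks[self.address_map[logical_addr]]


store = DedupStore()
store.write(0, b"alpha")
store.write(1, b"beta")
store.write(2, b"alpha")  # same content as logical address 0
```

In this sketch, logical addresses 0 and 2 end up mapped to the same physical location, so three logical writes consume only two physical blocks.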


Recommended Reading

  1. Diligent Technologies. Hyper Factor: a breakthrough in data reduction technology. White Paper. 2008.
  2. Patterson H. Dedupe-centric storage for general applications. White Paper, Data Domain. 2008.
  3. Quinlan S, Dorward S. Venti: a new approach to archival storage. In: Proceedings of the 1st USENIX Conference on File and Storage Technologies; 2002. p. 89–102.

Authors and Affiliations

  1. The University of Tokyo, Tokyo, Japan

Section editors and affiliations

  • Masaru Kitsuregawa (1)
  1. Inst. of Industrial Science, Univ. of Tokyo, Tokyo, Japan