Abstract
Deduplication of primary storage volumes in a cloud computing environment is increasingly desirable, as the resulting space savings contribute to the cost-effectiveness of large-scale, multi-tenant infrastructures. However, traditional archival and backup deduplication systems impose prohibitive overhead on latency-sensitive applications deployed on these infrastructures, while current primary deduplication systems rely on special cluster filesystems, centralized components, or restrictive workload assumptions.
We present DEDIS, a fully distributed and dependable system that performs exact, cluster-wide background deduplication of primary storage. DEDIS does not depend on data locality and works on top of any unsophisticated storage backend, centralized or distributed, that exports a basic shared block device interface. The evaluation of an open-source prototype shows that DEDIS scales out and adds negligible overhead even when deduplication and intensive storage I/O run simultaneously.
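The abstract does not describe DEDIS's internals. The following single-process Python sketch is a hedged illustration of the general idea of exact, background (out-of-line) block deduplication, not the DEDIS protocol. The foreground `write()` only allocates a fresh block (copy-on-write) and never hashes. A separate `dedup_pass()` hashes recently written blocks, looks them up in a full content index ("exact" meaning every duplicate is found through the index rather than a sample), and remaps duplicates to a shared, reference-counted copy. All names here (`DedupVolume`, `BLOCK_SIZE`, and the in-memory dictionaries standing in for the shared block device and the index) are hypothetical. A fully distributed system such as DEDIS must also partition the index and coordinate concurrent updates across cluster nodes, which this sketch omits.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical fixed block size


class DedupVolume:
    """Toy volume: logical blocks map to physical blocks on a shared device."""

    def __init__(self):
        self.physical = {}   # physical id -> bytes (stands in for the block device)
        self.refcount = {}   # physical id -> number of logical blocks sharing it
        self.lmap = {}       # logical block number -> physical id
        self.index = {}      # SHA-256 digest -> physical id (content index)
        self.digest_of = {}  # physical id -> digest, for indexed blocks only
        self.dirty = set()   # logical blocks written since the last dedup pass
        self.next_id = 0

    def _release(self, pid):
        # Drop one reference and reclaim the physical block once unused.
        self.refcount[pid] -= 1
        if self.refcount[pid] == 0:
            del self.physical[pid], self.refcount[pid]
            d = self.digest_of.pop(pid, None)
            if d is not None:
                del self.index[d]

    def write(self, lbn, data):
        # Foreground path: no hashing, just copy-on-write to a fresh block.
        assert len(data) == BLOCK_SIZE
        pid = self.next_id
        self.next_id += 1
        self.physical[pid] = data
        self.refcount[pid] = 1
        old = self.lmap.get(lbn)
        self.lmap[lbn] = pid
        if old is not None:
            self._release(old)  # a shared block keeps its other references
        self.dirty.add(lbn)

    def read(self, lbn):
        return self.physical[self.lmap[lbn]]

    def dedup_pass(self):
        # Background path: hash each dirty block and share exact duplicates.
        for lbn in sorted(self.dirty):
            pid = self.lmap.get(lbn)
            if pid is None or pid in self.digest_of:
                continue  # unmapped or already indexed
            data = self.physical[pid]
            d = hashlib.sha256(data).digest()
            match = self.index.get(d)
            if match is None:
                # First copy of this content: index it.
                self.index[d] = pid
                self.digest_of[pid] = d
            elif self.physical[match] == data:
                # Byte-for-byte duplicate: remap and free the new copy.
                self.refcount[match] += 1
                self.lmap[lbn] = match
                self._release(pid)
            # Otherwise a SHA-256 collision: leave the block unshared.
        self.dirty.clear()
```

Because `write()` never touches the index, deduplication adds no work to the foreground I/O path; the cost moves to `dedup_pass()`, which a real system would schedule or throttle alongside application I/O. The byte comparison after a digest match is a design choice that guards against hash collisions at the price of one extra read.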
Copyright information
© 2014 IFIP International Federation for Information Processing
About this paper
Paulo, J., Pereira, J. (2014). Distributed Exact Deduplication for Primary Storage Infrastructures. In: Magoutis, K., Pietzuch, P. (eds) Distributed Applications and Interoperable Systems. DAIS 2014. Lecture Notes in Computer Science(), vol 8460. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-43352-2_5
DOI: https://doi.org/10.1007/978-3-662-43352-2_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-43351-5
Online ISBN: 978-3-662-43352-2
eBook Packages: Computer Science, Computer Science (R0)