DA Placement: A Dual-Aware Data Placement in a Deduplicated and Erasure-Coded Storage System

Deng, Mingzhu; Zhao, Ming; Liu, Fang; Chen, Zhiguang; Xiao, Nong

doi:10.1007/978-3-030-05051-1_25

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11334)

Included in the following conference series:

International Conference on Algorithms and Architectures for Parallel Processing

Abstract

Modern storage systems increasingly combine deduplication with erasure coding to gain both storage efficiency and economical data reliability. However, a naive combination suffers from the “read imbalance problem”, in which parallel data accesses are curbed by throttled storage nodes. The problem stems from uneven data placement that is unaware of both deduplication and erasure coding, each of which alters the order of data if left unattended. This paper proposes the design and implementation of a Dual-Aware (DA) placement for a combined storage system that achieves deduplication awareness and erasure-coding awareness at the same time. DA records the node number of each unique piece of data for quick reference, and it dynamically tracks the nodes used by each write request. This deduplication awareness lets DA skip inconvenient placement locations. In addition, DA serializes the placement of parity blocks within a stripe and across stripes. This erasure-coding awareness keeps data and parity separate and maintains data sequentiality at bordering stripes. DA also extends to further load balancing through an innovative use of the deduplication level, which intuitively predicts future accesses to a piece of data. In short, DA boosts system performance at little memory or computation cost. Extensive experiments using both real-world traces and synthesized workloads show that DA achieves better read performance. For example, DA leads the baseline rolling placement (BA) and random placement (RA) by average latency margins of 30.86% and 29.63%, respectively, under CAFTL traces on a default cluster of 12 nodes with RS(8,4).
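
The deduplication-aware half of the idea can be illustrated with a minimal sketch. It is not the authors' implementation, which this preview does not show: the function name `place_request`, the fingerprint index layout, and the rolling cursor are all assumptions made for illustration, and parity placement and the deduplication-level extension are left out.

```python
from typing import Dict, List, Tuple


def place_request(fingerprints: List[str], index: Dict[str, int],
                  num_nodes: int, cursor: int = 0) -> Tuple[List[int], int]:
    """Assign a node to every chunk of one write request (hypothetical helper).

    index maps a chunk fingerprint to the node that already stores it.
    Returns the node chosen per chunk and the updated rolling cursor.
    """
    # Nodes that already hold duplicates of this request's chunks count as
    # "used", so new unique chunks can be steered away from them.
    used = {index[fp] for fp in fingerprints if fp in index}
    placement = []
    for fp in fingerprints:
        if fp in index:
            # Duplicate chunk: nothing is written, reuse the recorded node.
            placement.append(index[fp])
            continue
        # Unique chunk: advance the rolling cursor past nodes already used
        # by this request. If every node is used, the cursor wraps back to
        # where it started and that node is accepted as a fallback.
        for _ in range(num_nodes):
            if cursor not in used:
                break
            cursor = (cursor + 1) % num_nodes
        index[fp] = cursor
        used.add(cursor)
        placement.append(cursor)
        cursor = (cursor + 1) % num_nodes
    return placement, cursor


# Chunk "a" is already stored on node 0, so the new chunks skip node 0.
nodes, next_cursor = place_request(["x", "a", "y"], {"a": 0}, num_nodes=4)
print(nodes, next_cursor)
```

In this sketch a plain rolling placement would have put "x" on node 0 next to the existing copy of "a", so a read of the whole request would hit node 0 twice; skipping used nodes spreads the three chunks over three nodes.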

Acknowledgment

We greatly appreciate the anonymous reviewers' insightful comments. This work is supported by the National Natural Science Foundation of China under Grant Nos. 61433019 and U1435217, and by the National High Technology Research and Development Program of China under Grant No. 2016YFB1000302.

Author information

Authors and Affiliations

College of Computer, National University of Defense Technology, Changsha, China
Mingzhu Deng, Zhiguang Chen & Nong Xiao
Arizona State University, Tempe, USA
Mingzhu Deng & Ming Zhao
School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China
Fang Liu, Zhiguang Chen & Nong Xiao

Corresponding author

Correspondence to Mingzhu Deng.

Editor information

Editors and Affiliations

Rutgers University, Newark, NJ, USA
Jaideep Vaidya
Guangzhou University, Guangzhou, China
Jin Li

About this paper

Cite this paper

Deng, M., Zhao, M., Liu, F., Chen, Z., Xiao, N. (2018). DA Placement: A Dual-Aware Data Placement in a Deduplicated and Erasure-Coded Storage System. In: Vaidya, J., Li, J. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2018. Lecture Notes in Computer Science, vol 11334. Springer, Cham. https://doi.org/10.1007/978-3-030-05051-1_25

DOI: https://doi.org/10.1007/978-3-030-05051-1_25
Published: 07 December 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-05050-4
Online ISBN: 978-3-030-05051-1
eBook Packages: Computer Science, Computer Science (R0)