Delta: Data Reduction for Integrated Application Workflows and Data Storage

Lofstead, Jay; Jean-Baptiste, Gregory; Oldfield, Ron

doi:10.1007/978-3-319-46079-6_11

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9945)

Included in the following conference series:

International Conference on High Performance Computing

Abstract

Data sizes are growing far faster than storage bandwidth. To address this growing gap, Integrated Application Workflows (IAWs) are being investigated as a potential replacement for a centralized storage array for storing intermediate data. IAWs run multiple simulation workflow components concurrently on an HPC resource, connecting these components using compute area resources. These IAWs require high-frequency, high-volume data transfers between compute nodes and staging area nodes during the lifetime of a large parallel computation. The available network bandwidth between the two areas may not be enough to support this data movement efficiently. As the processing power available to compute resources increases, the requirements for this data transfer will become more difficult to satisfy, and perhaps impossible to satisfy, since network capabilities are not expanding at a comparable rate. It is therefore necessary to reduce the volume of data without reducing the quality of the data when it is processed and analyzed. Delta addresses this problem by targeting the data transfer operations performed over the lifetime of the computation. Before each transfer, Delta removes data that is identical to data already transmitted, and it restores those pieces at the destination using the previously transmitted data. Delta is able to identify duplicated information and determine the most space-efficient way to represent it. Initial tests show about a 50% reduction in data movement while maintaining the same data quality and transmission frequency. Given the simplicity of the approach and the log-based format employed by ADIOS, the approach can also be used to write less data to the storage array outside of IAW considerations.
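
The remove-then-restore idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation and does not use the ADIOS API. It assumes fixed-size chunks and SHA-256 digests as the duplicate detector, and the names `DeltaSender`, `DeltaReceiver`, `wire_size`, and `CHUNK_SIZE` are hypothetical, chosen only for illustration.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical chunk size, not taken from the paper


class DeltaSender:
    """Replaces chunks that were already transmitted with short references."""

    def __init__(self, chunk_size=CHUNK_SIZE):
        self.chunk_size = chunk_size
        self.seen = set()  # digests of chunks sent in earlier transfers

    def encode(self, payload: bytes):
        message = []
        for start in range(0, len(payload), self.chunk_size):
            chunk = payload[start:start + self.chunk_size]
            digest = hashlib.sha256(chunk).digest()
            if digest in self.seen:
                # Already transmitted: send only the 32-byte digest.
                message.append(("ref", digest))
            else:
                self.seen.add(digest)
                message.append(("raw", chunk))
        return message


class DeltaReceiver:
    """Restores references using previously received chunks."""

    def __init__(self):
        self.store = {}  # digest -> chunk bytes

    def decode(self, message) -> bytes:
        parts = []
        for kind, value in message:
            if kind == "raw":
                self.store[hashlib.sha256(value).digest()] = value
                parts.append(value)
            else:
                parts.append(self.store[value])
        return b"".join(parts)


def wire_size(message) -> int:
    """Bytes of chunk data and digests in a message (ignores framing)."""
    return sum(len(value) for _, value in message)


if __name__ == "__main__":
    sender, receiver = DeltaSender(), DeltaReceiver()
    step0 = b"A" * CHUNK_SIZE + b"B" * CHUNK_SIZE
    step1 = b"A" * CHUNK_SIZE + b"C" * CHUNK_SIZE  # only second chunk changed
    for step in (step0, step1):
        msg = sender.encode(step)
        assert receiver.decode(msg) == step
        print(len(step), "bytes of data,", wire_size(msg), "bytes sent")
```

In this sketch the sender and receiver must see the same sequence of messages so that their chunk histories stay in step. How Delta itself identifies duplicates and chooses the most space-efficient representation is not specified in the abstract.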

Acknowledgments

Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000. SAND2014-17090 C.

Author information

Authors and Affiliations

Sandia National Laboratories, Albuquerque, NM, USA
Jay Lofstead & Ron Oldfield
Florida International University, Miami, FL, USA
Gregory Jean-Baptiste

Corresponding author

Correspondence to Jay Lofstead.

Editor information

Editors and Affiliations

University of Delaware, Newark, Delaware, USA
Michela Taufer
Forschungszentrum Jülich, Jülich, Germany
Bernd Mohr
DKRZ, Hamburg, Germany
Julian M. Kunkel

About this paper

Cite this paper

Lofstead, J., Jean-Baptiste, G., Oldfield, R. (2016). Delta: Data Reduction for Integrated Application Workflows and Data Storage. In: Taufer, M., Mohr, B., Kunkel, J. (eds) High Performance Computing. ISC High Performance 2016. Lecture Notes in Computer Science, vol 9945. Springer, Cham. https://doi.org/10.1007/978-3-319-46079-6_11

DOI: https://doi.org/10.1007/978-3-319-46079-6_11
Published: 06 October 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46078-9
Online ISBN: 978-3-319-46079-6
eBook Packages: Computer Science, Computer Science (R0)