
Toward Decoupling the Selection of Compression Algorithms from Quality Constraints

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10524)

Abstract

Data-intensive scientific domains use data compression to reduce their storage requirements. Lossless data compression preserves the original information exactly, but on climate data it typically achieves a compression factor of only about 2:1. Lossy data compression can reach much higher compression factors, depending on the tolerable error and the precision required. Consequently, lossy compression remains an area of active research. From a scientist's perspective, the particular compression algorithm does not matter; what does matter is qualitative information about the loss of precision that the compression implies.

With the Scientific Compression Library (SCIL), we are developing a meta-compressor that lets users set various quantities defining the acceptable error and the expected performance behavior. The ongoing work is a preliminary stage for the design of an automatic compression algorithm selector. The task of this missing key component is to construct chains of algorithms that satisfy the user's requirements. This approach is a crucial step towards a scientifically safe use of much-needed lossy data compression, because it disentangles the task of determining the scientific characteristics of tolerable noise from the task of determining an optimal compression strategy given the target noise levels and constraints. Once integrated into SCIL, future algorithms can be used without any change to the application code.

In this paper, we describe the user interface and its quantities, present two compression algorithms, and evaluate SCIL's ability to compress climate data. We show that the novel algorithms are competitive with the state-of-the-art compressors ZFP and SZ, and we illustrate that the best algorithm depends on the user settings and the data properties.
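
To make the notion of user-set quantities concrete, the following minimal C sketch shows how quality constraints can drive algorithm selection without the application ever naming an algorithm. All identifiers here (user_hints_t, select_chain, and the chain strings) are illustrative assumptions for this page, not SCIL's actual API; see the repository linked in the notes for the real interface.

```c
/* Minimal, self-contained sketch: all names are illustrative
 * assumptions, not a reproduction of SCIL's API. */
#include <stdio.h>

typedef struct {
    double absolute_tolerance; /* maximum tolerable absolute error */
    double relative_tolerance; /* maximum tolerable relative error (%) */
    int    significant_bits;   /* mantissa bits that must be preserved */
} user_hints_t;

/* Toy stand-in for the automatic selector: it maps quality constraints
 * to an algorithm chain, so the application never names an algorithm. */
static const char *select_chain(const user_hints_t *h)
{
    if (h->absolute_tolerance == 0.0 && h->relative_tolerance == 0.0 &&
        h->significant_bits == 0)
        return "lz4";           /* no loss permitted: lossless only */
    if (h->significant_bits > 0)
        return "sigbits,lz4";   /* precision truncation, then lossless */
    return "abstol,lz4";        /* quantization, then lossless */
}

int main(void)
{
    user_hints_t lossless = {0.0, 0.0, 0};
    user_hints_t lossy    = {0.01, 0.0, 0}; /* 0.01 absolute error is ok */
    printf("%s\n", select_chain(&lossless)); /* prints: lz4 */
    printf("%s\n", select_chain(&lossy));    /* prints: abstol,lz4 */
    return 0;
}
```

The planned selector is model-driven rather than rule-driven (see note 4 below), but the division of labor is the same: the application states constraints, and the library chooses the chain.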


Notes

  1. We define the compression ratio as \(r = \frac{\text{size compressed}}{\text{size original}}\); its inverse is the compression factor. For example, compressing 100 MiB to 50 MiB yields \(r = 0.5\), i.e., a compression factor of 2:1.

  2. The current version of the library is publicly available under the LGPL license: https://github.com/JulianKunkel/scil.

  3. https://github.com/JulianKunkel/statistical-file-scanner.

  4. The implementation of the automatic algorithm selection is an ongoing effort and not the focus of this paper. SCIL will utilize a model of performance and compression ratio for the different algorithms, data properties, and user settings.

  5. The versions used are SZ from Mar 5 2017 (git hash e1bf8b), zfp 0.5.0, LZ4 (May 1 2017, a8dd86).

  6. This chain first applies the Sigbits algorithm and then the lossless LZ4 compression; a generic sketch of this idea follows these notes.

  7. This is done to allow comparison across variables regardless of their min/max. In practice, a scientist would set the reltol or define the abstol depending on the variable.

  8. Even when we added the number of bits necessary for encoding the mantissa to ZFP.
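
The Sigbits + LZ4 chain from note 6 builds on a generic precision-truncation idea that can be sketched in a few lines of C. The code below is our own simplified illustration of that idea, assuming plain IEEE 754 bit masking; it is not SCIL's implementation of Sigbits, which encodes sign, exponent, and the requested significant bits in its own format.

```c
/* Generic illustration of the idea behind a Sigbits + LZ4 chain: zero
 * all but the requested significant mantissa bits, so a lossless coder
 * such as LZ4 can exploit the resulting redundancy. Not SCIL's code. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* IEEE 754 single precision carries a 23-bit mantissa; keep only the
 * `keep_bits` most significant of them and clear the rest. */
static void truncate_mantissa(float *data, size_t n, int keep_bits)
{
    const uint32_t mask = ~((1u << (23 - keep_bits)) - 1u);
    for (size_t i = 0; i < n; i++) {
        uint32_t bits;
        memcpy(&bits, &data[i], sizeof bits); /* portable type punning */
        bits &= mask;
        memcpy(&data[i], &bits, sizeof bits);
    }
}

int main(void)
{
    float v[] = {3.14159265f, 2.71828182f};
    truncate_mantissa(v, 2, 8);        /* keep 8 significant mantissa bits */
    printf("%.6f %.6f\n", v[0], v[1]); /* values rounded toward zero */
    /* A subsequent lossless pass (e.g., LZ4) now compresses the zeroed
     * low-order bytes far better than the original bit patterns. */
    return 0;
}
```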


Acknowledgements

This work was supported in part by the German Research Foundation (DFG) through the Priority Programme 1648 “Software for Exascale Computing” (SPPEXA) (GZ: LU 1353/11-1).

Author information


Corresponding author

Correspondence to Julian Kunkel.



Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Kunkel, J., Novikova, A., Betke, E., Schaare, A. (2017). Toward Decoupling the Selection of Compression Algorithms from Quality Constraints. In: Kunkel, J., Yokota, R., Taufer, M., Shalf, J. (eds.) High Performance Computing. ISC High Performance 2017. Lecture Notes in Computer Science, vol. 10524. Springer, Cham. https://doi.org/10.1007/978-3-319-67630-2_1


  • DOI: https://doi.org/10.1007/978-3-319-67630-2_1


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-67629-6

  • Online ISBN: 978-3-319-67630-2

  • eBook Packages: Computer Science (R0)
