Exploration of Pattern-Matching Techniques for Lossy Compression on Cosmology Simulation Data Sets

Tao, Dingwen; Di, Sheng; Chen, Zizhong; Cappello, Franck

doi:10.1007/978-3-319-67630-2_4


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10524)

Included in the following conference series:

International Conference on High Performance Computing


Abstract

Because of the vast volume of data being produced by today’s scientific simulations, lossy compression allowing user-controlled information loss can significantly reduce the data size and the I/O burden. However, for large-scale cosmology simulations such as the Hardware/Hybrid Accelerated Cosmology Code (HACC), where memory overhead constraints restrict compression to only one snapshot at a time, the lossy compression ratio is extremely limited because of the fairly low spatial coherence and high irregularity of the data. In this work, we propose a pattern-matching (similarity searching) technique to optimize the prediction accuracy and compression ratio of the SZ lossy compressor on the HACC data sets. We evaluate our proposed method with different configurations and compare it with state-of-the-art lossy compressors. Experiments show that our proposed optimization approach can improve the prediction accuracy and reduce the compressed size of quantization codes compared with SZ. We present several lessons useful for future research involving pattern-matching techniques for lossy compression.
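The general idea of combining similarity search with error-bounded quantization can be illustrated with a small sketch. This is not the authors' algorithm and not the SZ implementation. It is a toy example under stated assumptions: fixed-size blocks, a brute-force search over previously reconstructed blocks, sum of squared differences as the similarity measure, and uniform quantization with a step of twice the error bound. The names `compress`, `decompress`, `block_size`, and `error_bound` are hypothetical.

```python
def _predict(recon_blocks, match, n):
    """Prediction for a block of length n: the matched block, or zeros if none."""
    return [0.0] * n if match < 0 else recon_blocks[match][:n]


def compress(data, error_bound, block_size):
    """Toy pattern-matching compressor (illustrative assumption, not SZ itself).

    Each block is predicted from the most similar previously reconstructed
    block, and the residual is quantized to integer codes.
    """
    step = 2.0 * error_bound          # quantization bin width
    recon_blocks = []                 # what the decompressor will also see
    encoded = []                      # list of (matched block index, codes)
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        # Similarity search: start from the all-zero prediction (index -1)
        # and keep the earlier block with the smallest squared difference.
        best, best_cost = -1, sum(x * x for x in block)
        for idx, cand in enumerate(recon_blocks):
            cost = sum((x - c) ** 2 for x, c in zip(block, cand))
            if cost < best_cost:
                best, best_cost = idx, cost
        pred = _predict(recon_blocks, best, len(block))
        # Quantize the prediction residual; a better match gives codes near 0.
        codes = [round((x - p) / step) for x, p in zip(block, pred)]
        recon_blocks.append([p + q * step for p, q in zip(pred, codes)])
        encoded.append((best, codes))
    return encoded


def decompress(encoded, error_bound):
    """Rebuild the data by replaying the same predictions and codes."""
    step = 2.0 * error_bound
    recon_blocks = []
    for match, codes in encoded:
        pred = _predict(recon_blocks, match, len(codes))
        recon_blocks.append([p + q * step for p, q in zip(pred, codes)])
    return [v for blk in recon_blocks for v in blk]
```

Because the search runs over reconstructed rather than original blocks, the decompressor can reproduce every prediction from the stored block index alone, and each value is recovered to within `error_bound` up to floating-point rounding. When a block closely repeats an earlier one, its codes collapse toward zero, which is what makes them cheaper for a subsequent entropy coder. The brute-force search here is quadratic in the number of blocks and is only meant to show the principle.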

Author information

Authors and Affiliations

University of California, Riverside, CA, USA
Dingwen Tao & Zizhong Chen
Argonne National Laboratory, Lemont, IL, USA
Sheng Di & Franck Cappello
University of Illinois at Urbana-Champaign, Champaign, IL, USA
Franck Cappello

Corresponding author

Correspondence to Dingwen Tao.

Editor information

Editors and Affiliations

Deutsches Klimarechenzentrum (DKRZ), Hamburg, Hamburg, Germany
Julian M. Kunkel
TITECH, Tokyo, Japan
Rio Yokota
Department of Computer Science, University of Delaware, Newark, Delaware, USA
Michela Taufer
Lawrence Berkeley National Laboratory, Berkeley, California, USA
John Shalf

About this paper

Cite this paper

Tao, D., Di, S., Chen, Z., Cappello, F. (2017). Exploration of Pattern-Matching Techniques for Lossy Compression on Cosmology Simulation Data Sets. In: Kunkel, J., Yokota, R., Taufer, M., Shalf, J. (eds) High Performance Computing. ISC High Performance 2017. Lecture Notes in Computer Science, vol 10524. Springer, Cham. https://doi.org/10.1007/978-3-319-67630-2_4


DOI: https://doi.org/10.1007/978-3-319-67630-2_4
Published: 20 October 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67629-6
Online ISBN: 978-3-319-67630-2
eBook Packages: Computer Science, Computer Science (R0)
