Skip to main content

Exploration of Pattern-Matching Techniques for Lossy Compression on Cosmology Simulation Data Sets

  • Conference paper
  • First Online:
High Performance Computing (ISC High Performance 2017)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 10524))

Included in the following conference series:

Abstract

Because of the vast volume of data being produced by today’s scientific simulations, lossy compression allowing user-controlled information loss can significantly reduce the data size and the I/O burden. However, for large-scale cosmology simulation, such as the Hardware/Hybrid Accelerated Cosmology Code (HACC), where memory overhead constraints restrict compression to only one snapshot at a time, the lossy compression ratio is extremely limited because of the fairly low spatial coherence and high irregularity of the data. In this work, we propose a pattern-matching (similarity searching) technique to optimize the prediction accuracy and compression ratio of SZ lossy compressor on the HACC data sets. We evaluate our proposed method with different configurations and compare it with state-of-the-art lossy compressors. Experiments show that our proposed optimization approach can improve the prediction accuracy and reduce the compressed size of quantization codes compared with SZ. We present several lessons useful for future research involving pattern-matching techniques for lossy compression.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Institutional subscriptions

References

  1. Ahmed, N., Natarajan, T., Rao, K.R.: Discrete cosine transform. IEEE Trans. Comput. 100(1), 90–93 (1974)

    Article  MathSciNet  MATH  Google Scholar 

  2. Baker, A.H., Xu, H., Dennis, J.M., Levy, M.N., Nychka, D., Mickelson, S.A., Edwards, J., Vertenstein, M., Wegener, A.: A methodology for evaluating the impact of data compression on climate simulation data. In: HPDC 2014, pp. 203–214 (2014)

    Google Scholar 

  3. Bernholdt, D., Bharathi, S., Brown, D., Chanchio, K., Chen, M., Chervenak, A., Cinquini, L., Drach, B., Foster, I., Fox, P., et al.: The earth system grid: supporting the next generation of climate modeling research. Proc. IEEE 93(3), 485–495 (2005)

    Article  Google Scholar 

  4. Chan, K.P., Fu, A.W.C.: Efficient time series matching by wavelets. In: Proceedings of the 15th International Conference on Data Engineering, pp. 126–133. IEEE (1999)

    Google Scholar 

  5. Chanussot, J., Lambert, P.: Total ordering based on space filling curves for multivalued morphology. Comput. Imaging Vis. 12, 51–58 (1998)

    MATH  Google Scholar 

  6. Chen, Z., Son, S.W., Hendrix, W., Agrawal, A., Liao, W., Choudhary, A.N.: NUMARCK: machine learning algorithm for resiliency and checkpointing. In: SC 2014, pp. 733–744 (2014)

    Google Scholar 

  7. Committee, I.S., et al.: 754–2008 IEEE standard for floating-point arithmetic. IEEE Comput. Soc. Std 2008 (2008)

    Google Scholar 

  8. Daubechies, I.: The wavelet transform, time-frequency localization and signal analysis. IEEE Trans. Inf. Theory 36(5), 961–1005 (1990)

    Article  MathSciNet  MATH  Google Scholar 

  9. Deutsch, L.P.: GZIP file format specification version 4.3 (1996)

    Google Scholar 

  10. Di, S., Cappello, F.: Fast error-bounded lossy HPC data compression with SZ. In: 2016 IEEE International Parallel and Distributed Processing Symposium, IPDPS 2016, Chicago, IL, USA, 23–27 May 2016, pp. 730–739 (2016)

    Google Scholar 

  11. Gleckler, P.J., Durack, P.J., Stouffer, R.J., Johnson, G.C., Forest, C.E.: Industrial-era global ocean heat uptake doubles in recent decades. Nat. Clim. Chang. (2016)

    Google Scholar 

  12. Habib, S., Pope, A., Finkel, H., Frontiere, N., Heitmann, K., Daniel, D., Fasel, P., Morozov, V., Zagaris, G., Peterka, T., et al.: Hacc: simulating sky surveys on state-of-the-art supercomputing architectures. New Astron. 42, 49–65 (2016)

    Article  Google Scholar 

  13. Huffman, D.A., et al.: A method for the construction of minimum-redundancy codes. Proc. IRE 40(9), 1098–1101 (1952)

    Article  MATH  Google Scholar 

  14. Kumar, A., Zhu, X., Tu, Y.-C., Pandit, S.: Compression in molecular simulation datasets. In: Sun, C., Fang, F., Zhou, Z.-H., Yang, W., Liu, Z.-Y. (eds.) IScIDE 2013. LNCS, vol. 8261, pp. 22–29. Springer, Heidelberg (2013). doi:10.1007/978-3-642-42057-3_4

    Chapter  Google Scholar 

  15. Lakshminarasimhan, S., Shah, N., Ethier, S., Ku, S., Chang, C., Klasky, S., Latham, R., Ross, R.B., Samatova, N.F.: ISABELA for effective in situ compression of scientific data. Concurr. Comput. Pract. Exp. 25(4), 524–540 (2013)

    Article  Google Scholar 

  16. Lindstrom, P.: Fixed-rate compressed floating-point arrays. IEEE Trans. Vis. Comput. Graph. 20(12), 2674–2683 (2014)

    Article  Google Scholar 

  17. Lindstrom, P., Isenburg, M.: Fast and efficient compression of floating-point data. TVCG 12(5), 1245–1250 (2006)

    Google Scholar 

  18. Meyer, T., Ferrer-Costa, C., Pérez, A., Rueda, M., Bidon-Chanal, A., Luque, F.J., Laughton, C., Orozco, M.: Essential dynamics: a tool for efficient trajectory compression and management. J. Chem. Theory Comput. 2(2), 251–258 (2006)

    Article  Google Scholar 

  19. Omeltchenko, A., Campbell, T.J., Kalia, R.K., Liu, X., Nakano, A., Vashishta, P.: Scalable i/o of large-scale molecular dynamics simulations: A data-compression algorithm. Comput. Phys. Commun. 131(1), 78–85 (2000)

    Article  MATH  Google Scholar 

  20. Ratanaworabhan, P., Ke, J., Burtscher, M.: Fast lossless compression of scientific floating-point data. In: Proceedings of the Data Compression Conference, DCC 2006, pp. 133–142. IEEE (2006)

    Google Scholar 

  21. Sasaki, N., Sato, K., Endo, T., Matsuoka, S.: Exploration of lossy compression for application-level checkpoint/restart. In: 2015 IEEE International on Parallel and Distributed Processing Symposium (IPDPS), pp. 914–922. IEEE (2015)

    Google Scholar 

  22. Tao, D., Di, S., Chen, Z., Cappello, F.: Significantly improving lossy compression for scientific data sets based on multidimensional prediction and error-controlled quantization. In: 2017 IEEE International Parallel and Distributed Processing Symposium, IPDPS 2017, Orlando, Florida, USA, 29 May–2 June, 2017, pp. 1129–1139 (2017)

    Google Scholar 

  23. Yang, D.Y., Grama, A., Sarin, V.: Bounded-error compression of particle data from hierarchical approximate methods. In: Proceedings of the 1999 ACM/IEEE Conference on Supercomputing, SC 1999. ACM, New York, NY, USA (1999)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Dingwen Tao .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Tao, D., Di, S., Chen, Z., Cappello, F. (2017). Exploration of Pattern-Matching Techniques for Lossy Compression on Cosmology Simulation Data Sets. In: Kunkel, J., Yokota, R., Taufer, M., Shalf, J. (eds) High Performance Computing. ISC High Performance 2017. Lecture Notes in Computer Science(), vol 10524. Springer, Cham. https://doi.org/10.1007/978-3-319-67630-2_4

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-67630-2_4

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-67629-6

  • Online ISBN: 978-3-319-67630-2

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics