Compressing unstructured mesh data from simulations using machine learning

Chandrika Kamath


The amount of data output from computer simulations has grown to terabytes and petabytes as increasingly complex simulations are run on massively parallel systems. As we approach exaflop computing in the next decade, the I/O subsystem is expected to be unable to write out these large volumes of data. In this paper, we explore the use of machine learning to compress the data before it is written out. Although computational constraints limit us to very simple learning algorithms, our results show that machine learning is a viable option for compressing unstructured mesh data. We demonstrate that simply using a better sampling algorithm to generate the training set yields more accurate results than random sampling, at no extra cost. Further, by carefully selecting and incorporating points with high prediction error, we can improve reconstruction accuracy without sacrificing the compression rate.
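The pipeline summarized in the abstract can be sketched as follows: sample a subset of mesh points, store only their values, predict the remaining values with a simple regressor, and optionally spend part of the storage budget on the points the regressor predicts worst. This is a minimal illustration under stated assumptions, not the paper's implementation. Mitchell-style best-candidate sampling over the mesh nodes stands in for the "better sampling algorithm", inverse-distance weighting over the nearest stored points stands in for the "very simple learning algorithms", the mesh coordinates are assumed to be available at reconstruction time, and the way a fixed budget is split between sampled and high-error points is an assumption. All function names (`best_candidate_sample`, `reconstruct`, `compress`, and so on) are hypothetical.

```python
import math
import random


def random_sample(n_points, k, rng):
    """Baseline: pick k distinct point indices uniformly at random."""
    return rng.sample(range(n_points), k)


def best_candidate_sample(coords, k, n_candidates, rng):
    """Best-candidate sampling over mesh nodes (assumed stand-in for a better sampler).

    Each new sample is whichever of n_candidates random unchosen nodes lies
    farthest from its nearest already-chosen sample, which spreads samples out.
    """
    if k <= 0:
        return []
    n = len(coords)
    chosen = [rng.randrange(n)]
    chosen_set = set(chosen)
    while len(chosen) < k:
        remaining = [i for i in range(n) if i not in chosen_set]
        cands = rng.sample(remaining, min(n_candidates, len(remaining)))
        best = max(cands, key=lambda c: min(math.dist(coords[c], coords[s]) for s in chosen))
        chosen.append(best)
        chosen_set.add(best)
    return chosen


def reconstruct(coords, stored, n_neighbors=4, power=2.0):
    """Estimate every node's value from the stored {index: value} map.

    Inverse-distance weighting over the n_neighbors nearest stored nodes
    (assumed regressor); stored nodes are returned exactly. `stored` must be non-empty.
    """
    out = []
    for q in range(len(coords)):
        if q in stored:
            out.append(stored[q])
            continue
        nearest = sorted(stored, key=lambda s: math.dist(coords[s], coords[q]))[:n_neighbors]
        dists = [math.dist(coords[s], coords[q]) for s in nearest]
        if dists[0] == 0.0:  # coincident with a stored node: copy its value
            out.append(stored[nearest[0]])
            continue
        weights = [1.0 / d ** power for d in dists]
        out.append(sum(w * stored[s] for w, s in zip(weights, nearest)) / sum(weights))
    return out


def compress(coords, values, budget, n_extra, rng, sampler="best", n_candidates=10):
    """Return a {index: value} map holding exactly `budget` nodes (requires n_extra < budget).

    budget - n_extra nodes come from the sampler; the other n_extra are the
    unstored nodes with the largest error when reconstructed from that sample.
    """
    k = budget - n_extra
    if sampler == "best":
        base = best_candidate_sample(coords, k, n_candidates, rng)
    else:
        base = random_sample(len(coords), k, rng)
    stored = {i: values[i] for i in base}
    if n_extra > 0:
        approx = reconstruct(coords, stored)
        unstored = [i for i in range(len(coords)) if i not in stored]
        unstored.sort(key=lambda i: abs(approx[i] - values[i]), reverse=True)
        for i in unstored[:n_extra]:
            stored[i] = values[i]
    return stored


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-in for an unstructured mesh: scattered 2-D nodes, smooth field.
    coords = [(rng.random(), rng.random()) for _ in range(500)]
    values = [math.sin(6 * x) * math.cos(4 * y) for x, y in coords]
    for sampler in ("random", "best"):
        for n_extra in (0, 10):
            stored = compress(coords, values, 50, n_extra, random.Random(1), sampler)
            approx = reconstruct(coords, stored)
            rmse = math.sqrt(sum((a - v) ** 2 for a, v in zip(approx, values)) / len(values))
            print(f"{sampler:6s} extra={n_extra:2d} stored={len(stored)} rmse={rmse:.4f}")
```

Holding `budget` fixed while varying `sampler` and `n_extra` keeps the number of stored values, and hence the compression rate, constant; only the choice of which nodes are stored changes.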


Keywords: Regression · Compression · Computer simulations · Mesh data



I thank the reviewers of both the original DSAA’2017 paper and this extended version for their careful review and thoughtful suggestions for improvements. I also thank Prof. Zhihong Lin of UC Irvine for providing access to the data generated as part of the GSEP SciDAC project. This work was funded by the ASCR Program (Dr. Lucille Nowell, Program Manager) at the Office of Science, US Department of Energy. LLNL-JRNL-750460. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.

Compliance with ethical standards

Conflict of interest

The author states that there is no conflict of interest.


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

Lawrence Livermore National Laboratory, Livermore, USA
