ALACRITY: Analytics-Driven Lossless Data Compression for Rapid In-Situ Indexing, Storing, and Querying

Jenkins, John; Arkatkar, Isha; Lakshminarasimhan, Sriram; Boyuka, David A.; Schendel, Eric R.; Shah, Neil; Ethier, Stephane; Chang, Choong-Seock; Chen, Jackie; Kolla, Hemanth; Klasky, Scott; Ross, Robert; Samatova, Nagiza F.

doi:10.1007/978-3-642-41221-9_4


Part of the book series: Lecture Notes in Computer Science (TLDKS, volume 8220)

Abstract

High-performance computing architectures face nontrivial data processing challenges as the performance trajectories of computational and I/O components continue to diverge. For scientific data analysis in particular, methods that build heavyweight access acceleration structures, such as indexes, are becoming less feasible as dataset sizes keep growing. We present ALACRITY, which demonstrates that a fused data and index encoding of scientific floating-point data yields lightweight data structures amenable to the types of queries common in scientific data analysis. We exploit the representation of floating-point values by extracting significant bytes and using the resulting unique values to bin the remaining data along fixed-precision boundaries. We then build an inverted index that maps each generated bin to the list of records it contains, which accelerates queries with attribute range constraints. Overall, the storage footprint of index and data combined is shown to be below that of numerous bitmap indexing configurations, while query performance matches or exceeds theirs.
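
The binning and inverted-index idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The choice of two leading bytes as the bin key, the big-endian IEEE 754 layout, the restriction to finite values, and all function names (`split_double`, `build_index`, `range_query`, and so on) are assumptions made only for illustration.

```python
import struct
from collections import defaultdict

# Assumption: the two leading bytes of a big-endian IEEE 754 double
# (sign, exponent, top mantissa bits) serve as the bin key.
HIGH_BYTES = 2
LOW_BYTES = 8 - HIGH_BYTES


def split_double(value):
    """Split a double into (high-order bin key, low-order bytes)."""
    raw = struct.pack(">d", value)
    return raw[:HIGH_BYTES], raw[HIGH_BYTES:]


def join_double(key, low):
    """Losslessly rebuild the original double from its two parts."""
    return struct.unpack(">d", key + low)[0]


def build_index(values):
    """Inverted index: bin key -> list of (record id, low-order bytes)."""
    bins = defaultdict(list)
    for record_id, value in enumerate(values):
        key, low = split_double(value)
        bins[key].append((record_id, low))
    return dict(bins)


def bin_bounds(key):
    """Smallest and largest value sharing this key (finite data assumed)."""
    a = join_double(key, b"\x00" * LOW_BYTES)
    b = join_double(key, b"\xff" * LOW_BYTES)
    return min(a, b), max(a, b)


def range_query(bins, lo, hi):
    """Return sorted record ids whose value v satisfies lo <= v <= hi."""
    hits = []
    for key, entries in bins.items():
        bmin, bmax = bin_bounds(key)
        if bmax < lo or bmin > hi:
            continue  # bin lies entirely outside the range
        if lo <= bmin and bmax <= hi:
            # Bin lies entirely inside the range: no per-value decoding needed.
            hits.extend(rid for rid, _ in entries)
        else:
            # Boundary bin: rebuild each value and test it.
            hits.extend(rid for rid, low in entries
                        if lo <= join_double(key, low) <= hi)
    return sorted(hits)
```

For the input `[1.5, -2.25, 3.0, 100.0, 1.5000001, 0.0]`, calling `range_query(build_index(values), 1.0, 3.5)` returns `[0, 2, 4]`. Only bins that straddle a range boundary require the low-order bytes to be decoded, which is where the fixed-precision bin boundaries pay off in this sketch.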

Author information

Authors and Affiliations

North Carolina State University, NC, 27695, USA
John Jenkins, Isha Arkatkar, Sriram Lakshminarasimhan, David A. Boyuka II, Eric R. Schendel, Neil Shah & Nagiza F. Samatova
Oak Ridge National Laboratory, TN, 37831, USA
John Jenkins, Isha Arkatkar, Sriram Lakshminarasimhan, David A. Boyuka II, Eric R. Schendel, Neil Shah, Scott Klasky & Nagiza F. Samatova
Princeton Plasma Physics Laboratory, Princeton, NJ, 08543, USA
Stephane Ethier & Choong-Seock Chang
Sandia National Laboratories, Livermore, CA, 94551, USA
Jackie Chen & Hemanth Kolla
Argonne National Laboratory, Argonne, IL, 60439, USA
Robert Ross

Editor information

Editors and Affiliations

IRIT, Paul Sabatier University, 118 route de Narbonne, 31062, Toulouse Cedex, France
Abdelkader Hameurlain
FAW, University of Linz, Altenbergerstraße 69, 4040, Linz, Austria
Josef Küng & Roland Wagner
Brigham Young University, 84602, Provo, UT, USA
Stephen W. Liddle
Software Competence Centre Hagenberg, 4232, Hagenberg, Austria
Klaus-Dieter Schewe
The University of Queensland, 4072, Brisbane, QLD, Australia
Xiaofang Zhou

About this chapter

Cite this chapter

Jenkins, J. et al. (2013). ALACRITY: Analytics-Driven Lossless Data Compression for Rapid In-Situ Indexing, Storing, and Querying. In: Hameurlain, A., Küng, J., Wagner, R., Liddle, S.W., Schewe, KD., Zhou, X. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems X. Lecture Notes in Computer Science, vol 8220. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41221-9_4

DOI: https://doi.org/10.1007/978-3-642-41221-9_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41220-2
Online ISBN: 978-3-642-41221-9
eBook Packages: Computer Science, Computer Science (R0)
