ALACRITY: Analytics-Driven Lossless Data Compression for Rapid In-Situ Indexing, Storing, and Querying

  • John Jenkins
  • Isha Arkatkar
  • Sriram Lakshminarasimhan
  • David A. Boyuka II
  • Eric R. Schendel
  • Neil Shah
  • Stephane Ethier
  • Choong-Seock Chang
  • Jackie Chen
  • Hemanth Kolla
  • Scott Klasky
  • Robert Ross
  • Nagiza F. Samatova
Chapter
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8220)

Abstract

High-performance computing architectures face nontrivial data processing challenges as the performance trajectories of computational and I/O components continue to diverge. For scientific data analysis in particular, methods that generate heavyweight access acceleration structures, such as indexes, are becoming less feasible as dataset sizes keep increasing. We present ALACRITY, which demonstrates that a fused data and index encoding of scientific floating-point data yields lightweight data structures amenable to the common types of queries used in scientific data analysis. We exploit the representation of floating-point values by extracting significant bytes and using the resulting unique values to bin the remaining data along fixed-precision boundaries. We then build an inverted index that maps each generated bin to the list of records it contains, which allows us to optimize the processing of queries with attribute range constraints. Overall, the storage footprint for both index and data is shown to be below that of numerous configurations of bitmap indexing, while query performance matches or exceeds theirs.
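
The encoding described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes 64-bit IEEE 754 doubles with finite values, and it assumes that the two high-order bytes (sign, exponent, and leading mantissa bits) serve as the bin key. The names `split`, `encode`, `bin_bounds`, `range_query`, and `decode` are hypothetical and chosen only for this example.

```python
import struct
from collections import defaultdict

LOW_BYTES = 6  # assumption: the 2 high-order bytes of each double form the bin key


def split(value):
    """Split a double into (high-order bin key, remaining low-order bytes)."""
    raw = struct.pack(">d", value)  # big-endian, so sign/exponent bytes come first
    return raw[:-LOW_BYTES], raw[-LOW_BYTES:]


def encode(values):
    """Build an inverted index (bin -> record IDs) and the per-bin low-order bytes."""
    index = defaultdict(list)
    low = defaultdict(list)
    for rid, v in enumerate(values):
        key, rest = split(v)
        index[key].append(rid)
        low[key].append(rest)  # stored in the same order as the record IDs
    return index, low


def bin_bounds(key):
    """Smallest and largest value a bin can hold (finite bins only)."""
    a = struct.unpack(">d", key + b"\x00" * LOW_BYTES)[0]
    b = struct.unpack(">d", key + b"\xff" * LOW_BYTES)[0]
    return min(a, b), max(a, b)  # min/max also covers negative bins


def range_query(index, low, lo, hi):
    """Return the sorted record IDs whose value v satisfies lo <= v <= hi."""
    hits = []
    for key, rids in index.items():
        bmin, bmax = bin_bounds(key)
        if bmax < lo or bmin > hi:
            continue  # bin lies entirely outside the range
        if lo <= bmin and bmax <= hi:
            hits.extend(rids)  # bin lies entirely inside: no low-order bytes needed
        else:
            # boundary bin: rebuild each value and test it individually
            for rid, rest in zip(rids, low[key]):
                v = struct.unpack(">d", key + rest)[0]
                if lo <= v <= hi:
                    hits.append(rid)
    return sorted(hits)


def decode(index, low, n):
    """Losslessly rebuild the original n values from the bins."""
    out = [None] * n
    for key, rids in index.items():
        for rid, rest in zip(rids, low[key]):
            out[rid] = struct.unpack(">d", key + rest)[0]
    return out


values = [1.5, -2.25, 3.75, 100.0, 1.5000001, 0.0]
index, low = encode(values)
inside = range_query(index, low, 1.0, 4.0)
roundtrip = decode(index, low, len(values))
```

In this sketch, bins that fall wholly inside the query range are answered from the inverted index alone, and only boundary bins require reading low-order bytes. The sketch leaves out any compression of the index or of the low-order bytes.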

Keywords

Compression Ratio; Query Processing; Range Query; Inverted Index; Lossless Compression

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • John Jenkins (1, 2)
  • Isha Arkatkar (1, 2)
  • Sriram Lakshminarasimhan (1, 2)
  • David A. Boyuka II (1, 2)
  • Eric R. Schendel (1, 2)
  • Neil Shah (1, 2)
  • Stephane Ethier (3)
  • Choong-Seock Chang (3)
  • Jackie Chen (4)
  • Hemanth Kolla (4)
  • Scott Klasky (2)
  • Robert Ross (5)
  • Nagiza F. Samatova (1, 2)

  1. North Carolina State University, USA
  2. Oak Ridge National Laboratory, USA
  3. Princeton Plasma Physics Laboratory, Princeton, USA
  4. Sandia National Laboratory, Livermore, USA
  5. Argonne National Laboratory, Argonne, USA
