Needles in the Haystack — Tackling Bit Flips in Lightweight Compressed Data

  • Conference paper
Data Management Technologies and Applications (DATA 2015)

Abstract

Modern database systems can very often store their entire data in main memory. Besides increased main memory capacities, a further driver for in-memory database systems has been the shift to a column-oriented storage format in combination with lightweight data compression techniques. Together, these two software concepts allow large datasets to be held and efficiently processed in main memory with a low memory footprint. Unfortunately, hardware is becoming more and more vulnerable to random faults, so that, for example, the probability of bit flips in main memory increases, and this rate is likely to escalate in future dynamic random-access memory (DRAM) modules. Since the data is highly compressed by the lightweight compression algorithms, multi-bit flips will have an extreme impact on the reliability of database systems. To tackle this reliability issue, we introduce our research on error-resilient lightweight data compression algorithms in this paper. Of course, our software approach lacks the efficiency of a hardware realization, but its flexibility and adaptability will play a more important role as error rates differ, e.g., due to hardware aging effects and aggressive processor voltage and frequency scaling. Arithmetic AN encoding is a family of codes that is an interesting candidate for effective software-based error detection. We present results of our research showing tradeoffs between the compressibility and resiliency characteristics of data. We show that particular choices of the AN-code parameter lead to a moderate loss of performance. We provide an evaluation of two proposed techniques, namely AN-encoded Null Suppression and AN-encoded Run Length Encoding.
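
To make the idea behind AN encoding concrete, the following is a minimal Python sketch and not the implementation evaluated in the paper. It assumes the textbook form of an AN code (a value n is stored as n * A, and a code word is valid only if it is divisible by A) and combines it with a plain run-length encoder. The constant A = 641, all function names, and the choice to encode the run lengths as well as the values are illustrative assumptions; AN-encoded Null Suppression is not shown.

```python
# Illustrative sketch of AN encoding combined with run-length encoding.
# A = 641 is a hypothetical parameter chosen for the example only.
A = 641


def an_encode(value: int, a: int = A) -> int:
    """Store a non-negative integer as a multiple of the constant a."""
    return value * a


def an_check(code: int, a: int = A) -> bool:
    """A code word is valid only if it is divisible by a."""
    return code % a == 0


def an_decode(code: int, a: int = A) -> int:
    """Recover the value, or raise if the code word was corrupted."""
    if not an_check(code, a):
        raise ValueError("corrupted code word detected")
    return code // a


def rle_encode(values):
    """Plain run-length encoding: a list of (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [(v, n) for v, n in runs]


def an_rle_encode(values, a: int = A):
    """Assumption: both the value and the run length are AN-encoded."""
    return [(an_encode(v, a), an_encode(n, a)) for v, n in rle_encode(values)]


def an_rle_decode(runs, a: int = A):
    """Check every code word while expanding the runs."""
    out = []
    for coded_value, coded_length in runs:
        out.extend([an_decode(coded_value, a)] * an_decode(coded_length, a))
    return out


data = [7, 7, 7, 3, 3, 9]
runs = an_rle_encode(data)
flipped = runs[0][0] ^ (1 << 5)  # simulate a single bit flip in memory
```

In this sketch, `an_rle_decode(runs)` returns the original column, while `an_check(flipped)` is false. A single bit flip changes a code word by plus or minus a power of two, and no power of two is divisible by an odd A greater than 1, so every single-bit flip is detected. The price is a wider stored code word, which is the kind of tradeoff between compressibility and resiliency that the abstract refers to.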

Notes

  1. Single-error correcting and double-error detecting.

  2. Please note that some codes for lossless data compression are also called arithmetic codes. These are not equivalent to the ones used throughout this paper.

  3. The probabilities for the table are taken from the experimental results of [11]; they can be found at https://www4.cs.fau.de/Research/CoRed/experiments.

References

  1. Abadi, D., Madden, S., Ferreira, M.: Integrating compression and execution in column-oriented database systems. In: Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data, pp. 671–682 (2006)

  2. Antoshenkov, G., Lomet, D.B., Murray, J.: Order preserving compression. In: Proceedings of the Twelfth International Conference on Data Engineering, ICDE 1996, pp. 655–663 (1996)

  3. Bassiouni, M.A.: Data compression in scientific and statistical databases. IEEE Trans. Softw. Eng. 11(10), 1047–1058 (1985)

  4. Bohannon, P., Rastogi, R., Seshadri, S., Silberschatz, A., Sudarshan, S.: Detection and recovery techniques for database corruption. IEEE Trans. Knowl. Data Eng. 15(5), 1120–1136 (2003)

  5. Boncz, P.A., Manegold, S., Kersten, M.L.: Database architecture optimized for the new bottleneck: memory access. In: Proceedings of the 25th International Conference on Very Large Data Bases, VLDB 1999, pp. 54–65 (1999)

  6. Chen, Z., Gehrke, J., Korn, F.: Query optimization in compressed database systems. SIGMOD Rec. 30(2), 271–282 (2001)

  7. Goldstein, J., Ramakrishnan, R., Shaft, U.: Compressing relations and indexes. In: Proceedings of 14th International Conference on Data Engineering, pp. 370–379, February 1998

  8. Graefe, G., Kuno, H., Seeger, B.: Self-diagnosing and self-healing indexes. In: DBTest, pp. 8:1–8:8 (2012)

  9. Graefe, G., Stonecipher, R.: Efficient verification of B-tree integrity. In: BTW, pp. 27–46 (2009)

  10. Hildebrandt, J., Habich, D., Damme, P., Lehner, W.: Modularization of lightweight data compression algorithms. Technical report, Department of Computer Science, Technische Universität Dresden, November 2015. https://wwwdb.inf.tu-dresden.de/misc/team/habich/dcc2016.pdf. Submitted to DCC 2016

  11. Hoffmann, M., Ulbrich, P., Dietrich, C., Schirmeier, H., Lohmann, D., Schröder-Preikschat, W.: A practitioner’s guide to software-based soft-error mitigation using AN-codes. In: HASE 2014, pp. 33–40 (2014)

  12. Huffman, D.A.: A method for the construction of minimum-redundancy codes. Proc. Inst. Radio Eng. 40(9), 1098–1101 (1952)

  13. Hwang, A.A., Stefanovici, I.A., Schroeder, B.: Cosmic rays don’t strike twice: understanding the nature of DRAM errors and the implications for system design. SIGARCH Comput. Archit. News 40(1), 111–122 (2012)

  14. Kissinger, T., Kiefer, T., Schlegel, B., Habich, D., Molka, D., Lehner, W.: ERIS: a NUMA-aware in-memory storage engine for analytical workload. In: International Workshop on Accelerating Data Management Systems Using Modern Processor and Storage Architectures - ADMS, pp. 74–85 (2014)

  15. Kolditz, T., Kissinger, T., Schlegel, B., Habich, D., Lehner, W.: Online bit flip detection for in-memory B-trees on unreliable hardware. In: DaMoN, pp. 5:1–5:9 (2014)

  16. Lehman, T.J., Carey, M.J.: Query processing in main memory database management systems. In: Proceedings of the 1986 ACM SIGMOD International Conference on Management of Data, SIGMOD 1986, pp. 239–250 (1986)

  17. Lehner, W.: Energy-efficient in-memory database computing. In: Design, Automation and Test in Europe, DATE 13, Grenoble, France, 18–22 March 2013, pp. 470–474 (2013)

  18. Lemire, D., Boytsov, L.: Decoding billions of integers per second through vectorization. CoRR abs/1209.2137 (2012)

  19. May, T.C., Woods, M.H.: Alpha-particle-induced soft errors in dynamic memories. IEEE Trans. Electron Devices 26(1), 2–9 (1979)

  20. Moon, T.K.: Error Correction Coding: Mathematical Methods and Algorithms. Wiley, Hoboken (2005)

  21. Reghbati, H.K.: An overview of data compression techniques. IEEE Comput. 14(4), 71–75 (1981)

  22. Roth, M.A., Van Horn, S.J.: Database compression. SIGMOD Rec. 22(3), 31–39 (1993)

  23. Schiffel, U.: Hardware Error Detection Using AN-Codes. Ph.D. thesis, Technische Universität Dresden (2011)

  24. Schlegel, B., Gemulla, R., Lehner, W.: Fast integer compression using SIMD instructions. In: DaMoN, pp. 34–40 (2010)

  25. Schroeder, B., Gibson, G.A.: A large-scale study of failures in high performance-computing systems. IEEE Trans. Dependable Secure Comput. 7(4), 337–350 (2010)

  26. Schroeder, B., Pinheiro, E., Weber, W.D.: DRAM errors in the wild: a large-scale field study. In: Proceedings of the Eleventh International Joint Conference on Measurement and Modeling of Computer Systems, SIGMETRICS 2009, pp. 193–204 (2009)

  27. Stepanov, A.A., Gangolli, A.R., Rose, D.E., Ernst, R.J., Oberoi, P.S.: SIMD-based decoding of posting lists. In: Proceedings of the 20th ACM International Conference on Information and Knowledge Management, CIKM 2011, pp. 317–326 (2011)

  28. Stonebraker, M.: Technical perspective - one size fits all: an idea whose time has come and gone. Commun. ACM 51(12), 76 (2008)

  29. Sullivan, M., Stonebraker, M.: Using write protected data structures to improve software fault tolerance in highly available database management systems. In: VLDB, pp. 171–180 (1991)

  30. Warren, H.S.: Hacker’s Delight. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA (2002)

  31. Witten, I.H., Neal, R.M., Cleary, J.G.: Arithmetic coding for data compression. Commun. ACM 30(6), 520–540 (1987)

  32. Ziv, J., Lempel, A.: A universal algorithm for sequential data compression. IEEE Trans. Inf. Theor. 23(3), 337–343 (1977)

  33. Zukowski, M., Heman, S., Nes, N., Boncz, P.: Super-scalar RAM-CPU cache compression. In: Proceedings of the 22nd International Conference on Data Engineering, ICDE 2006, p. 59, April 2006

Acknowledgements

This work is partly supported by the German Research Foundation (DFG) within the Cluster of Excellence “Center for Advanced Electronics Dresden” (cfAED) and by the DFG-grant LE-1416/26-1.

Author information

Corresponding author

Correspondence to Dirk Habich.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Kolditz, T., Habich, D., Kuvaiskii, D., Lehner, W., Fetzer, C. (2016). Needles in the Haystack — Tackling Bit Flips in Lightweight Compressed Data. In: Helfert, M., Holzinger, A., Belo, O., Francalanci, C. (eds) Data Management Technologies and Applications. DATA 2015. Communications in Computer and Information Science, vol 584. Springer, Cham. https://doi.org/10.1007/978-3-319-30162-4_9

  • DOI: https://doi.org/10.1007/978-3-319-30162-4_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-30161-7

  • Online ISBN: 978-3-319-30162-4

  • eBook Packages: Computer Science, Computer Science (R0)
