Efficient Compression of Text Attributes of Data Warehouse Dimensions

  • Conference paper
Data Warehousing and Knowledge Discovery (DaWaK 2005)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3589)

Abstract

This paper proposes compressing data in Relational Database Management Systems (RDBMS) using existing text compression algorithms. Although the proposed technique is general, we believe it is particularly advantageous for compressing medium-sized and large dimension tables in data warehouses. Dimensions usually have a large number of text attributes, and reducing their size has a large impact on the execution time of queries that join dimensions with fact tables. In general, the high complexity and long execution time of most data warehouse queries make the compression of dimension text attributes (and of any text attributes that may exist in the fact table, such as false facts) an effective way to reduce query response time. The proposed approach was evaluated using the well-known TPC-H benchmark, and the results show that speed improvements greater than 40% can be achieved for most of the queries.
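The abstract does not say which compression algorithm the paper applies, so the following is only a minimal sketch of the general idea, assuming a simple dictionary encoding as a stand-in technique. It replaces each value of a repetitive text attribute with a small integer code and compares storage sizes. The column values, the function names, and the 2-byte code width are all hypothetical choices made for illustration and are not taken from the paper.

```python
from typing import Dict, List, Tuple


def dictionary_encode(values: List[str]) -> Tuple[List[int], List[str]]:
    """Map each string to the index of its first occurrence in the column."""
    code_of: Dict[str, int] = {}
    dictionary: List[str] = []
    codes: List[int] = []
    for value in values:
        if value not in code_of:
            code_of[value] = len(dictionary)
            dictionary.append(value)
        codes.append(code_of[value])
    return codes, dictionary


def dictionary_decode(codes: List[int], dictionary: List[str]) -> List[str]:
    """Recover the original strings from their codes."""
    return [dictionary[code] for code in codes]


def raw_size(values: List[str]) -> int:
    """Bytes needed to store the strings as plain UTF-8 text."""
    return sum(len(value.encode("utf-8")) for value in values)


def encoded_size(codes: List[int], dictionary: List[str], code_bytes: int = 2) -> int:
    """Bytes for fixed-width codes plus one copy of each distinct string.

    The 2-byte default code width is an assumption for this sketch.
    """
    return len(codes) * code_bytes + raw_size(dictionary)


# Hypothetical text attribute of a dimension table (made-up sample values).
segment_column = [
    "AUTOMOBILE", "BUILDING", "AUTOMOBILE",
    "MACHINERY", "BUILDING", "AUTOMOBILE",
]

codes, dictionary = dictionary_encode(segment_column)
restored = dictionary_decode(codes, dictionary)
```

In this toy column the saving comes entirely from repeated values, which is the situation the abstract describes for dimension text attributes. A real system would also have to decide where decoding happens during query execution, a question this sketch does not address.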

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Vieira, J., Bernardino, J., Madeira, H. (2005). Efficient Compression of Text Attributes of Data Warehouse Dimensions. In: Tjoa, A.M., Trujillo, J. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2005. Lecture Notes in Computer Science, vol 3589. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11546849_35

  • DOI: https://doi.org/10.1007/11546849_35

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-28558-8

  • Online ISBN: 978-3-540-31732-6

  • eBook Packages: Computer Science, Computer Science (R0)
