Abstract
This paper proposes compressing data in Relational Database Management Systems (RDBMS) using existing text compression algorithms. Although the proposed technique is general, we believe it is particularly advantageous for compressing medium-sized and large dimension tables in data warehouses. Dimensions usually have a large number of text attributes, and reducing their size has a strong impact on the execution time of queries that join dimensions with fact tables. In general, the high complexity and long execution time of most data warehouse queries make the compression of dimension text attributes (and of any text attributes that may exist in the fact table, such as false facts) an effective way to reduce query response time. The proposed approach has been evaluated using the well-known TPC-H benchmark, and the results show that speed improvements greater than 40% can be achieved for most of the queries.
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Vieira, J., Bernardino, J., Madeira, H. (2005). Efficient Compression of Text Attributes of Data Warehouse Dimensions. In: Tjoa, A.M., Trujillo, J. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2005. Lecture Notes in Computer Science, vol 3589. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11546849_35
DOI: https://doi.org/10.1007/11546849_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28558-8
Online ISBN: 978-3-540-31732-6
eBook Packages: Computer Science (R0)