Adaptive Tuple Differential Coding
It is desirable to employ compression techniques in Relational OLAP (ROLAP) systems to reduce disk space requirements and increase disk I/O throughput. Tuple Differential Coding (TDC) techniques have been introduced to compress views at the tuple level by storing only the differences between consecutive tuples in sorted order. These techniques work well on highly regular data, in which the differences between tuples are fairly constant, but are less effective on real data containing skew or outliers. In this paper we introduce Adaptive Tuple Differential Coding (ATDC), which uses optimization techniques to analyze blocks of tuples and detect large tuple differences, isolating them to minimize their negative effect on the compression of neighbouring tuples. Our experiments show that ATDC improves the compression ratio by 15–30% over TDC on typical real datasets.
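To make the outlier problem concrete, here is a minimal sketch under stated assumptions; it is not the authors' method. It assumes TDC maps each tuple to an integer by treating the dimension values as a mixed-radix number whose bases are the dimension cardinalities, sorts those integers, and stores the first value plus consecutive differences. It also assumes each block stores its differences at one fixed bit width, so a single large difference inflates the cost of the whole block. The abstract does not describe ATDC's optimization, so `isolated_cost` is a hypothetical toy cost model. It searches for a cutoff width and stores wider differences separately as (position, value) pairs, ignoring block headers. All function names are illustrative.

```python
from typing import List, Sequence, Tuple


def tuple_to_ordinal(t: Sequence[int], cards: Sequence[int]) -> int:
    """Map a tuple to an integer as a mixed-radix number whose digit bases
    are the dimension cardinalities (an assumed TDC-style mapping)."""
    n = 0
    for v, c in zip(t, cards):
        if not 0 <= v < c:
            raise ValueError(f"value {v} out of range for cardinality {c}")
        n = n * c + v
    return n


def tdc_encode(tuples: Sequence[Sequence[int]],
               cards: Sequence[int]) -> Tuple[int, List[int]]:
    """Sort tuples by ordinal; keep the first ordinal plus the differences
    between consecutive ordinals."""
    ords = sorted(tuple_to_ordinal(t, cards) for t in tuples)
    return ords[0], [b - a for a, b in zip(ords, ords[1:])]


def plain_cost(diffs: Sequence[int]) -> int:
    """Bits used if every difference in the block takes the width of the
    largest one (hypothetical fixed-width block layout)."""
    width = max((d.bit_length() for d in diffs), default=0)
    return width * len(diffs)


def isolated_cost(diffs: Sequence[int]) -> Tuple[int, int]:
    """Toy adaptive model (hypothetical): try every cutoff width w; differences
    wider than w are stored apart as (position, value) pairs.
    Returns (bits, chosen w); headers are ignored."""
    if not diffs:
        return 0, 0
    pos_bits = max(1, (len(diffs) - 1).bit_length())  # bits to address a position
    widths = [d.bit_length() for d in diffs]
    best = (plain_cost(diffs), max(widths))  # baseline: no isolation
    for w in range(max(widths)):
        outliers = [x for x in widths if x > w]  # never empty, since w < max
        out_width = max(outliers)
        bits = w * (len(diffs) - len(outliers)) + len(outliers) * (pos_bits + out_width)
        best = min(best, (bits, w))
    return best


cards = (4, 8, 16)
rows = [(0, 0, 1), (0, 0, 3), (0, 0, 4), (0, 0, 6), (0, 0, 7), (3, 7, 15)]
first, diffs = tdc_encode(rows, cards)
print(first, diffs)          # 1 [2, 1, 2, 1, 504]
print(plain_cost(diffs))     # 45
print(isolated_cost(diffs))  # (20, 2)
```

In this toy block, one outlier difference (504, which needs 9 bits) forces all five differences to 9 bits under the plain layout, for 45 bits. Isolating it leaves the four regular differences at 2 bits each, for 20 bits. The gap illustrates the effect the abstract describes. It says nothing about the actual gains reported for ATDC.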
Keywords: Compression Ratio, Compression Algorithm, High Compression Ratio, Compression Time, Disk Block