Abstract
Multi-dimensional data sets are very common in areas such as data warehousing and statistical databases. In these environments, core tables often grow to enormous sizes. Compression methods are an attractive way to reduce storage requirements and thereby permit the retention of even larger data sets. In this paper we discuss an efficient compression framework that is specifically designed for very large relational database implementations. The primary methods exploit a Hilbert space-filling curve to dramatically reduce the storage footprint of the underlying tables. Tuples are individually compressed into page-sized units so that only blocks relevant to the user’s multi-dimensional query need be accessed. Compression is available not only for the relational tables themselves, but also for the associated R-tree indexes. Experimental results demonstrate compression rates of more than 90% for multi-dimensional data, and up to 98% for the indexes.
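The abstract does not spell out the encoding itself, so the following is only a minimal sketch of the general idea: order tuples along a Hilbert curve, then store the gaps between consecutive curve positions instead of the full coordinates. It assumes a two-dimensional grid whose side length is a power of two, and the function names (hilbert_index, delta_encode, delta_decode) are hypothetical, not taken from the paper.

```python
def hilbert_index(n, x, y):
    """Position of cell (x, y) on the Hilbert curve of an n-by-n grid.

    Assumes n is a power of two and 0 <= x, y < n.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/reflect so the next, smaller quadrant is in standard orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def delta_encode(points, n):
    """Sort points by Hilbert position; return the first key followed by the gaps."""
    keys = sorted(hilbert_index(n, x, y) for x, y in points)
    if not keys:
        return []
    return [keys[0]] + [b - a for a, b in zip(keys, keys[1:])]


def delta_decode(deltas):
    """Rebuild the sorted Hilbert positions from the first key and the gaps."""
    keys, total = [], 0
    for gap in deltas:
        total += gap
        keys.append(total)
    return keys


# Illustrative data only: three cells of a 4-by-4 grid.
deltas = delta_encode([(0, 0), (1, 1), (0, 2)], 4)
```

Because the Hilbert curve keeps nearby cells close together in the ordering, the gaps tend to be small for clustered data and can be stored in fewer bits than the original coordinates. How the gaps are bit-packed into page-sized blocks is left out of this sketch.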
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Eavis, T., Cueva, D. (2007). A Hilbert Space Compression Architecture for Data Warehouse Environments. In: Song, I.Y., Eder, J., Nguyen, T.M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2007. Lecture Notes in Computer Science, vol 4654. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74553-2_1
DOI: https://doi.org/10.1007/978-3-540-74553-2_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74552-5
Online ISBN: 978-3-540-74553-2
eBook Packages: Computer Science (R0)