A Hilbert Space Compression Architecture for Data Warehouse Environments

Eavis, Todd; Cueva, David

doi:10.1007/978-3-540-74553-2_1


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4654)

Included in the following conference series:

International Conference on Data Warehousing and Knowledge Discovery


Abstract

Multi-dimensional data sets are very common in areas such as data warehousing and statistical databases. In these environments, core tables often grow to enormous sizes. Compression methods are an attractive way to reduce storage requirements and thereby permit the retention of even larger data sets. In this paper we discuss an efficient compression framework that is specifically designed for very large relational database implementations. The primary methods exploit a Hilbert space-filling curve to dramatically reduce the storage footprint of the underlying tables. Tuples are individually compressed into page-sized units so that only the blocks relevant to the user's multi-dimensional query need to be accessed. Compression is available not only for the relational tables themselves, but also for the associated R-tree indexes. Experimental results demonstrate compression rates of more than 90% for multi-dimensional data, and up to 98% for the indexes.
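The abstract does not spell out the encoding itself, so the following is only a minimal sketch of the general idea it describes: order points along a Hilbert space-filling curve, then store the gaps between successive curve positions rather than the full coordinates. It assumes a two-dimensional grid whose side `n` is a power of two, and the names `hilbert_index`, `delta_encode`, and `delta_decode` are hypothetical helpers introduced for illustration, not the authors' implementation.

```python
def hilbert_index(n, x, y):
    """Map grid cell (x, y) to its position along a 2-D Hilbert curve.

    Assumes n is a power of two and 0 <= x, y < n (an illustrative
    restriction; the paper targets general multi-dimensional data).
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve is oriented correctly.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def delta_encode(points, n):
    """Sort points by Hilbert index and keep only the gaps between them."""
    keys = sorted(hilbert_index(n, x, y) for x, y in points)
    if not keys:
        return []
    # First entry is the starting index; the rest are successive differences.
    return [keys[0]] + [b - a for a, b in zip(keys, keys[1:])]


def delta_decode(deltas):
    """Recover the sorted Hilbert indexes from the stored gaps."""
    keys, total = [], 0
    for gap in deltas:
        total += gap
        keys.append(total)
    return keys


if __name__ == "__main__":
    pts = [(0, 0), (1, 1), (1, 0), (0, 1)]
    print(delta_encode(pts, 2))
```

Because nearby cells tend to receive nearby curve positions, the gaps in a clustered data set are typically much smaller than the raw indexes and can be stored in fewer bits. How the gaps are packed into page-sized blocks is left out of this sketch.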



Author information

Authors and Affiliations

Concordia University, Montreal, Canada
Todd Eavis & David Cueva


Editor information

Il Yeal Song, Johann Eder, Tho Manh Nguyen


Cite this paper

Eavis, T., Cueva, D. (2007). A Hilbert Space Compression Architecture for Data Warehouse Environments. In: Song, I.Y., Eder, J., Nguyen, T.M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2007. Lecture Notes in Computer Science, vol 4654. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74553-2_1


DOI: https://doi.org/10.1007/978-3-540-74553-2_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74552-5
Online ISBN: 978-3-540-74553-2
eBook Packages: Computer Science, Computer Science (R0)
