A Parallel Compressed Data Cube Based on Hadoop
This paper proposes a parallel compressed data cube algorithm for online analytical processing (OLAP), built on the Hadoop architecture. The algorithm divides a single data cube into several independent compressed sub-cubes and then uses Hadoop to build and query the entire data cube in parallel. Experiments show that the algorithm inherits the parallelism and high scalability of Hadoop, and that the self-indexing structure of the compressed data cube speeds up query operations on the cube. The approach therefore has both research value and practical significance.
Keywords: Data cube, Hadoop, Parallel
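The partition-then-build idea in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes the cube is split on the value of the first dimension, uses SUM as the only aggregate, stores each sub-cube as a plain dictionary with no compression or self-indexing, and simulates the map and reduce phases in ordinary Python rather than on Hadoop. All names (`partition_by_first_dim`, `build_sub_cube`, `query`, `SAMPLE_ROWS`) are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

ALL = "*"  # wildcard for a dimension that has been aggregated away

# Hypothetical fact rows: ((region, product, year), sales)
SAMPLE_ROWS = [
    (("north", "tv", "2023"), 10),
    (("north", "radio", "2023"), 5),
    (("south", "tv", "2023"), 7),
    (("south", "tv", "2024"), 3),
]


def partition_by_first_dim(rows):
    """Simulated map phase: route each row by its first dimension value (an assumption)."""
    parts = defaultdict(list)
    for dims, measure in rows:
        parts[dims[0]].append((dims, measure))
    return parts


def build_sub_cube(rows):
    """Simulated reduce phase: SUM over every group-by that keeps dimension 0 fixed."""
    cube = defaultdict(int)
    for dims, measure in rows:
        rest = range(1, len(dims))
        for k in range(len(dims)):  # choose k of the remaining dimensions to keep
            for kept in combinations(rest, k):
                key = tuple(
                    d if i == 0 or i in kept else ALL for i, d in enumerate(dims)
                )
                cube[key] += measure
    return dict(cube)


def build_all(rows):
    """Each sub-cube depends only on its own partition, so the builds could run in parallel."""
    return {v: build_sub_cube(part) for v, part in partition_by_first_dim(rows).items()}


def query(sub_cubes, key):
    """Answer a point query; key uses ALL for aggregated dimensions."""
    if key[0] != ALL:
        # Only one sub-cube can contain the answer.
        return sub_cubes.get(key[0], {}).get(key, 0)
    # First dimension aggregated: combine partial answers from every sub-cube.
    return sum(c.get((first,) + key[1:], 0) for first, c in sub_cubes.items())
```

For example, `query(build_all(SAMPLE_ROWS), ("north", ALL, ALL))` touches only the `north` sub-cube, while a key starting with `ALL` sums the partial results of all sub-cubes. Splitting on a single dimension keeps the sub-cubes independent, at the cost of a fan-out for queries that aggregate that dimension.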
This work was supported by the National Natural Science Foundation of China (No. 61702345).