Parallel Data Cube Construction: Algorithms, Theoretical Analysis, and Experimental Evaluation
Data cube construction is a commonly used operation in data warehouses. Because of the volume of data stored and analyzed in a data warehouse, and the amount of computation involved in data cube construction, it is natural to consider parallel machines for this operation. This paper presents two new algorithms for parallel data cube construction, along with their theoretical analysis and experimental evaluation. Our work is based on a new data structure, called the aggregation tree, which results in minimally bounded memory requirements. An aggregation tree is parameterized by the ordering of dimensions. We prove that the same ordering of the dimensions minimizes both the computational and the communication requirements for both algorithms. We also describe a method for partitioning the initial array, which likewise minimizes the communication volume for both algorithms. Experimental results validate the theoretical analysis.
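As an illustration of the operation itself, the following is a minimal sequential sketch of what data cube construction computes: every group-by (every subset of dimensions) of a multidimensional array. It is not the aggregation tree or either parallel algorithm from the paper; the function name `data_cube`, the sparse dictionary representation, and the sample data are assumptions made for this example only.

```python
from itertools import combinations


def data_cube(cells, num_dims):
    """Compute every group-by of a sparse n-dimensional array (naive version).

    `cells` maps a full coordinate tuple to a numeric measure. The result maps
    each tuple of retained dimension indices to a dict from the projected
    coordinates to the summed measure.
    """
    cube = {}
    # Enumerate all 2^n subsets of dimensions, from the full array down to
    # the grand total (the empty subset).
    for k in range(num_dims, -1, -1):
        for dims in combinations(range(num_dims), k):
            agg = {}
            for coord, value in cells.items():
                key = tuple(coord[d] for d in dims)
                agg[key] = agg.get(key, 0) + value
            cube[dims] = agg
    return cube


# Hypothetical 2 x 2 x 2 array with four non-zero cells.
cells = {(0, 0, 0): 1, (0, 1, 0): 2, (1, 0, 1): 3, (1, 1, 1): 4}
cube = data_cube(cells, 3)
print(cube[(0,)])  # totals grouped by dimension 0 only
print(cube[()])    # grand total
```

This naive version rescans the input for each of the 2^n group-bys. Practical algorithms instead derive each group-by from an already computed parent, and the order in which dimensions are aggregated then affects the computation, memory, and communication costs, which is the concern the abstract describes.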