Parallel Data Cube Construction: Algorithms, Theoretical Analysis, and Experimental Evaluation

  • Ruoming Jin
  • Ge Yang
  • Gagan Agrawal
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2913)


Data cube construction is a commonly used operation in data warehouses. Because of the volume of data stored and analyzed in a data warehouse and the amount of computation involved in data cube construction, it is natural to consider parallel machines for this operation. This paper presents two new algorithms for parallel data cube construction, along with their theoretical analysis and experimental evaluation. Our work is based on a new data structure, called the aggregation tree, which results in minimally bounded memory requirements. An aggregation tree is parameterized by the ordering of dimensions. We prove that the same ordering of the dimensions minimizes both the computational and communication requirements for both algorithms. We also describe a method for partitioning the initial array that again minimizes the communication volume for both algorithms. Experimental results further validate the theoretical results.
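To make the operation concrete, the following is a minimal sequential sketch of what data cube construction computes: every group-by (every subset of the dimensions) of an input array, aggregated here by summation. It is not the aggregation tree or either of the parallel algorithms from the paper. The function name `data_cube`, the sparse dictionary representation, and the rule of deriving each group-by from its smallest already-computed parent are assumptions made for illustration only.

```python
from itertools import combinations
from collections import defaultdict


def data_cube(facts, num_dims):
    """Compute every group-by of `facts` by summation (illustrative sketch).

    facts: dict mapping a full coordinate tuple (one entry per dimension)
           to a numeric measure.
    Returns a dict mapping each tuple of retained dimensions, e.g. (0, 2),
    to a dict from projected coordinates to summed measures.
    """
    all_dims = tuple(range(num_dims))
    cube = {all_dims: dict(facts)}
    # Walk the lattice of group-bys from more dimensions to fewer.
    for size in range(num_dims - 1, -1, -1):
        for kept in combinations(all_dims, size):
            # Candidate parents have exactly one extra dimension.
            parents = [p for p in cube
                       if len(p) == size + 1 and set(kept) <= set(p)]
            # Assumption for this sketch: aggregate from the parent with
            # the fewest cells, since that scans the least data.
            parent = min(parents, key=lambda p: len(cube[p]))
            positions = [parent.index(d) for d in kept]
            agg = defaultdict(int)
            for coords, value in cube[parent].items():
                agg[tuple(coords[i] for i in positions)] += value
            cube[kept] = dict(agg)
    return cube


if __name__ == "__main__":
    # Hypothetical three-dimensional input with four non-empty cells.
    facts = {(0, 0, 0): 1, (0, 1, 0): 2, (1, 0, 1): 3, (1, 1, 1): 4}
    cube = data_cube(facts, 3)
    print(len(cube))     # number of group-bys computed
    print(cube[(0,)])    # totals per value of dimension 0
    print(cube[()])      # grand total
```

For n dimensions the cube contains 2^n group-bys, which is why the choice of parent for each aggregation, and in the parallel setting the ordering of dimensions and the partitioning of the initial array, affects the total amount of work.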





Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Ruoming Jin (1)
  • Ge Yang (1)
  • Gagan Agrawal (1)
  1. Department of Computer and Information Sciences, Ohio State University, Columbus, USA
