Optimization of Memory Usage Requirement for a Class of Loops Implementing Multi-dimensional Integrals
Multi-dimensional integrals of products of several arrays arise in certain scientific computations. In the context of these integral calculations, this paper addresses a memory usage minimization problem. Based on a framework that models the relationship between loop fusion and memory usage, we propose an algorithm for finding a loop fusion configuration that minimizes memory usage. A practical example shows the performance improvement obtained by our algorithm on an electronic structure computation.
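As a rough illustration of why loop fusion affects memory usage, the sketch below evaluates a small sum of products, S[i] = sum over j of A[i][j] * B[j], in two ways. It is not the algorithm proposed in the paper; the function names `unfused` and `fused`, the array names, and the simple count of temporary elements are assumptions made for this example only. In the unfused version the producer and consumer loops are separate, so the intermediate array T must be stored in full. In the fused version the loops share the i and j indices, so T can be contracted to a single scalar.

```python
def unfused(A, B):
    """Separate producer and consumer loops; T is stored in full."""
    n, m = len(A), len(B)
    # Producer loop nest: materialize the n-by-m intermediate T.
    T = [[A[i][j] * B[j] for j in range(m)] for i in range(n)]
    # Consumer loop nest: reduce T over j.
    S = [sum(T[i][j] for j in range(m)) for i in range(n)]
    return S, n * m  # result and number of temporary elements held


def fused(A, B):
    """Producer and consumer fused over i and j; T shrinks to a scalar."""
    n, m = len(A), len(B)
    S = [0.0] * n
    for i in range(n):
        t = 0.0  # the whole intermediate array contracted to one scalar
        for j in range(m):
            t += A[i][j] * B[j]
        S[i] = t
    return S, 1  # result and number of temporary elements held


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [10, 20]
    print(unfused(A, B))
    print(fused(A, B))
```

Both versions compute the same values, but the temporary storage drops from n*m elements to one. With several intermediate arrays and many candidate loops, fusion choices can conflict with one another, which is presumably why a search over fusion configurations is needed.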