Abstract
Processing and analyzing large volumes of data play an increasingly important role in many domains of scientific research. We are developing a compiler that processes data-intensive applications written in a dialect of Java and compiles them for efficient execution on clusters of workstations or distributed-memory machines.
In this paper, we focus on data-intensive applications with two important properties: 1) data elements have spatial coordinates associated with them, and the distribution of the data is not regular with respect to these coordinates, and 2) the application processes only a subset of the available data, selected on the basis of spatial coordinates. Such applications arise in many domains, such as satellite data processing and medical imaging. We present a general compilation and execution strategy for this class of applications that achieves high locality in disk accesses. We then present a technique for hoisting conditionals that further improves the execution efficiency of the compiled code.
Our preliminary experimental results show that the performance of our proposed execution strategy is nearly two orders of magnitude better than that of a naive strategy. Further, an improvement in performance of up to 30% is observed when the technique for hoisting conditionals is applied.
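The abstract does not spell out how conditionals are hoisted, so the following is only a minimal sketch of the general idea, written in plain Java rather than the paper's dialect. It assumes a hypothetical layout in which elements are stored in chunks, each with a precomputed bounding box; the names `Box`, `Chunk`, `sumNaive`, `sumHoisted`, and `sampleChunks` are illustrative and do not come from the paper. The naive version tests every element against the query range, while the hoisted version moves that test to the chunk level wherever the bounding box alone decides the outcome.

```java
import java.util.Arrays;
import java.util.List;

final class HoistingSketch {
    /** Axis-aligned rectangle; used for the query range and for chunk bounds. */
    static final class Box {
        final double xLo, yLo, xHi, yHi;
        Box(double xLo, double yLo, double xHi, double yHi) {
            this.xLo = xLo; this.yLo = yLo; this.xHi = xHi; this.yHi = yHi;
        }
        boolean contains(double x, double y) {
            return x >= xLo && x <= xHi && y >= yLo && y <= yHi;
        }
        boolean containsBox(Box o) {
            return o.xLo >= xLo && o.xHi <= xHi && o.yLo >= yLo && o.yHi <= yHi;
        }
        boolean intersects(Box o) {
            return o.xLo <= xHi && o.xHi >= xLo && o.yLo <= yHi && o.yHi >= yLo;
        }
    }

    /** Hypothetical storage unit: elements kept together with their bounding box. */
    static final class Chunk {
        final Box bounds;
        final double[][] points; // each row is {x, y, value}
        Chunk(Box bounds, double[][] points) {
            this.bounds = bounds;
            this.points = points;
        }
    }

    /** Naive form: the spatial conditional is evaluated for every element. */
    static double sumNaive(List<Chunk> chunks, Box query) {
        double sum = 0;
        for (Chunk c : chunks)
            for (double[] p : c.points)
                if (query.contains(p[0], p[1])) sum += p[2];
        return sum;
    }

    /** Hoisted form: the conditional is decided per chunk when the bounds allow it. */
    static double sumHoisted(List<Chunk> chunks, Box query) {
        double sum = 0;
        for (Chunk c : chunks) {
            if (!query.intersects(c.bounds)) continue;     // no element can match
            if (query.containsBox(c.bounds)) {
                for (double[] p : c.points) sum += p[2];   // every element matches
            } else {
                for (double[] p : c.points)                // boundary chunk: test each element
                    if (query.contains(p[0], p[1])) sum += p[2];
            }
        }
        return sum;
    }

    /** Small made-up data set: one chunk inside, one straddling, one outside the query. */
    static List<Chunk> sampleChunks() {
        return Arrays.asList(
            new Chunk(new Box(0, 0, 1, 1), new double[][] {{0.5, 0.5, 1}, {1, 1, 2}}),
            new Chunk(new Box(1.5, 1.5, 3, 3), new double[][] {{1.5, 1.5, 4}, {3, 3, 8}}),
            new Chunk(new Box(5, 5, 6, 6), new double[][] {{5, 5, 16}}));
    }

    public static void main(String[] args) {
        Box query = new Box(0, 0, 2, 2);
        List<Chunk> chunks = sampleChunks();
        System.out.println(sumNaive(chunks, query));
        System.out.println(sumHoisted(chunks, query));
    }
}
```

Both methods return the same sum; the hoisted form is only valid if each chunk's bounding box really encloses all of its elements. Skipping chunks that cannot intersect the query is also one plausible way to avoid reading irrelevant data from disk, though the paper's actual execution strategy may differ.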
This work was supported by NSF grant ACR-9982087, NSF CAREER award ACI-9733520, and NSF grant CCR-9808522.
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ferreira, R., Agrawal, G., Jin, R., Saltz, J. (2001). Compiling Data Intensive Applications with Spatial Coordinates. In: Midkiff, S.P., et al. Languages and Compilers for Parallel Computing. LCPC 2000. Lecture Notes in Computer Science, vol 2017. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45574-4_22
DOI: https://doi.org/10.1007/3-540-45574-4_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42862-6
Online ISBN: 978-3-540-45574-5
eBook Packages: Springer Book Archive