Abstract
This paper proposes a novel optimization framework for the Data-Flow Graph Language (DFGL), a dependence-based notation for the macro-dataflow model that can be used as an embedded domain-specific language. Our optimization framework follows a “dependence-first” approach in capturing the semantics of DFGL programs in polyhedral representations, as opposed to the standard polyhedral approach of deriving dependences from access functions and schedules. As a first step, our proposed framework performs two important legality checks on an input DFGL program: checking for potential violations of the single-assignment rule, and checking for potential deadlocks. After these legality checks are performed, the DFGL dependence information is used in lieu of standard polyhedral dependences to enable polyhedral transformations and code generation, which include automatic loop transformations, tiling, and code generation of parallel loops with coarse-grain (fork-join) and fine-grain (doacross) synchronizations. Our performance experiments with nine benchmarks on Intel Xeon and IBM Power7 multicore processors show that the DFGL versions optimized by our proposed framework can deliver up to 6.9× performance improvement relative to standard OpenMP versions of these benchmarks. To the best of our knowledge, this is the first system to encode explicit macro-dataflow parallelism in polyhedral representations so as to provide programmers with an easy-to-use DSL notation with legality checks, while taking full advantage of the optimization functionality in state-of-the-art polyhedral frameworks.
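The two legality checks named in the abstract can be illustrated on a toy example. The sketch below is not the paper's polyhedral analysis: it enumerates a handful of step instances explicitly and checks them by brute force. The instance-table format, the function names, and the modeling of the environment as an ordinary "env" instance are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical format: instance name -> (set of items read, set of items written).
# An item is a (collection, key) pair such as ("A", 1).

def check_single_assignment(instances):
    """Return {item: [writers]} for every item written by more than one instance."""
    writers = defaultdict(list)
    for name, (_, writes) in instances.items():
        for item in writes:
            writers[item].append(name)
    return {item: names for item, names in writers.items() if len(names) > 1}

def check_deadlock(instances):
    """Return (items read but never written, instances that can never run).

    Assumes the single-assignment check has already passed, so each item
    has at most one producer.
    """
    producer = {}
    for name, (_, writes) in instances.items():
        for item in writes:
            producer[item] = name

    deps = {name: set() for name in instances}
    missing = set()
    for name, (reads, _) in instances.items():
        for item in reads:
            if item in producer:
                deps[name].add(producer[item])
            else:
                missing.add(item)

    # Repeatedly "run" every instance whose producers have all run.
    # Whatever is left over sits on a dependence cycle or waits on one.
    done = set()
    progress = True
    while progress:
        progress = False
        for name, needed in deps.items():
            if name not in done and needed <= done:
                done.add(name)
                progress = True
    return missing, set(instances) - done

# A legal chain: env writes A[0]; step S_i reads A[i-1] and writes A[i].
chain = {"env": (set(), {("A", 0)})}
for i in range(1, 4):
    chain[f"S{i}"] = ({("A", i - 1)}, {("A", i)})

# Illegal: a second instance also writes A[1].
double_write = dict(chain)
double_write["S1b"] = ({("A", 0)}, {("A", 1)})

# Illegal: P and Q each wait for an item only the other produces.
cycle = {
    "P": ({("B", 0)}, {("C", 0)}),
    "Q": ({("C", 0)}, {("B", 0)}),
}

print(check_single_assignment(chain), check_deadlock(chain))
print(check_single_assignment(double_write))
print(check_deadlock(cycle))
```

Enumerating instances only works for small, fixed ranges; a dependence-based polyhedral representation, as the abstract describes, reasons about such instance sets symbolically instead.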
Notes
1. Step I/O may comprise a list of items, and item keys may include range expressions.
2. A typical case is an env step that creates a set of step instances where the tag is a range.
3. In future work, we may consider not treating this case as an error condition, by assuming that each data item whose write is not performed in the DFGL region has an initializing write that is instead performed by the environment.
4. MKL is the best-tuned library for Intel platforms. We compare against Sequential and Parallel MKL.
5. On POWER7 we use ATLAS, the sequential library, as MKL cannot run on POWER7 and a parallel library was not available.
Acknowledgments
This work was supported in part by the National Science Foundation through awards 0926127 and 1321147.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Sbîrlea, A., Shirako, J., Pouchet, LN., Sarkar, V. (2016). Polyhedral Optimizations for a Data-Flow Graph Language. In: Shen, X., Mueller, F., Tuck, J. (eds) Languages and Compilers for Parallel Computing. LCPC 2015. Lecture Notes in Computer Science, vol 9519. Springer, Cham. https://doi.org/10.1007/978-3-319-29778-1_4
DOI: https://doi.org/10.1007/978-3-319-29778-1_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-29777-4
Online ISBN: 978-3-319-29778-1
eBook Packages: Computer Science, Computer Science (R0)