Journal of Computer Science and Technology, Volume 34, Issue 2, pp 456–475

Revisiting the Parallel Strategy for DOACROSS Loops

  • Song Liu
  • Yuan-Zhen Cui
  • Nian-Jun Zou
  • Wen-Hao Zhu
  • Dong Zhang
  • Wei-Guo Wu
Regular Paper


DOACROSS loops are a significant part of many important scientific and engineering applications, and their pipeline/wave-front parallelism is generally exploited through loop transformations. However, most previous work assigns iterations to parallel threads statically, which wastes computing resources on thread synchronization. This paper proposes a new parallel strategy for DOACROSS loops that uses dynamic task assignment with reduced dependences to achieve wave-front parallelism through loop tiling. The proposed strategy uses a master-slave parallel mode and several customized data structures to realize dynamic and flexible parallelization, which effectively keeps threads from waiting on communication. An efficient tile size selection (TSS) approach is also proposed to preserve data reuse in cache for tiled codes. The experimental results show that the proposed parallel strategy obtains good and stable speedups on six typical benchmarks with different problem sizes and different numbers of threads on an Intel® Xeon® 32-core server. It outperforms two static strategies, a barrier-based strategy and a post/wait-based strategy, by 32% and 20% on average, respectively. It also performs better than a mutex-based dynamic strategy. In addition, the proposed TSS approach is shown to achieve near-optimal performance and to be comparable with a state-of-the-art TSS approach.
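The idea of dynamic task assignment over tiles can be illustrated with a small sketch. It is not the authors' implementation: the kernel, the function names (`sequential`, `tiled_dynamic`), the per-tile dependence counters, and the two queues are all assumptions chosen for illustration, and the paper's customized structures and reduced-dependence scheme are not reproduced here. A master thread tracks how many predecessor tiles each tile is still waiting for and hands a tile to whichever worker is free as soon as that count reaches zero, so tiles on the same wave-front can run concurrently without a barrier.

```python
import queue
import threading


def sequential(n):
    """Reference DOACROSS kernel: each cell depends on its north and west neighbours."""
    a = [[1] * n for _ in range(n)]
    for i in range(1, n):
        for j in range(1, n):
            a[i][j] = a[i - 1][j] + a[i][j - 1]
    return a


def tiled_dynamic(n, tile, workers=4):
    """Same kernel, tiled; a master thread dispatches ready tiles to worker threads."""
    a = [[1] * n for _ in range(n)]
    nt = (n - 1 + tile - 1) // tile  # tiles per dimension over indices 1..n-1
    # pending[ti][tj]: number of predecessor tiles (north, west) not yet finished.
    pending = [[(ti > 0) + (tj > 0) for tj in range(nt)] for ti in range(nt)]
    ready = queue.Queue()  # master -> workers: tiles whose dependences are satisfied
    done = queue.Queue()   # workers -> master: finished tiles

    def run_tile(ti, tj):
        i0, j0 = 1 + ti * tile, 1 + tj * tile
        for i in range(i0, min(i0 + tile, n)):
            for j in range(j0, min(j0 + tile, n)):
                a[i][j] = a[i - 1][j] + a[i][j - 1]

    def worker():
        while True:
            task = ready.get()
            if task is None:  # shutdown signal from the master
                return
            run_tile(*task)
            done.put(task)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()

    # Master loop: only this thread touches the dependence counters.
    if nt > 0:
        ready.put((0, 0))
    remaining = nt * nt
    while remaining:
        ti, tj = done.get()
        remaining -= 1
        for si, sj in ((ti + 1, tj), (ti, tj + 1)):  # south and east successors
            if si < nt and sj < nt:
                pending[si][sj] -= 1
                if pending[si][sj] == 0:
                    ready.put((si, sj))

    for _ in threads:
        ready.put(None)
    for t in threads:
        t.join()
    return a
```

Because of CPython's global interpreter lock, this sketch shows only the scheduling logic and will not produce a real speedup; a performance-oriented version would use native threads in C or C++.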


Keywords: DOACROSS loop; wave-front parallelism; tile size selection; dynamic task assignment; synchronization optimization



Supplementary material

11390_2019_1919_MOESM1_ESM.pdf (134 kb)
ESM 1 (PDF 134 kb)



Copyright information

© Springer Science+Business Media, LLC & Science Press, China 2019

Authors and Affiliations

  • Song Liu (1)
  • Yuan-Zhen Cui (1)
  • Nian-Jun Zou (1)
  • Wen-Hao Zhu (2)
  • Dong Zhang (3, 4)
  • Wei-Guo Wu (1, corresponding author)

  1. School of Electronic Information and Engineering, Xi’an Jiaotong University, Xi’an, China
  2. School of Computer Engineering and Science, Shanghai University, Shanghai, China
  3. Xi’an Research Institute of Surveying and Mapping, Xi’an, China
  4. State Key Laboratory of Geo-Information Engineering, Xi’an, China
