A Practical and Aggressive Loop Fission Technique

  • Bo Zhao
  • Yingying Li
  • Lin Han
  • Jie Zhao
  • Wei Gao
  • Rongcai Zhao
  • Ramin Yahyapour
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11338)


Loop fission is an effective loop optimization for exploiting fine-grained parallelism, and it is widely used in existing parallelizing compilers. To exploit this optimization fully, we propose and implement a practical and aggressive loop fission technique. First, we present an aggressive dependence graph pruning method that eliminates pseudo dependences caused by compiler conservativeness. Second, we introduce a loop fission algorithm based on topological sorting that distributes loops correctly. Finally, to improve the performance of generated programs that have loop fission potential, we propose an advanced loop fission strategy. We evaluate these techniques and algorithms in the experimental section.
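The paper's own pruning method and fission algorithm are not reproduced here. As a hedged illustration only, the following Python sketch follows the classic loop distribution scheme of Allen and Kennedy, which also rests on a topological sort: statements on a dependence cycle (a strongly connected component of the dependence graph) must stay in one loop, and the resulting loops are emitted in a topological order of the component graph. The statement indices, the `deps` edge list, and the `spurious` parameter (a stand-in for edges that a sharper dependence test has shown to be pseudo dependences) are hypothetical inputs chosen for illustration.

```python
import heapq


def strongly_connected_components(n, succ):
    """Tarjan's algorithm over statements 0..n-1; returns a list of SCCs."""
    index, low = [None] * n, [0] * n
    stack, on_stack, sccs = [], [False] * n, []
    counter = 0

    def visit(v):
        nonlocal counter
        index[v] = low[v] = counter
        counter += 1
        stack.append(v)
        on_stack[v] = True
        for w in succ[v]:
            if index[w] is None:
                visit(w)
                low[v] = min(low[v], low[w])
            elif on_stack[w]:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of an SCC
            comp = []
            while True:
                w = stack.pop()
                on_stack[w] = False
                comp.append(w)
                if w == v:
                    break
            sccs.append(sorted(comp))

    for v in range(n):
        if index[v] is None:
            visit(v)
    return sccs


def distribute(n, deps, spurious=()):
    """Split a loop body of n statements into an ordered list of new loops.

    deps: (src, dst) pairs meaning statement dst depends on statement src
    (loop-independent or loop-carried).
    spurious: hypothetical edges proven false by a more precise test;
    they are pruned before partitioning, which can break cycles.
    """
    pruned = set(spurious)
    succ = [[] for _ in range(n)]
    for s, d in deps:
        if (s, d) not in pruned:
            succ[s].append(d)

    sccs = strongly_connected_components(n, succ)
    comp_of = {v: i for i, comp in enumerate(sccs) for v in comp}

    # Condensation DAG: one node per SCC, with in-degree counts.
    dag = [set() for _ in sccs]
    indeg = [0] * len(sccs)
    for s in range(n):
        for d in succ[s]:
            a, b = comp_of[s], comp_of[d]
            if a != b and b not in dag[a]:
                dag[a].add(b)
                indeg[b] += 1

    # Kahn's topological sort; ties go to the SCC whose first statement
    # appears earliest, so independent loops keep their source order.
    ready = [(sccs[i][0], i) for i in range(len(sccs)) if indeg[i] == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        _, i = heapq.heappop(ready)
        order.append(sccs[i])
        for j in dag[i]:
            indeg[j] -= 1
            if indeg[j] == 0:
                heapq.heappush(ready, (sccs[j][0], j))
    return order
```

For example, `distribute(4, [(0, 1), (1, 2), (2, 1), (3, 3)])` returns `[[0], [1, 2], [3]]`: S1 and S2 form a dependence cycle and stay in one loop, while S0 and S3 each get their own loop. If the edge `(2, 1)` is a pseudo dependence and is passed in `spurious`, the cycle disappears and all four statements are distributed into separate loops. A backward loop-carried edge such as `distribute(2, [(1, 0)])` yields `[[1], [0]]`, showing that fission may reorder statements.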


Keywords: Loop fission · Automatic vectorization · Compiling optimization

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Bo Zhao (1, 2)
  • Yingying Li (2)
  • Lin Han (2)
  • Jie Zhao (2, 3)
  • Wei Gao (2)
  • Rongcai Zhao (2)
  • Ramin Yahyapour (1)

  1. Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen, Göttingen, Germany
  2. State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou, China
  3. French Institute for Research in Computer Science and Automation, Rocquencourt, France
