
Pipeline Optimization for Loops on Reconfigurable Platform

  • Qi Guo
  • Chao Wang
  • Xuehai Zhou
  • Xi Li
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7806)

Abstract

Pipelining is an effective technique for improving the performance of a loop by overlapping the execution of several iterations, particularly on reconfigurable platforms, which are more coarse-grained. In this paper, we use a reconfigurable platform to accelerate loop-based applications by reconstructing the pipeline structure while the application executes. Based on this concept, optimization strategies such as duplexing and splitting of function units are applied from the instruction level to the task level. First, a loop is abstracted as a weighted data flow graph (WDFG), in which nodes represent tasks and edges represent inter-task dependencies. The weights of nodes and edges indicate task execution times and communication overheads, respectively. Based on this abstraction, we propose an algorithm that automatically maps pipelined loops onto reconfigurable hardware and selects whether duplexing or splitting is more appropriate. The algorithm relies on profiling information from the WDFG, such as execution times and communication overheads. Several test cases from the EEMBC benchmark are then selected to evaluate our approach. The evaluation is carried out in two ways. First, we run software simulations to assess the effectiveness of the algorithm. Second, a prototype system is implemented on a state-of-the-art FPGA board to evaluate the practicality of our approach on a reconfigurable platform. Pipeline performance indicators such as speedup, throughput, and efficiency are measured in both settings. In software simulation, the speedup and throughput of the optimized pipeline improve by at least 2 times and the efficiency increases by 1.1 to 1.5 times. On the hardware platform, the speedup and efficiency increase by 1.5 times, reflecting the communication cost and reconfiguration delay, and the throughput increases by 1.5 to 2 times. Experimental results demonstrate that our approach achieves satisfactory performance in both effectiveness and practicality.
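To make the terms in the abstract concrete, the following is a minimal sketch, not the authors' algorithm. It assumes a linear chain of stages (a simple special case of a WDFG), a textbook timing model in which the pipeline finishes after one fill latency plus one bottleneck interval per remaining iteration, and efficiency defined as speedup divided by the number of function units. The names `Stage`, `metrics`, `duplex`, and `split`, and the `extra_comm` parameter, are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Stage:
    exec_time: float       # node weight: task execution time
    comm_out: float = 0.0  # edge weight: cost of passing data to the next stage
    copies: int = 1        # number of duplicated function units for this stage

    @property
    def latency(self) -> float:
        # Time one iteration spends in this stage, including its outgoing transfer.
        return self.exec_time + self.comm_out

    @property
    def interval(self) -> float:
        # Assumed model: copies work on alternating iterations, so the stage
        # can accept a new iteration every latency / copies time units.
        return self.latency / self.copies


def metrics(stages, n_iter):
    """Return time, speedup, throughput and efficiency under the assumed model."""
    serial = n_iter * sum(s.exec_time for s in stages)
    fill = sum(s.latency for s in stages)
    bottleneck = max(s.interval for s in stages)
    time = fill + (n_iter - 1) * bottleneck
    units = sum(s.copies for s in stages)
    speedup = serial / time
    return {
        "time": time,
        "speedup": speedup,
        "throughput": n_iter / time,
        "efficiency": speedup / units,
    }


def duplex(stages, i):
    """Duplicate the function unit of stage i (one more copy)."""
    out = list(stages)
    out[i] = replace(out[i], copies=out[i].copies + 1)
    return out


def split(stages, i, extra_comm):
    """Split stage i into two equal halves joined by a new edge of cost extra_comm."""
    s = stages[i]
    half = s.exec_time / 2
    first = Stage(half, extra_comm, s.copies)
    second = Stage(half, s.comm_out, s.copies)
    return list(stages[:i]) + [first, second] + list(stages[i + 1:])


def best_option(stages, n_iter, extra_comm):
    """Compare duplexing and splitting of the bottleneck stage by pipelined time."""
    i = max(range(len(stages)), key=lambda j: stages[j].interval)
    d = metrics(duplex(stages, i), n_iter)
    s = metrics(split(stages, i, extra_comm), n_iter)
    return ("duplex", d) if d["time"] <= s["time"] else ("split", s)


base = [Stage(2.0), Stage(6.0), Stage(2.0)]
choice, result = best_option(base, n_iter=100, extra_comm=0.5)
print(choice, result)
```

In this toy model, duplexing shortens the bottleneck interval at the price of an extra function unit, while splitting keeps one unit per stage but pays the additional communication cost of the new edge. A real mapping would also have to account for reconfiguration delay and hardware resource limits, which the sketch ignores.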

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Qi Guo
    • 1
  • Chao Wang
    • 1
  • Xuehai Zhou
    • 2
  • Xi Li
    • 2
  1. Suzhou Institute of Advance Study, USTC, Suzhou, China
  2. University of Science and Technology of China, Hefei, China
