UCIFF: Unified Cluster Assignment Instruction Scheduling and Fast Frequency Selection for Heterogeneous Clustered VLIW Cores
Clustered VLIW processors are scalable, wide-issue, statically scheduled processors. Their design physically partitions hardware resources that would otherwise be shared, which yields both high performance and low energy consumption. In traditional clustered VLIW processors, all clusters operate at the same frequency. Heterogeneous clustered VLIW processors, however, support dynamic voltage and frequency scaling (DVFS) independently per cluster. Controlling DVFS effectively, by selectively lowering the frequency of clusters that have a lot of slack in their schedule, can lead to significant energy savings.
In this paper we propose UCIFF, a new scheduling algorithm for heterogeneous clustered VLIW processors with software DVFS control. It performs cluster assignment, instruction scheduling and fast frequency selection simultaneously, in a single compiler pass. The proposed algorithm solves the phase-ordering problem between frequency selection and scheduling that is present in existing algorithms. We compare the quality of the generated code, using both performance and energy-related metrics, against that of the current state of the art and against an optimal scheduler. The results show that UCIFF produces better code than the state of the art and comes very close to the optimal across the MediaBench II benchmarks, while keeping the algorithmic complexity low.
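The intuition behind slowing down clusters with slack can be illustrated with a small sketch. This is not UCIFF: it is a decoupled post-pass that picks frequencies after a schedule already exists, which is the kind of phase-ordered approach the unified algorithm is meant to replace. The function names, the frequency set, and the energy model (voltage assumed proportional to frequency, so dynamic energy per cycle scales with the square of the frequency) are all assumptions made for illustration, and inter-cluster dependencies are ignored.

```python
def pick_cluster_frequencies(busy_cycles, deadline_ns, freqs_ghz):
    """For each cluster, pick the slowest frequency (hypothetical helper)
    at which its work still finishes within the deadline.

    busy_cycles: cycles of work assigned to each cluster
    deadline_ns: time budget shared by all clusters, in nanoseconds
    freqs_ghz:   frequencies the hardware is assumed to support
    """
    chosen = []
    for cycles in busy_cycles:
        # cycles / GHz gives nanoseconds; scan from slowest to fastest.
        feasible = [f for f in sorted(freqs_ghz) if cycles / f <= deadline_ns]
        if not feasible:
            raise ValueError("cluster cannot meet the deadline at any frequency")
        chosen.append(feasible[0])
    return chosen


def relative_dynamic_energy(busy_cycles, freqs_ghz):
    """Toy energy model (assumption): energy per cycle is proportional to f**2."""
    return sum(c * f ** 2 for c, f in zip(busy_cycles, freqs_ghz))


# Three clusters with decreasing amounts of work and a 100 ns budget.
work = [100, 50, 20]
levels = [0.25, 0.5, 1.0]
selected = pick_cluster_frequencies(work, 100, levels)
baseline = relative_dynamic_energy(work, [1.0] * len(work))
scaled = relative_dynamic_energy(work, selected)
print(selected, baseline, scaled)
```

In this toy setting the heavily loaded cluster stays at full speed while the two clusters with slack run slower, so the modelled energy drops without lengthening the schedule. A real scheduler also has to account for dependencies between clusters, which is why choosing frequencies separately from scheduling can miss better solutions.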
Keywords: clustered VLIW, heterogeneous, DVFS, scheduling, phase-ordering