Abstract
Very Long Instruction Word (VLIW) processors represent an attractive solution for embedded computing, offering significant computational power with reduced hardware complexity. However, they impose higher compiler complexity, since instructions are executed in parallel according to the static schedule produced by the compiler. Therefore, finding a promising set of compiler transformations and characterizing their effects has a significant impact on overall system performance. In this chapter, we provide a methodology with an integrated framework to automatically (i) generate optimized application-specific VLIW architectural configurations and (ii) analyze compiler-level transformations, enabling application-specific compiler tuning over customized VLIW system architectures. We base the analysis on a Design of Experiments (DoE) procedure that statistically captures the higher-order effects among different sets of activated compiler transformations. Applying the proposed methodology to real-case embedded application scenarios, we show that (i) only a limited set of compiler transformations affects performance at a high confidence level (over 95%) and (ii) using them we can achieve gains of 16–23% in comparison to the default optimization levels. In the following chapters, we go deeper into building machine learning models to tackle the problem.
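To make the idea of a DoE over compiler transformations concrete, the following is a minimal sketch of a two-level full factorial design and of how main and interaction effects can be estimated from it. It is not the chapter's framework: the transformation names (`unroll`, `inline`, `licm`), the function `synthetic_cycles`, and its coefficients are hypothetical stand-ins for compiling and simulating each configuration, and the numbers are synthetic rather than measured.

```python
from itertools import product

# Hypothetical transformation names, used only for illustration.
FACTORS = ["unroll", "inline", "licm"]


def full_factorial(factors):
    """Return every on/off combination as a dict {factor: 0 or 1}."""
    return [dict(zip(factors, levels))
            for levels in product((0, 1), repeat=len(factors))]


def synthetic_cycles(cfg):
    """Stand-in for 'compile with cfg, simulate, read the cycle count'.

    Synthetic model (an assumption, not measured data): two main
    effects plus one interaction between unroll and licm.
    """
    cycles = 1000.0
    cycles -= 120.0 * cfg["unroll"]
    cycles -= 40.0 * cfg["inline"]
    cycles -= 60.0 * cfg["unroll"] * cfg["licm"]
    return cycles


def mean(values):
    return sum(values) / len(values)


def main_effect(runs, factor):
    """Mean response with the factor on minus mean response with it off."""
    on = [y for cfg, y in runs if cfg[factor] == 1]
    off = [y for cfg, y in runs if cfg[factor] == 0]
    return mean(on) - mean(off)


def interaction_effect(runs, a, b):
    """Half the change in the effect of `a` when `b` goes from off to on."""
    def effect_of_a(b_level):
        on = [y for cfg, y in runs if cfg[a] == 1 and cfg[b] == b_level]
        off = [y for cfg, y in runs if cfg[a] == 0 and cfg[b] == b_level]
        return mean(on) - mean(off)
    return (effect_of_a(1) - effect_of_a(0)) / 2


# One run per design point: 2^3 = 8 configurations.
runs = [(cfg, synthetic_cycles(cfg)) for cfg in full_factorial(FACTORS)]

for factor in FACTORS:
    print(factor, main_effect(runs, factor))
print("unroll x licm", interaction_effect(runs, "unroll", "licm"))
```

In a real study each design point would be measured several times, and a statistical test on the resulting samples would decide which effects are significant at the chosen confidence level; the sketch only shows how the design exposes interactions that a one-factor-at-a-time sweep would miss.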
Notes
1. The LLVM project supported its C source-to-source compiler frontend until v2.8.
2. Since the distributions are built from empirical/experimental data, they are in general considered non-parametric.
Copyright information
© 2018 The Author(s)
About this chapter
Cite this chapter
Ashouri, A.H., Palermo, G., Cavazos, J., Silvano, C. (2018). Design Space Exploration of Compiler Passes: A Co-Exploration Approach for the Embedded Domain. In: Automatic Tuning of Compilers Using Machine Learning. SpringerBriefs in Applied Sciences and Technology. Springer, Cham. https://doi.org/10.1007/978-3-319-71489-9_2
DOI: https://doi.org/10.1007/978-3-319-71489-9_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-71488-2
Online ISBN: 978-3-319-71489-9
eBook Packages: Engineering (R0)