
Hardware-Aware Compilation

  • Reference work entry
Handbook of Hardware/Software Codesign

Abstract

Hardware-aware compilers are in high demand for embedded systems that must meet stringent, multidimensional design constraints on cost, power, and performance. By using microarchitectural information about a processor, a hardware-aware compiler can exploit highly customized microarchitectural features and generate more efficient code than a generic compiler while still meeting the design constraints. In this chapter, we introduce two applications of hardware-aware compilers: as a production compiler, and as a tool for efficiently exploring the design space of embedded processors. We demonstrate the first application with a compiler that reduces branch penalties on embedded processors that have no hardware branch predictor. To demonstrate the second application, we show how a hardware-aware compiler can be used to explore the design space of processor bypass designs. In both cases, the hardware-aware compiler generates better code than a hardware-ignorant compiler.
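
The second application can be pictured with a toy model. The sketch below is not the chapter's compiler: it assumes an in-order, single-issue pipeline, a program in which every register is written once, and a single producer-to-consumer latency per bypass configuration. The names `Instr`, `cycles`, `schedule`, and `configs`, and the latency values 1, 2, and 3, are hypothetical and chosen only for illustration. A hardware-ignorant compiler keeps the original instruction order, while the hardware-aware one runs a greedy list scheduler that knows the real latency.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    dest: str      # register written (assumed to be written only once)
    srcs: tuple    # registers read


def cycles(order, latency):
    """Cycle count of `order` on an in-order, single-issue toy pipeline."""
    issue = {}     # register -> issue cycle of the instruction producing it
    t = -1
    for ins in order:
        earliest = t + 1
        for s in ins.srcs:
            if s in issue:  # live-in registers impose no wait
                earliest = max(earliest, issue[s] + latency)
        t = earliest
        issue[ins.dest] = t
    return t + 1


def schedule(instrs, latency):
    """Greedy list scheduling that uses the real producer-consumer latency.

    Assumes every producer appears before its consumers in `instrs`.
    """
    produced = {ins.dest for ins in instrs}
    remaining = list(instrs)
    issue, order, t = {}, [], -1
    while remaining:
        best = None
        for ins in remaining:
            deps = [s for s in ins.srcs if s in produced]
            if any(s not in issue for s in deps):
                continue  # a producer has not been scheduled yet
            ready = max([t + 1] + [issue[s] + latency for s in deps])
            if best is None or ready < best[0]:
                best = (ready, ins)
        t, chosen = best
        issue[chosen.dest] = t
        order.append(chosen)
        remaining.remove(chosen)
    return order


# Two independent load/add chains feeding one final add.
program = [
    Instr("a", ("x",)),
    Instr("b", ("a",)),
    Instr("c", ("y",)),
    Instr("d", ("c",)),
    Instr("e", ("b", "d")),
]

# Hypothetical bypass configurations mapped to assumed latencies.
configs = {"full bypass": 1, "partial bypass": 2, "no bypass": 3}

for name, latency in configs.items():
    ignorant = cycles(program, latency)                   # original order
    aware = cycles(schedule(program, latency), latency)   # rescheduled
    print(f"{name}: ignorant={ignorant} aware={aware}")
```

In this toy model the two compilers tie under full bypassing, while with no bypass the original order takes 11 cycles and the rescheduled order takes 8. Running the compiler inside the loop over `configs` is what lets each design point be judged by the code a compiler would actually generate for it.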

Abbreviations

  • ADL: Architecture Description Language
  • BRF: Bypass Register File
  • BTB: Branch Target Buffer
  • CFG: Control-Flow Graph
  • CIL: Compiler-In-the-Loop
  • DSE: Design Space Exploration
  • HPC: Horizontally Partitioned Cache
  • ISA: Instruction-Set Architecture
  • MAC: Multiply-Accumulator
  • OT: Operation Table
  • RT: Response Time
  • SPU: Synergistic Processor Unit

Author information

Corresponding author

Correspondence to Aviral Shrivastava.

Copyright information

© 2017 Springer Science+Business Media Dordrecht

About this entry

Cite this entry

Shrivastava, A., Cai, J. (2017). Hardware-Aware Compilation. In: Ha, S., Teich, J. (eds) Handbook of Hardware/Software Codesign. Springer, Dordrecht. https://doi.org/10.1007/978-94-017-7267-9_26
