Abstract
General-Purpose Processors (GPPs) and Application-Specific Integrated Circuits (ASICs) are the two extreme choices for computational engines. GPPs offer complete flexibility but are inefficient in terms of both performance and energy. In contrast, ASICs are highly energy-efficient and provide the best performance, but at the cost of zero flexibility. Application-specific processors, or custom processors, bridge the gap between these two alternatives by bringing improved power-performance efficiency to the familiar software programming environment. An application-specific processor architecture augments the base instruction-set architecture with customized instructions that encapsulate the frequently occurring computational patterns within an application. These custom instructions are implemented in hardware, enabling performance acceleration and energy benefits. The challenge lies in inventing automated tools that can design an application-specific processor by identifying and implementing custom instructions from application software specified in high-level programming languages. In this chapter, we present the benefits of application-specific processors, their architecture, the automated design flow, and the renewed interest in this class of architectures from an energy-efficiency perspective.
Abbreviations
- ALU: Arithmetic-Logic Unit
- ASIC: Application-Specific Integrated Circuit
- BERET: Bundled Execution of REcurring Traces
- CAD: Computer-Aided Design
- CCA: Configurable Compute Accelerator
- CFG: Control-Flow Graph
- CFU: Custom Functional Unit
- CIS: Custom Instruction-Set
- DFG: Data-Flow Graph
- DISC: Dynamic Instruction-Set Computer
- DSP: Digital Signal Processor
- FPGA: Field-Programmable Gate Array
- GPP: General-Purpose Processor
- GPU: Graphics Processing Unit
- ILP: Integer Linear Program
- IR: Intermediate Representation
- ISA: Instruction-Set Architecture
- ISEF: Stretch Instruction-Set Extension Fabric
- MAC: Multiply-Accumulator
- MIMO: Multiple Input Multiple Output
- MISO: Multiple Input Single Output
- PFU: Programmable Functional Unit
- PRISC: Programmable Instruction-Set Processor
- RAM: Random-Access Memory
- RISC: Reduced Instruction-Set Computer
- RISPP: Rotating Instruction-Set Processing Platform
- SFU: Specialized Functional Unit
- VLIW: Very Long Instruction Word
Acknowledgements
This work was partially supported by Singapore Ministry of Education Academic Research Fund Tier 2 MOE2014-T2-2-129.
Copyright information
© 2017 Springer Science+Business Media Dordrecht
About this entry
Cite this entry
Mitra, T. (2017). Application-Specific Processors. In: Ha, S., Teich, J. (eds) Handbook of Hardware/Software Codesign. Springer, Dordrecht. https://doi.org/10.1007/978-94-017-7267-9_13
DOI: https://doi.org/10.1007/978-94-017-7267-9_13
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-017-7266-2
Online ISBN: 978-94-017-7267-9
eBook Packages: Engineering, Reference Module Computer Science and Engineering