Software Pipelining Support for Transport Triggered Architecture Processors
Many telecommunication applications, especially baseband processing, and digital signal processing (DSP) applications require high-performance implementations because of the complexity of their algorithms and their high throughput requirements. In general, the required performance is obtained with parallel computational resources. In these application domains, software implementations are often preferred over fixed-function ASICs because of their flexibility and ease of development. Application-specific instruction-set processor (ASIP) architectures can efficiently exploit the inherent parallelism of the algorithms while still maintaining this flexibility. However, programming processors with parallel resources in high-level languages can lead to inefficient resource utilization, whereas parallel assembly programming is error-prone and tedious.
In this paper, the inherent problems of parallel programming and software pipelining are mitigated by a parallel language syntax and the automatic generation of software-pipelined code for iteration kernels. With the developed tool support, the performance potential of a processor architecture with parallel resources can be exploited, and the main processing resources are fully utilized in pipelined loop kernels. The given examples show that programming efficiency can be improved without reducing performance.
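The idea behind automatically generating a software-pipelined loop kernel can be illustrated with a small modulo-scheduling sketch in Python. It is not the tool described in the paper. It assumes a loop body with only intra-iteration dependences (no loop-carried recurrences), fully pipelined function units with fixed latencies, and a hypothetical example body and unit mix (`OPS`, `UNITS`, and all function names are illustrative). It computes the resource-bound initiation interval (ResMII), places each operation in a modulo reservation table, and reports the resulting kernel and function-unit utilization.

```python
# Hypothetical modulo-scheduling sketch; not the paper's tool.
# Assumptions: one basic-block loop body, only intra-iteration
# dependences (no loop-carried recurrences), fully pipelined units,
# fixed latencies, and ops listed in topological order.
from dataclasses import dataclass
from math import ceil


@dataclass(frozen=True)
class Op:
    name: str
    unit: str          # function-unit class, e.g. "lsu", "mul", "alu"
    latency: int       # cycles until the result can be consumed
    deps: tuple = ()   # names of producer ops in the same iteration


def res_mii(ops, units):
    """Resource-constrained lower bound on the initiation interval (II)."""
    uses = {}
    for op in ops:
        uses[op.unit] = uses.get(op.unit, 0) + 1
    return max(ceil(n / units[u]) for u, n in uses.items())


def modulo_schedule(ops, units, max_ii=64):
    """Greedy modulo scheduling: return (ii, start cycle of each op)."""
    by_name = {op.name: op for op in ops}
    for ii in range(res_mii(ops, units), max_ii + 1):
        table = {}   # modulo reservation table: (unit, cycle % ii) -> op count
        start = {}
        for op in ops:
            # Earliest cycle at which all operands are available.
            t = max((start[d] + by_name[d].latency for d in op.deps), default=0)
            for _ in range(ii):  # ii consecutive cycles cover every residue
                slot = (op.unit, t % ii)
                if table.get(slot, 0) < units[op.unit]:
                    table[slot] = table.get(slot, 0) + 1
                    start[op.name] = t
                    break
                t += 1
            else:
                break  # no free slot for this op: try a larger II
        else:
            return ii, start  # every op was placed at this II
    raise ValueError("no schedule found up to max_ii")


def kernel(ii, start, ops):
    """Kernel rows; op[i-s] means stage s, i.e. an op of iteration i-s."""
    rows = [[] for _ in range(ii)]
    for op in ops:
        stage, cycle = divmod(start[op.name], ii)
        rows[cycle].append(f"{op.name}[i-{stage}]" if stage else f"{op.name}[i]")
    return rows


def utilization(ii, ops, units):
    """Fraction of function-unit issue slots used by the kernel."""
    return len(ops) / (ii * sum(units.values()))


# Hypothetical loop body y[i] = a * x[i] + b; a and b assumed in registers.
OPS = [
    Op("ld_x", "lsu", 2),
    Op("mul", "mul", 2, ("ld_x",)),
    Op("add", "alu", 1, ("mul",)),
    Op("st", "lsu", 1, ("add",)),
]
UNITS = {"lsu": 2, "mul": 1, "alu": 1}

if __name__ == "__main__":
    ii, start = modulo_schedule(OPS, UNITS)
    print("II =", ii, "start =", start)
    for c, row in enumerate(kernel(ii, start, OPS)):
        print(f"kernel cycle {c}:", ", ".join(row))
    print(f"utilization = {utilization(ii, OPS, UNITS):.0%}")
```

With the assumed unit mix, the sketch reaches II = 1: every function unit issues one operation per cycle, so the kernel uses all issue slots, while each kernel cycle overlaps operations from six different iterations. With only one load-store unit, ResMII and the achieved II rise to 2. The largest stage index determines how many partially filled kernel copies the prologue and epilogue must contain. Generating that code, handling loop-carried recurrences, and scheduling the individual data transports of a TTA are outside the scope of this sketch.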
Keywords: Clock Cycle, Function Unit, Assembly Language, Software Pipeline, Automatic Code Generation