A Scalable, Multi-thread, Multi-issue Array Processor Architecture for DSP Applications Based on Extended Tomasulo Scheme

  • Mladen Bereković
  • Tim Niggemeier
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4017)


A scalable, distributed micro-architecture is presented that emphasizes high-performance computing for digital signal processing applications by combining high-frequency design techniques with a very high degree of parallel processing on a chip. The architecture is based on a superscalar processor model with out-of-order execution that supports specialized, complex DSP function units and simultaneous instruction issue from multiple independent threads (SMT). Consistent application of fine-grained clustering reduces the cycle time of wire-sensitive building blocks of the processor, such as the register file, and leads to a distributed architecture model in which independent thread processing units, ALUs, register files, and memories are distributed across the chip and communicate with each other over special networks, forming a "network-on-a-chip" (NOC) [1]. The communication protocol is a modified version of Tomasulo's scheme [2] that was extended to eliminate all central control structures for the data flow and to support multithreading. The performance of the architecture scales with both the number of function units and the number of thread units without any impact on the processor's cycle time.
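
To make the idea of a tag-based, Tomasulo-style protocol with multithreading support more concrete, the following is a minimal Python sketch. It is not the authors' design: the tag encoding `(thread_id, sequence number)`, the class names `Station` and `Machine`, and the two-operation instruction set are all assumptions chosen for illustration. It only shows how results broadcast with thread-qualified tags can wake up waiting instructions without the two threads interfering, even though both use the same register names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Tag = Tuple[int, int]     # hypothetical encoding: (thread_id, per-thread sequence number)
RegKey = Tuple[int, str]  # registers are private per thread: (thread_id, name)


@dataclass
class Station:
    """One reservation-station entry holding an instruction and its operands."""
    tag: Tag
    op: str
    values: List[Optional[int]]      # operand values, None while still pending
    waiting_on: List[Optional[Tag]]  # tag each pending operand is waiting for
    done: bool = False

    def ready(self) -> bool:
        return not self.done and all(v is not None for v in self.values)

    def capture(self, tag: Tag, value: int) -> None:
        """Snoop a broadcast result and capture every operand that matches."""
        for i, pending in enumerate(self.waiting_on):
            if pending == tag:
                self.values[i] = value
                self.waiting_on[i] = None

    def execute(self) -> int:
        a, b = self.values
        return a + b if self.op == "add" else a * b


class Machine:
    def __init__(self) -> None:
        self.stations: List[Station] = []
        self.reg_value: Dict[RegKey, int] = {}
        self.reg_tag: Dict[RegKey, Tag] = {}   # register -> tag of pending producer
        self.next_seq: Dict[int, int] = {}

    def issue(self, thread_id: int, op: str, dst: str, src1: str, src2: str) -> Tag:
        seq = self.next_seq.get(thread_id, 0)
        self.next_seq[thread_id] = seq + 1
        tag = (thread_id, seq)
        values: List[Optional[int]] = []
        waiting: List[Optional[Tag]] = []
        for src in (src1, src2):
            key = (thread_id, src)
            pending = self.reg_tag.get(key)
            values.append(None if pending is not None else self.reg_value[key])
            waiting.append(pending)
        self.stations.append(Station(tag, op, values, waiting))
        self.reg_tag[(thread_id, dst)] = tag   # rename: dst now produced by this tag
        return tag

    def broadcast(self, tag: Tag, value: int) -> None:
        for s in self.stations:
            s.capture(tag, value)
        for key, pending in list(self.reg_tag.items()):
            if pending == tag:                 # only the latest producer writes back
                self.reg_value[key] = value
                del self.reg_tag[key]

    def step(self) -> int:
        """Fire every station that was ready at the start of the cycle."""
        firing = [s for s in self.stations if s.ready()]
        for s in firing:
            s.done = True
            self.broadcast(s.tag, s.execute())
        return len(firing)

    def run(self) -> int:
        cycles = 0
        while self.step():
            cycles += 1
        return cycles


m = Machine()
m.reg_value.update({(0, "r1"): 2, (0, "r2"): 3, (1, "r1"): 10, (1, "r2"): 20})
m.issue(0, "mul", "r3", "r1", "r2")   # thread 0: r3 = r1 * r2
m.issue(1, "add", "r3", "r1", "r2")   # thread 1: r3 = r1 + r2
m.issue(0, "add", "r4", "r3", "r1")   # thread 0: r4 = r3 + r1 (waits on r3)
m.issue(1, "mul", "r4", "r3", "r3")   # thread 1: r4 = r3 * r3 (waits on r3)
cycles = m.run()
print(cycles, m.reg_value)
```

In this sketch both threads produce a tag with sequence number 0, so the thread identifier in the tag is what keeps their results apart. The central `Machine` object is only a simulation convenience; the abstract states that the actual architecture eliminates central control structures for the data flow.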


Keywords: Function Unit; Digital Signal Processing Application; Superscalar Processor; VLIW Processor; Content Addressable Memory
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




  1. Benini, L., de Micheli, G.: Networks on chip: A New SOC Paradigm. IEEE Computer 35(1), 70–78 (2002)
  2. Tomasulo, R.M.: An efficient algorithm for exploiting multiple arithmetic units. IBM Journal on Research and Development 11(1), 25–33 (1967)
  3. Berekovic, M., Stolberg, H.-J., Pirsch, P.: Multi-Core System-On-Chip Architecture for MPEG-4 Streaming Video. Transactions on Circuits and Systems for Video Technology (CSVT) 12(8), 688–699 (2002)
  4. Pirsch, P., Berekovic, M., Stolberg, H.-J., Jachalsky, J.: VLSI Architectures for MPEG-4 Video. In: VLSI Conference, Taipei (April 2003)
  5. ARM AMBA Specification,
  6. Vahid, F.: The Softening of Hardware. IEEE Computer 36(4), 27–34 (2003)
  7. Zhang, H., Rabaey, J.M., et al.: A 1V Heterogeneous Reconfigurable Processor IC for Baseband Wireless Applications. In: Proc. Int'l. Solid-State Circuits Conference (ISSCC), San Francisco (February 2000)
  8. van Meerbergen, J.L.: Lecture slides: Complex Multiprocessor architectures,
  9. ISO/IEC JTC/SC29/WG11 N4668, Overview of the MPEG-4 standard, Jeju (March 2002)
  10. Berekovic, M., Pirsch, P., Kneip, J.: An Algorithm-Hardware-System Approach to VLIW Multimedia Processors. Journal of VLSI Signal Processing Systems 20(1-2), 163–180 (1998)
  11. Allan, A., Edenfeld, D., Joyner, W.H., Kahng, A.B., Rodgers, M., Zorian, Y.: 2001 Technology Roadmap for Semiconductors. IEEE Computer 35(1), 42–53 (2002)
  12. Lipasti, M.H., Shen, J.P.: Modern Processor Design. McGraw-Hill, New York (2002)
  13. Berekovic, M., Stolberg, H.J., Kulaczewski, M.B., Pirsch, P., Moeller, H., Runge, H., Kneip, J., Stabernack, B.: Instruction Set Extensions for MPEG-4 Video. Journal of VLSI Signal Processing Systems 23(1), 7–50 (1999)
  14. Wittenburg, J.P., Hinrichs, W., Kneip, J., Ohmacht, M., Berekovic, M., Lieske, H., Kloos, H., Pirsch, P.: Realization of a Programmable Parallel DSP for High Performance Image Processing Applications. In: Design Automation Conference (DAC) 1998, June 1998, pp. 56–61 (1998)
  15. Lee, R.: Accelerating Multimedia with Enhanced Microprocessors. IEEE Micro 15(2), 22–32 (1995)
  16. Slingerland, N., Smith, A.J.: Measuring the Performance of Multimedia Instruction Sets. IEEE Transactions on Computers 51(11), 1317–1332 (2002)
  17. Texas Instruments, TMS320DM642 Technical Overview, Application Report SPRU615 (September 2002)
  18. Lam, M.S., Wilson, R.P.: Limits of Control Flow on Parallelism. In: Proc. 19th Ann. Int'l Symp. on Computer Architecture, June 1992, pp. 46–57 (1992)
  19. Tullsen, D.M., Eggers, S.J., Levy, H.M.: Simultaneous Multithreading: Maximizing On-Chip Parallelism. In: Proc. 22nd Ann. Int'l Symp. on Computer Architecture, June 1995, pp. 392–403 (1995)
  20. Preston, R.P., et al.: Design of an 8-wide Superscalar RISC with Simultaneous Multithreading. In: Solid-State Circuits Conference (ISSCC 2002), San Francisco, February 2002, pp. 469–471 (2002)
  21. Palacharla, S., Jouppi, N.P., Smith, J.: Complexity Effective Superscalar Processors. In: Proc. 24th Int'l. Symp. on Computer Architecture, June 1997, pp. 206–218 (1997)
  22. Ackland, B., et al.: A Single Chip, 1.6-Billion, 16-b MAC/s Multiprocessor DSP. IEEE J. Solid-State Circuits, 412–424 (March 2000)
  23. Stolberg, H.-J., Berekovic, M., Friebe, L., Moch, S., Fluegel, S., Mao, X., Kulaczewski, M.B., Klussmann, H., Pirsch, P.: HiBRID-SoC: A Multi-Core System-on-Chip Architecture for Multimedia Signal Processing Applications. In: Proceedings Design, Automation and Test in Europe (DATE 2003) - Designer's Forum, March 2003, pp. 8–13 (2003)
  24. Farkas, K.I., Chow, P., Jouppi, N.P., Vranesic, Z.: The Multicluster Architecture: Reducing Cycle Time through Partitioning. In: Proc. 30th Int'l. Symp. on Microarchitecture, December 1997, pp. 149–159 (1997)
  25. Kessler, R.E.: The Alpha 21264 Microprocessor. IEEE Micro 19(2), 24–36 (1999)
  26. Ho, R., Mai, K.W., Horowitz, M.A.: The Future of wires. Proceedings of the IEEE 89(4), 490–504 (2001)
  27. Agarwal, V., Hrishikesh, M.S., Keckler, S.W., Burger, D.: Clock Rate versus IPC: The End of the Road for conventional Microarchitectures. In: Proc. 27th Ann. Int'l. Symp. on Computer Architecture, June 2000, pp. 248–259 (2000)
  28. Corporaal, H.: Microprocessor Architectures from VLIW to TTA. John Wiley & Sons, Chichester (1998)
  29. Vangal, S., et al.: 5-GHz 32-bit Integer Execution Core in 130-nm Dual-VT CMOS. IEEE Journal of Solid-State Circuits 37(11) (November 2002)
  30. Berekovic, M., Pirsch, P., Kneip, J.: An Algorithm-Hardware-System Approach to VLIW Multimedia Processors. Journal of VLSI Signal Processing Systems 20(1-2), 163–180 (1998)
  31. Chen, Y.-K., Lienhart, R., Debes, E., Holliman, M., Yeung, M.: The impact of SMT/SMP Designs on Multimedia Software Engineering: A Workload Analysis Study. In: Fourth International Symposium on Multimedia Software Engineering (December 2002)
  32. Wall, D.W.: Limits of Instruction-Level Parallelism. In: Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, April 1991, pp. 176–188 (1991)
  33. Brown, M., Stark, J., Patt, Y.: Select-Free Instruction Scheduling Logic. Micro-34, 204–213 (2001)
  34. Weiss, S., Smith, J.E.: Instruction Issue Logic in Pipelined Supercomputers. IEEE Trans. on Comp. C 33(11), 1013–1022 (1984)
  35. Sato, T., Nakamura, Y., Arita, I.: Revisiting Direct Tag Search Algorithm on Superscalar Processors. In: Workshop on Complexity-Effective Design (June 2001)
  36. Corporaal, H.: Microprocessor Architectures from VLIW to TTA. John Wiley & Sons, Chichester (1998)
  37. Nagarajan, R., Sankaralingam, K., Burger, D., Keckler, S.: Design Space Evaluation of Grid Processor Architectures. Micro-34, 40–53 (2001)
  38. Taylor, M.B., et al.: The RAW Microprocessor: A Computational Fabric For Software Circuits and General-Purpose Programs. IEEE Micro 22(2), 25–35 (2002)

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Mladen Bereković (1, 2)
  • Tim Niggemeier (3)
  1. IMEC, Belgium
  2. TU Delft, Netherlands
  3. IBM Deutschland Entwicklung GmbH, Germany
