Synchroscalar: Initial Lessons in Power-Aware Design of a Tile-Based Embedded Architecture

  • John Oliver
  • Ravishankar Rao
  • Paul Sultana
  • Jedidiah Crandall
  • Erik Czernikowski
  • Leslie W. Jones IV
  • Dean Copsey
  • Diana Keen
  • Venkatesh Akella
  • Frederic T. Chong
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3164)

Abstract

Embedded devices have hard performance targets and severe power and area constraints that depart significantly from our design intuitions derived from general-purpose microprocessor design. This paper describes our initial experiences in designing Synchroscalar, a tile-based embedded architecture targeted for multi-rate signal processing applications.

We present a preliminary design of the Synchroscalar architecture and some design space exploration in the context of important signal processing kernels. In particular, we find that synchronous design and substantial global interconnect are desirable in the low-frequency, low-power domain. This global interconnect enables parallelization and reduces processor idle time, both of which are critical to energy-efficient implementations of high-bandwidth signal processing. Furthermore, statically scheduled communication and SIMD computation keep control overheads low and energy efficiency high.

Keywords

Low Power Processor; 802.11(a); Programmable DSP Processor; tiled-based architectures; embedded processors

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • John Oliver (1)
  • Ravishankar Rao (1)
  • Paul Sultana (1)
  • Jedidiah Crandall (1)
  • Erik Czernikowski (1)
  • Leslie W. Jones IV (2)
  • Dean Copsey (1)
  • Diana Keen (2)
  • Venkatesh Akella (1)
  • Frederic T. Chong (1)

  1. University of California at Davis, USA
  2. California Polytechnic State University, San Luis Obispo, USA
