A High Performance Heterogeneous Architecture and Its Optimization Design

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4208)

Abstract

The wide adoption of media processing applications poses great challenges for high-performance embedded processor design. This paper studies a Data Parallel Coprocessor architecture based on SDTA, and architecture decisions are made to achieve the best performance/cost ratio. Experimental results on a prototype show that SDTA delivers high performance on many embedded media processing applications. The simplicity and flexibility of SDTA encourage further development of its reconfigurable functionality.

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Guo, J., Dai, K., Wang, Z. (2006). A High Performance Heterogeneous Architecture and Its Optimization Design. In: Gerndt, M., Kranzlmüller, D. (eds) High Performance Computing and Communications. HPCC 2006. Lecture Notes in Computer Science, vol 4208. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11847366_31

  • DOI: https://doi.org/10.1007/11847366_31

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-39368-9

  • Online ISBN: 978-3-540-39372-6

  • eBook Packages: Computer Science, Computer Science (R0)
