A High Performance Heterogeneous Architecture and Its Optimization Design

Guo, Jianjun; Dai, Kui; Wang, Zhiying

doi:10.1007/11847366_31

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4208)

Abstract

The wide adoption of media processing applications poses great challenges for high-performance embedded processor design. This paper studies a Data Parallel Coprocessor architecture based on SDTA, in which architectural decisions are made for the best performance/cost ratio. Experimental results on a prototype show that SDTA achieves high performance on many embedded media processing applications. The simplicity and flexibility of SDTA encourage further development of its reconfigurable functionality.

Author information

Authors and Affiliations

School of Computer, National University of Defense Technology, 410073, Changsha, Hunan, China
Jianjun Guo, Kui Dai & Zhiying Wang

Editor information

Editors and Affiliations

Michael Gerndt
Dieter Kranzlmüller (GUP, Institute of Graphics and Parallel Processing, Johannes Kepler University, Altenbergerstraße 69, A-4040, Linz, Austria)

Cite this paper

Guo, J., Dai, K., Wang, Z. (2006). A High Performance Heterogeneous Architecture and Its Optimization Design. In: Gerndt, M., Kranzlmüller, D. (eds) High Performance Computing and Communications. HPCC 2006. Lecture Notes in Computer Science, vol 4208. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11847366_31

DOI: https://doi.org/10.1007/11847366_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-39368-9
Online ISBN: 978-3-540-39372-6
eBook Packages: Computer Science, Computer Science (R0)
