International Journal of Parallel Programming, Volume 36, Issue 1, pp 140–162

Architectural Exploration of Heterogeneous Multiprocessor Systems for JPEG


Multicore processors have been utilized in embedded systems and general computing applications for some time. However, these multicore chips execute multiple applications concurrently, with each core carrying out a particular task in the system. Such systems can be found in gaming, automotive real-time systems and video/image encoding devices. These systems are commonly deployed to overcome deadline misses, which are primarily due to overloading of a single multitasking core. In this paper, we explore the use of multiple cores for a single application, as opposed to multiple applications executing in parallel. A single application is parallelized using two different methods: first, a master-slave model; and second, a sequential pipeline model. The systems were implemented using Tensilica’s Xtensa LX processors, with queues as the means of communication between cores. In the master-slave model, we utilized a coarse-grained approach whereby a main core distributes the workload to the remaining cores and reads the processed data before writing the results back to file. In the pipeline model, a finer granularity is used: the application is partitioned into multiple sequential blocks, each block representing a stage in a sequential pipeline. For both models we applied a number of differing configurations, ranging from a single core to a nine-core system. For the seven-core system without any optimization, we found that the sequential pipeline approach uses area more efficiently, with an area-increase-to-speedup ratio of 1.83 compared with 4.34 for the master-slave approach. With selective optimization in the pipeline approach, we obtained speedups of up to 4.6× with an area increase of only 3.1× (an area-increase-to-speedup ratio of just 0.68).
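The sequential pipeline model described in the abstract can be illustrated with a minimal software sketch. This is not the authors' implementation: Python threads stand in for processor cores, `queue.Queue` objects stand in for the hardware queues between cores, and the names `stage`, `run_pipeline` and the arithmetic stage functions are hypothetical placeholders rather than the actual JPEG blocks used in the paper.

```python
import queue
import threading

SENTINEL = None  # assumption: None marks end of stream, so data items must not be None


def stage(func, inbox, outbox):
    """Run one pipeline stage: read from inbox, apply func, write to outbox."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)  # forward the shutdown signal to the next stage
            break
        outbox.put(func(item))


def run_pipeline(funcs, items):
    """Chain one thread per stage, with a FIFO queue between neighbouring stages."""
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    threads = [
        threading.Thread(target=stage, args=(f, queues[i], queues[i + 1]))
        for i, f in enumerate(funcs)
    ]
    for t in threads:
        t.start()

    # Feed the first queue, then signal that no more work is coming.
    for item in items:
        queues[0].put(item)
    queues[0].put(SENTINEL)

    # Drain the last queue until the shutdown signal arrives.
    results = []
    while True:
        out = queues[-1].get()
        if out is SENTINEL:
            break
        results.append(out)

    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    # Two placeholder stages; a real encoder would use its own processing blocks.
    print(run_pipeline([lambda x: x + 1, lambda x: x * 2], [1, 2, 3]))  # prints [4, 6, 8]
```

Because each stage is a single consumer of a FIFO queue, items leave the pipeline in the order they entered. A master-slave arrangement would instead have one thread hand independent work items to several identical workers and collect their results.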


architecture ASIPs design heterogeneous system multiprocessor pipelines SoC 





Copyright information

© Springer Science+Business Media, LLC 2007

Authors and Affiliations

  • Seng Lin Shee (1, 2)
  • Andrea Erdos (1)
  • Sri Parameswaran (1, 2)
  1. School of Computer Science and Engineering, The University of New South Wales, Sydney, Australia
  2. National Information and Communications Technology Australia (NICTA), Sydney, Australia
