Interaction Between Data Parallel Compilation and Data Transfer and Storage Cost Minimization for Multimedia Applications

  • Chidamber Kulkarni
  • Koen Danckaert
  • Francky Catthoor
  • Manish Gupta
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1685)


Real-time multimedia applications demand large processing power, yet they require a low-power implementation when targeting embedded programmable parallel processors. Our main contribution in this context is a formalized DTSE (data transfer and storage exploration) methodology, which significantly reduces system bus load and thereby improves overall system performance and lowers power consumption. We demonstrate that this methodology is complementary to parallelizing compilation by coupling DTSE with a state-of-the-art performance-optimizing and parallelizing compiler. Experiments on two real-life video and image processing applications show that the combined approach heavily reduces memory accesses and bus loading, and hence power, and also significantly reduces total execution time. Decomposing the detailed parallelization and DTSE issues into two separate stages is important for obtaining the benefits of both stages without the exploding complexity of solving all the issues simultaneously.


Total Execution Time; Program Transformation; Parallel Loop; Cavity Detection; Parallelize Code
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag Berlin Heidelberg 1999

Authors and Affiliations

  • Chidamber Kulkarni (1)
  • Koen Danckaert (1)
  • Francky Catthoor (1, 2)
  • Manish Gupta (3)
  1. IMEC, Leuven, Belgium
  2. Professor at the Katholieke Universiteit Leuven, Leuven, Belgium
  3. IBM T.J. Watson Research Center, Yorktown Heights, USA
