Abstract
Real-time multimedia applications demand large processing power, yet they must be implemented at low power on embedded programmable parallel processors. Our main contribution in this context is a formalized DTSE (data transfer and storage exploration) methodology, which significantly reduces the system bus load and thereby improves overall system performance and lowers power consumption. We demonstrate that this methodology is complementary to parallelizing compilation by coupling DTSE with a state-of-the-art performance-optimizing and parallelizing compiler. Experiments on two real-life video and image processing applications show that the combined approach heavily reduces memory accesses and bus loading, and hence power, and also significantly reduces the total execution time. Decomposing the detailed parallelization and DTSE issues into two separate stages is important: it obtains the benefits of both stages without the complexity explosion of solving all the issues simultaneously.
Copyright information
© 1999 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kulkarni, C., Danckaert, K., Catthoor, F., Gupta, M. (1999). Interaction Between Data Parallel Compilation and Data Transfer and Storage Cost Minimization for Multimedia Applications. In: Amestoy, P., et al. Euro-Par’99 Parallel Processing. Euro-Par 1999. Lecture Notes in Computer Science, vol 1685. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48311-X_94
DOI: https://doi.org/10.1007/3-540-48311-X_94
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-66443-7
Online ISBN: 978-3-540-48311-3
eBook Packages: Springer Book Archive