2-D Wavelet Transform Enhancement on General-Purpose Microprocessors: Memory Hierarchy and SIMD Parallelism Exploitation
This paper addresses the implementation of a 2-D Discrete Wavelet Transform on general-purpose microprocessors, focusing on both memory hierarchy and SIMD parallelization issues. The two topics are related, since SIMD extensions are only useful if the memory hierarchy is efficiently exploited. In this work, locality has been significantly improved by means of a novel approach called pipelined computation, which complements previous techniques based on loop tiling and non-linear layouts. As experimental platforms we have employed Pentium-III (P-III) and Pentium-4 (P-4) microprocessors. Our SIMD-oriented tuning, however, has been performed exclusively at the source code level: we have reordered some loops and introduced some modifications that allow automatic vectorization. Given the abstraction level at which the optimizations are carried out, the speedups obtained on the investigated platforms are quite satisfactory, although further improvement can be obtained by lowering the level of abstraction (compiler intrinsics or assembly code).
Keywords: Discrete Wavelet Transform; Single Instruction Multiple Data; Memory Hierarchy; Memory Access Pattern
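The kind of source-level loop reordering mentioned in the abstract can be illustrated with a minimal sketch of the vertical filtering pass of a 2-D DWT. This is not the authors' code: the function names, the row-major image layout, the decimation by two, and the periodic boundary extension are all assumptions made for illustration. The first version walks each column in its innermost loops, so consecutive accesses are `width` floats apart; the second moves the column index to the innermost loop, which gives unit-stride accesses that a vectorizing compiler can typically map to SIMD instructions (whether it does depends on the compiler and its flags).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical column-order version: for a row-major image, the reads
 * in the k loop are `width` floats apart, which hurts both cache
 * locality and automatic vectorization. */
void vfilter_column_order(const float *in, float *out,
                          size_t height, size_t width,
                          const float *h, size_t taps)
{
    assert(height % 2 == 0);           /* assumption: even number of rows */
    for (size_t j = 0; j < width; j++) {
        for (size_t i = 0; i < height / 2; i++) {
            float acc = 0.0f;
            for (size_t k = 0; k < taps; k++) {
                size_t r = (2 * i + k) % height;   /* periodic extension */
                acc += h[k] * in[r * width + j];
            }
            out[i * width + j] = acc;
        }
    }
}

/* Hypothetical reordered version: same arithmetic, but the column index j
 * is innermost, so both reads and writes are unit-stride. `restrict`
 * tells the compiler that the buffers do not overlap. */
void vfilter_row_order(const float *restrict in, float *restrict out,
                       size_t height, size_t width,
                       const float *restrict h, size_t taps)
{
    assert(height % 2 == 0);           /* assumption: even number of rows */
    for (size_t i = 0; i < height / 2; i++) {
        float *dst = out + i * width;
        for (size_t j = 0; j < width; j++)
            dst[j] = 0.0f;
        for (size_t k = 0; k < taps; k++) {
            const float *src = in + ((2 * i + k) % height) * width;
            const float hk = h[k];
            for (size_t j = 0; j < width; j++)     /* unit-stride inner loop */
                dst[j] += hk * src[j];
        }
    }
}
```

Both functions write `height / 2` output rows of `width` samples and accumulate the filter taps in the same order, so they should produce the same values; only the memory access pattern differs.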