2-D Wavelet Transform Enhancement on General-Purpose Microprocessors: Memory Hierarchy and SIMD Parallelism Exploitation

  • Daniel Chaver
  • Christian Tenllado
  • Luis Piñuel
  • Manuel Prieto
  • Francisco Tirado
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2552)


This paper addresses the implementation of a 2-D Discrete Wavelet Transform on general-purpose microprocessors, focusing on both memory hierarchy and SIMD parallelization issues. Both topics are somewhat related, since SIMD extensions are only useful if the memory hierarchy is efficiently exploited. In this work, locality has been significantly improved by means of a novel approach called pipelined computation, which complements previous techniques based on loop tiling and non-linear layouts. As experimental platforms we have employed a Pentium-III (P-III) and a Pentium-4 (P-4) microprocessor. However, our SIMD-oriented tuning has been exclusively performed at source code level. Basically, we have reordered some loops and introduced some modifications that allow automatic vectorization. Taking into account the abstraction level at which the optimizations are carried out, the speedups obtained on the investigated platforms are quite satisfactory, even though further improvement can be obtained by dropping the level of abstraction (compiler intrinsics or assembly code).
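
The loop reordering mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the row-major image layout, and the 2-tap Haar-style averaging filter are all assumptions chosen for brevity. The sketch shows one vertical low-pass step written two ways. The first walks down each column with a stride of `cols` elements; the second interchanges the loops so that the inner loop has unit stride, which is the kind of access pattern that compilers can usually vectorize automatically.

```c
#include <stddef.h>

/* Hypothetical vertical low-pass step on a row-major image. A 2-tap
   Haar-style average stands in for the real filter (an assumption; the
   abstract does not specify the filter). Output has rows/2 rows. */

/* Column-by-column traversal: the inner loop strides by `cols` floats,
   so consecutive iterations touch different cache lines. */
void vertical_lowpass_by_column(const float *in, float *out,
                                size_t rows, size_t cols)
{
    for (size_t j = 0; j < cols; ++j)
        for (size_t i = 0; i < rows / 2; ++i)
            out[i * cols + j] = 0.5f * (in[(2 * i) * cols + j] +
                                        in[(2 * i + 1) * cols + j]);
}

/* Interchanged loops: the inner loop runs along a row with unit stride.
   `restrict` (C99) tells the compiler that `in` and `out` do not alias,
   which removes one common obstacle to automatic vectorization. */
void vertical_lowpass_by_row(const float *restrict in, float *restrict out,
                             size_t rows, size_t cols)
{
    for (size_t i = 0; i < rows / 2; ++i) {
        const float *top = in + (2 * i) * cols;      /* even input row */
        const float *bot = in + (2 * i + 1) * cols;  /* odd input row  */
        float *dst = out + i * cols;                 /* output row     */
        for (size_t j = 0; j < cols; ++j)
            dst[j] = 0.5f * (top[j] + bot[j]);
    }
}
```

Both functions compute the same result; only the traversal order differs. Whether the second version is actually vectorized depends on the compiler and its flags, so the compiler's vectorization report is worth checking.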


Discrete Wavelet Transform, Single Instruction Multiple Data, Memory Hierarchy, Memory Access Pattern
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Daniel Chaver (1)
  • Christian Tenllado (1)
  • Luis Piñuel (1)
  • Manuel Prieto (1)
  • Francisco Tirado (1)
  1. Departamento de Arquitectura de Computadores y Automatica, Facultad de Ciencias Fisicas, Universidad Complutense, Madrid, Spain
