Optimizing Memory Usage and Accesses on CUDA-Based Recurrent Pattern Matching Image Compression

  • Conference paper
Computational Science and Its Applications – ICCSA 2014 (ICCSA 2014)

Abstract

This paper reports the adaptation of the Multidimensional Multiscale Parser (MMP) algorithm to CUDA. Specifically, we focus on memory optimization issues: the layout of data structures in memory, the choice of GPU memory type (shared, constant, or global), and achieving coalesced accesses. MMP is a computationally demanding lossy compression algorithm for images. For example, MMP requires nearly 9000 seconds to encode the 512 × 512 Lenna image on a 2013 Intel Xeon. One of the main challenges in adapting MMP to manycore architectures is its dependency on a pattern codebook that is built during execution, which forces the input image to be processed sequentially. Nonetheless, CUDA-MMP achieves a 12× speedup over the sequential version when run on an NVIDIA GTX 680. Further optimization of memory operations raises the speedup to 17.1×.
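To make the coalescing concern in the abstract concrete, the following host-side C++ sketch counts how many memory segments one warp touches when each thread reads a 4-byte field at a fixed stride. It is an illustration only, not the authors' code. The names `kWarpSize`, `kSegmentBytes`, and `segments_touched`, the 128-byte segment size, and the 64-byte record size are all assumptions chosen for the example; actual transaction sizes depend on the GPU architecture and cache configuration.

```cpp
#include <cstddef>

// Assumed model parameters (hypothetical, for illustration only).
constexpr std::size_t kWarpSize = 32;       // threads that issue a load together
constexpr std::size_t kSegmentBytes = 128;  // assumed size of one memory segment

// Count the distinct segments touched when thread `tid` of a warp reads
// the word at byte offset tid * stride_bytes. Offsets never decrease with
// tid, so counting changes of segment index yields the distinct count.
constexpr std::size_t segments_touched(std::size_t stride_bytes) {
    std::size_t count = 0;
    std::size_t last_segment = 0;
    for (std::size_t tid = 0; tid < kWarpSize; ++tid) {
        const std::size_t segment = (tid * stride_bytes) / kSegmentBytes;
        if (tid == 0 || segment != last_segment) {
            ++count;
            last_segment = segment;
        }
    }
    return count;
}

// Structure-of-arrays: consecutive threads read consecutive 4-byte words.
static_assert(segments_touched(4) == 1, "contiguous reads share one segment");

// Array-of-structures with a hypothetical 64-byte record: each thread
// reads the same field of a different record, 64 bytes apart.
static_assert(segments_touched(64) == 16, "strided reads span 16 segments");
```

Under these assumptions, a contiguous layout serves the whole warp from a single segment, while the 64-byte stride spreads the same 32 reads over 16 segments. This is the kind of difference that data layout changes aim to remove.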

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Domingues, P., Silva, J., Ribeiro, T., Rodrigues, N.M.M., De Carvalho, M.B., De Faria, S.M.M. (2014). Optimizing Memory Usage and Accesses on CUDA-Based Recurrent Pattern Matching Image Compression. In: Murgante, B., et al. Computational Science and Its Applications – ICCSA 2014. ICCSA 2014. Lecture Notes in Computer Science, vol 8582. Springer, Cham. https://doi.org/10.1007/978-3-319-09147-1_41

  • DOI: https://doi.org/10.1007/978-3-319-09147-1_41

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-09146-4

  • Online ISBN: 978-3-319-09147-1

  • eBook Packages: Computer Science, Computer Science (R0)
