Abstract
This paper reports the adaptation of the Multidimensional Multiscale Parser (MMP) algorithm to CUDA. Specifically, we focus on memory optimization issues, such as the layout of data structures in memory, the choice of GPU memory type – shared, constant and global – and the achievement of coalesced accesses. MMP is a computationally demanding lossy image compression algorithm: for example, encoding the 512×512 Lenna image requires nearly 9000 seconds on a 2013 Intel Xeon. One of the main challenges in adapting MMP to manycore architectures is its dependency on a pattern codebook that is built during execution, which forces the input image to be processed sequentially. Nonetheless, CUDA-MMP achieves a 12× speedup over the sequential version when run on an NVIDIA GTX 680. By further optimizing memory operations, the speedup rises to 17.1×.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Domingues, P., Silva, J., Ribeiro, T., Rodrigues, N.M.M., De Carvalho, M.B., De Faria, S.M.M. (2014). Optimizing Memory Usage and Accesses on CUDA-Based Recurrent Pattern Matching Image Compression. In: Murgante, B., et al. Computational Science and Its Applications – ICCSA 2014. ICCSA 2014. Lecture Notes in Computer Science, vol 8582. Springer, Cham. https://doi.org/10.1007/978-3-319-09147-1_41
DOI: https://doi.org/10.1007/978-3-319-09147-1_41
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09146-4
Online ISBN: 978-3-319-09147-1
eBook Packages: Computer Science (R0)