A GPU Implementation of OLPCA Method in Hybrid Environment

International Journal of Parallel Programming

Abstract

Sophisticated denoising algorithms are used to improve image quality in Magnetic Resonance Imaging (MRI). Better results generally come from computationally expensive schemes. In this paper, we consider the Overcomplete Local Principal Component Analysis (OLPCA) method for image denoising and its main issues. In particular, we investigate the impact of the Singular Value Decomposition (SVD) on the OLPCA algorithm and its high computational cost. Moreover, we propose a fine-to-coarse parallelization strategy to exploit a hybrid parallel architecture, and we implement multilevel parallel software that combines code using the NVIDIA cuBLAS library for Graphics Processing Units (GPUs) with the standard Message Passing Interface (MPI) library for cluster programming. Experimental results show reduced execution time, with a promising speed-up with respect to both the CPU version and our previous GPU versions.
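To make the role of the SVD concrete, the following is a minimal serial sketch of an overcomplete local PCA style filter, written with NumPy. It is not the authors' implementation and uses neither cuBLAS nor MPI. The function names, the data layout (height, width, number of volumes), the patch size, and the threshold factor `gamma` are all assumptions chosen for illustration. Each overlapping patch is arranged as a matrix, decomposed with an SVD, stripped of its weak components, and the overlapping estimates are averaged.

```python
import numpy as np


def denoise_patch_matrix(X, sigma, gamma=2.3):
    """Hard-threshold the principal components of one local patch matrix.

    X has shape (voxels_in_patch, volumes). sigma is an assumed noise level;
    gamma is an illustrative threshold factor, not a value from the paper.
    """
    mean = X.mean(axis=0, keepdims=True)
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    # Convert singular values to per-component standard deviations.
    comp_std = s / np.sqrt(max(X.shape[0] - 1, 1))
    # Zero out components whose spread is at or below the noise threshold.
    s_kept = np.where(comp_std > gamma * sigma, s, 0.0)
    return (U * s_kept) @ Vt + mean


def olpca_like_denoise(data, sigma, patch=3, gamma=2.3):
    """Average the estimates from all overlapping patches (the overcomplete step).

    data has shape (H, W, K) with H and W at least `patch`.
    """
    H, W, K = data.shape
    acc = np.zeros(data.shape, dtype=float)
    cnt = np.zeros((H, W, 1), dtype=float)
    for i in range(H - patch + 1):
        for j in range(W - patch + 1):
            block = data[i:i + patch, j:j + patch, :].reshape(-1, K)
            est = denoise_patch_matrix(block, sigma, gamma)
            acc[i:i + patch, j:j + patch, :] += est.reshape(patch, patch, K)
            cnt[i:i + patch, j:j + patch, :] += 1.0
    return acc / cnt


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noisy = 10.0 + rng.normal(scale=1.0, size=(16, 16, 8))
    print(olpca_like_denoise(noisy, sigma=1.0).shape)
```

The sketch performs one small SVD per patch position, which suggests why the decomposition tends to dominate the running time. Because the patches can be processed independently, the double loop is also a natural place to introduce parallelism.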

Notes

  1. http://www.nvidia.com/.

  2. http://www.culatools.com/.

  3. https://developer.nvidia.com/cublas.

  4. http://www.mcs.anl.gov/research/projects/mpi/.

  5. Available at http://brainweb.bic.mni.mcgill.ca/brainweb/.

References

  1. Abate, D., Ambrosino, F., Aprea, G., Bastianelli, T., Beone, F., Bertini, R., Bracco, G., Calosso, B., Caporicci, M., Chinnici, M., Colavincenzo, A., Cucurullo, A., D’Angelo, P., De Michele, P., De Rosa, M., Del Giudice, E., Funel, A., Furini, G., Giammattei, D., Giusepponi, S., Guadagni, R., Guarnieri, G., Italiano, A., Magagnino, S., Mariano, A., Mencuccini, G., Mercuri, C., Migliori, S., Ornelli, P., Palombi, F., Pecoraro, S., Perozziello, A., Pierattini, S., Podda, S., Poggi, F., Ponti, G., Quintiliani, A., Rocchi, A., Scio, C., Simoni, F., Vita, A.: The role of medium size facilities in the hpc ecosystem: the case of the new cresco4 cluster integrated in the eneagrid infrastructure. In: International Conference on High Performance Computing and Simulation, pp. 1030–1033, HPCS 2014, Bologna, Italy, 21–25 July (2014). doi:10.1109/HPCSim.2014.6903807

  2. Berry, M., Sameh, A.: An overview of parallel algorithms for the singular value and symmetric eigenvalue problems. J. Comput. Appl. Math. 27(1), 191–213 (1989). Special issue on parallel algorithms for numerical linear algebra. doi:10.1016/0377-0427(89)90366-X

  3. Buades, A., Coll, B., Morel, J.: A review of image denoising algorithms, with a new one. Multiscale Model. Simul. 4(2), 490–530 (2005). doi:10.1137/040616024

  4. Buades, A., Coll, B., Morel, J.: Image denoising methods. A new nonlocal principle. SIAM Rev. 52(1), 113–147 (2010). doi:10.1137/090773908

  5. Bydder, M., Du, J.: Noise reduction in multiple-echo data sets using singular value decomposition. Magn. Reson. Imaging 24(7), 849–856 (2006). doi:10.1016/j.mri.2006.03.006. http://www.sciencedirect.com/science/article/pii/S0730725X06001317

  6. Cafieri, S., D’Apuzzo, M., De Simone, V., Di Serafino, D., Toraldo, G.: Convergence analysis of an inexact potential reduction method for convex quadratic programming. J. Optim. Theory Appl. 135(3), 355–366 (2007). doi:10.1007/s10957-007-9264-3

  7. Campagna, R., Crisci, S., Cuomo, S., De Michele, P., Galletti, A., Marcellino, L., Murano, A.: A novel split Bregman algorithm for MRI denoising task in an e-health system. In: ACM International Conference Proceeding Series. Proceedings of the 9th PETRA Conference, Corfu, Greece, June 29–July 1 (2016). doi:10.1145/2910674.2910692. http://dl.acm.org/citation.cfm?doid=2910674.2910692

  8. Cuomo, S., De Michele, P., Galletti, A., Marcellino, L.: A gpu-parallel algorithm for ecg signal denoising based on the nlm method. In: 30th IEEE International Conference on Advanced Information Networking and Applications, AINA 2016, Crans-Montana, Switzerland, March 23–25, 2016, pp. 35–39 (2016). doi:10.1109/WAINA.2016.110. http://doi.ieeecomputersociety.org/10.1109/WAINA.2016.110

  9. Cuomo, S., De Michele, P., Galletti, A., Marcellino, L.: A gpu parallel implementation of the local principal component analysis overcomplete method for dw image denoising. In: 2016 IEEE Symposium on Computers and Communication (ISCC), pp. 26–31 (2016). The Twenty-First IEEE Symposium on Computers and Communication, 27–30 June 2016, Messina, Italy. doi:10.1109/ISCC.2016.7543709

  10. Cuomo, S., De Michele, P., Galletti, A., Marcellino, L.: Local principal component analysis overcomplete method: a gpu parallel implementation combining shared and global memories. In: International Conference on High Performance Computing and Simulation, HPCS 2016, Innsbruck, Austria, July 18–22, 2016, pp. 81–87 (2016). doi:10.1109/HPCSim.2016.7568319

  11. Cuomo, S., De Michele, P., Galletti, A., Marcellino, L.: A parallel pde-based numerical algorithm for computing the optical flow in hybrid systems. J. Comput. Sci. (2017). doi:10.1016/j.jocs.2017.03.011

  12. Cuomo, S., De Michele, P., Maiorano, F., Marcellino, L.: Advances on P2P, parallel, grid, cloud and internet computing. Lecture Notes on Data Engineering and Communications Technologies, vol. 1, chap. GPU Profiling of Singular Value Decomposition in OLPCA Method for Image Denoising, pp. 707–716. Springer International Publishing (2017). doi:10.1007/978-3-319-49109-7_68. Proceedings of the 11th International Conference on P2P, Parallel, Grid, Cloud and Internet Computing 3PGCIC-2016 November 5–7, 2016, Soonchunhyang University, Asan, Korea. Online ISBN: 978-3-319-49109-7

  13. Cuomo, S., Galletti, A., Giunta, G., Marcellino, L.: Toward a multi-level parallel framework on gpu cluster with petsc-cuda for pde-based optical flow computation. pp. 170–179 (2015). doi:10.1016/j.procs.2015.05.220. http://www.scopus.com/inward/record.url?eid=2-s2.0-84939155665&partnerID=40&md5=ddcb2162cbc29925e582fc9498463059

  14. Cuomo, S., Galletti, A., Marcellino, L.: A gpu algorithm in a distributed computing system for 3d MRI denoising. In: F. Xhafa, L. Barolli, F. Messina, M. R Ogilla (eds.) 10th International Conference on P2P, Parallel, Grid, Cloud and Internet Computing, Krakow, Poland, pp. 557–562, November 4–6 (2015). doi:10.1109/3PGCIC.2015.77

  15. Cuomo, S., De Michele, P., Piccialli, F.: 3d data denoising via nonlocal means filter by using parallel GPU strategies. Comput. Math. Methods Med. 2014, 523862 (2014). doi:10.1155/2014/523862

  16. D’Amore, L., Arcucci, R., Marcellino, L., Murli, A.: A parallel three-dimensional variational data assimilation scheme. AIP Conf. Proc. 1389(1), 1829–1831 (2011). doi:10.1063/1.3636965

  17. D’Amore, L., Laccetti, G., Romano, D., Scotti, G., Murli, A.: Towards a parallel component in a gpucuda environment: a case study with the l-bfgs harwell routine. Int. J. Comput. Math. 92(1), 59–76 (2015). doi:10.1080/00207160.2014.899589

  18. de Angelis, P.L., Bomze, I.M., Toraldo, G.: Ellipsoidal approach to box-constrained quadratic problems. J. Glob. Optim. 28(1), 1–15 (2004). doi:10.1023/B:JOGO.0000006654.34226.fe

  19. D’Amore, L., Marcellino, L., Mele, V., Romano, D.: Deconvolution of 3d fluorescence microscopy images using graphics processing units. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 7203 LNCS(PART 1), pp. 690–699 (2012). doi:10.1007/978-3-642-31464-3_70

  20. De Asmundis, R., di Serafino, D., Hager, W., Toraldo, G., Zhang, H.: An efficient gradient method using the yuan steplength. Comput. Optim. Appl. 59(3), 541–563 (2014). doi:10.1007/s10589-014-9669-5

  21. Gómez, S., Severino, G., Randazzo, L., Toraldo, G., Otero, J.: Identification of the hydraulic conductivity using a global optimization method. Agric. Water Manag. 96(3), 504–510 (2009). doi:10.1016/j.agwat.2008.09.025

  22. Laccetti, G., Lapegna, M., Mele, V., Romano, D.: A study on adaptive algorithms for numerical quadrature on heterogeneous gpu and multicore based systems. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 8384 LNCS(PART 1), pp. 704–713 (2014). doi:10.1007/978-3-642-55224-3_66

  23. Manjón, J., Coupé, P., Concha, L., Buades, A., Collins, D., Robles, M.: Diffusion weighted image denoising using overcomplete local pca. PLoS ONE 8(9) (2013). doi:10.1371/journal.pone.0073021. http://www.scopus.com/inward/record.url?eid=2-s2.0-84883366803&partnerID=40&md5=467a3af41b50d17486ab1385ccf8e816

  24. Manjón, J.V., Coupé, P., Martí-Bonmatí, L., Collins, D.L., Robles, M.: Adaptive non-local means denoising of MR images with spatially varying noise levels. J. Magn. Reson. Imaging 31(1), 192–203 (2010). doi:10.1002/jmri.22003. http://www.hal.inserm.fr/inserm-00454564

  25. Muresan, D.D., Parks, T.W.: Orthogonal, exactly periodic subspace decomposition. IEEE Trans. Signal Process. 51(9), 2270–2279 (2003). doi:10.1109/TSP.2003.815381

  26. Palma, G., Piccialli, F., De Michele, P., Cuomo, S., Comerci, M., Borrelli, P., Alfano, B.: 3d non-local means denoising via multi-gpu. In: Proceedings of the 2013 Federated Conference on Computer Science and Information Systems, Kraków, Poland, September 8–11, 2013, pp. 495–498 (2013). http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6644045

  27. Piccialli, F., Cuomo, S., De Michele, P.: A regularized mri image reconstruction based on hessian penalty term on cpu/gpu systems, pp. 2643–2646 (2013). doi:10.1016/j.procs.2013.06.001. http://www.scopus.com/inward/record.url?eid=2-s2.0-84892506892&partnerID=40&md5=cc785a43da0426b134b5a4e05bc3ad5e

  28. Poon, P., Wei-Ren, N., Sridharan, V.: Image denoising with singular value decompositon and principal component analysis. http://www.u.arizona.edu/~ppoon/ImageDenoisingWithSVD.pdf (2009)

  29. Song, F., Dongarra, J.: A scalable approach to solving dense linear algebra problems on hybrid cpu–gpu systems. Concurr. Comput. 27(14), 3702–3723 (2015). doi:10.1002/cpe.3403

  30. Tristán-Vega, A., Aja-Fernández, S.: DWI filtering using joint information for DTI and HARDI. Med. Image Anal. 14(2), 205–218 (2010). doi:10.1016/j.media.2009.11.001

Author information

Corresponding author

Correspondence to Livia Marcellino.

About this article

Cite this article

De Michele, P., Maiorano, F., Marcellino, L. et al. A GPU Implementation of OLPCA Method in Hybrid Environment. Int J Parallel Prog 46, 528–542 (2018). https://doi.org/10.1007/s10766-017-0505-2

  • DOI: https://doi.org/10.1007/s10766-017-0505-2