A GPU Implementation of OLPCA Method in Hybrid Environment

  • Pasquale De Michele
  • Francesco Maiorano
  • Livia Marcellino
  • Francesco Piccialli
Article
Part of the topical collection: Special Issue on Programming Models and Algorithms for Data Analysis in HPC Systems

Abstract

Sophisticated denoising algorithms are used to improve image quality in Magnetic Resonance Imaging. Better results, however, typically require computationally expensive schemes. In this paper, we consider the Overcomplete Local Principal Component Analysis (OLPCA) method for image denoising and its main issues. In more detail, we investigate the impact of the Singular Value Decomposition (SVD) on the OLPCA algorithm and its high computational cost. Moreover, we propose a fine-to-coarse parallelization strategy to exploit a parallel hybrid architecture, and we implement multilevel parallel software that combines code using the NVIDIA cuBLAS library for Graphics Processing Units (GPUs) with the standard Message Passing Interface (MPI) library for cluster programming. Experimental results show reduced execution times, with a promising speedup with respect to both the CPU version and our previous GPU versions.
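
To make the role of the SVD concrete, here is a minimal NumPy sketch of the overcomplete local PCA idea: each small overlapping window of a stack of N images is turned into a matrix (pixels × images), centered, decomposed with an SVD, hard-thresholded, and reconstructed, and every pixel is then the average of all window estimates that cover it. It is not the authors' code and makes several simplifying assumptions: 2D windows instead of 3D patches, uniform averaging instead of the method's weighting, a known noise level `sigma`, and the thresholding rule `s_k / sqrt(M) >= tau * sigma` with `tau = 2.3`. The names `pca_threshold`, `accumulate`, `olpca_denoise` and the `nparts` chunking are hypothetical; the chunks are processed serially here and only illustrate how window rows could be split among processes and the partial sums added, which is not necessarily the paper's fine-to-coarse scheme.

```python
import numpy as np


def pca_threshold(X, sigma, tau=2.3):
    """Denoise one patch matrix X of shape (M, N): M pixels, N images.

    Centers the columns, takes the SVD and zeroes every component whose
    per-pixel standard deviation s_k / sqrt(M) is below tau * sigma
    (a simplified thresholding rule, assumed for illustration).
    """
    mean = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    s_kept = np.where(s / np.sqrt(X.shape[0]) >= tau * sigma, s, 0.0)
    return (U * s_kept) @ Vt + mean


def accumulate(data, sigma, rows, patch, tau):
    """Sum the estimates of all windows whose top row index is in `rows`.

    Returns the partial sum and the per-pixel window count, so partial
    results computed by different workers can simply be added together.
    """
    H, W, N = data.shape
    acc = np.zeros(data.shape)
    cnt = np.zeros((H, W, 1))
    for i in rows:
        for j in range(W - patch + 1):
            block = data[i:i + patch, j:j + patch, :].reshape(-1, N)
            est = pca_threshold(block, sigma, tau)  # one SVD per window
            acc[i:i + patch, j:j + patch, :] += est.reshape(patch, patch, N)
            cnt[i:i + patch, j:j + patch] += 1
    return acc, cnt


def olpca_denoise(data, sigma, patch=3, tau=2.3, nparts=1):
    """Overcomplete local PCA on a stack of N images of shape (H, W, N).

    Windows slide with stride 1, so every pixel is covered when H, W >= patch.
    The window rows are split into `nparts` contiguous chunks, processed one
    after another here; a distributed version would hand each chunk to a
    separate process and sum the partial results (hypothetical scheme).
    """
    origins = np.arange(data.shape[0] - patch + 1)
    acc = np.zeros(data.shape)
    cnt = np.zeros(data.shape[:2] + (1,))
    for chunk in np.array_split(origins, nparts):
        part_acc, part_cnt = accumulate(data, sigma, chunk, patch, tau)
        acc += part_acc
        cnt += part_cnt
    return acc / cnt
```

The `np.linalg.svd` call runs once per overlapping window, so the number of decompositions grows with the number of windows rather than with the number of images; this is the kind of cost the abstract attributes to the SVD. Because the partial sums are simply added, a process handling one chunk of window rows would only need its own rows plus a halo of `patch - 1` rows below them.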

Keywords

Overcomplete local principal component analysis · High performance computing · Graphic processor units · Hybrid architectures

Copyright information

© Springer Science+Business Media New York 2017

Authors and Affiliations

  1. University of Naples Federico II, Naples, Italy
  2. University of Naples Parthenope, Naples, Italy