Implementation of Non Local Means Filter in GPUs

  • Adrián Márques
  • Alvaro Pardo
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8258)


In this paper, we review some alternatives to reduce the computational complexity of the Non-Local Means image filter and present a CUDA-based implementation of it for GPUs, comparing its performance on different GPUs and with respect to reference CPU implementations. Starting from a naive CUDA implementation, we describe different aspects of CUDA and the algorithm itself that can be leveraged to decrease the execution time. Our GPU implementation achieved speedups of up to 35.8x with respect to our reduced-complexity reference implementation on the CPU, and more than 700x over a plain CPU implementation.
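To make the baseline concrete, the following is a minimal sketch of the standard Non-Local Means formulation as a plain, unoptimized CPU routine. It is not the authors' reference code. The function name `nlm_denoise`, the parameters `patch_radius`, `search_radius`, and `h`, and the replicate-border handling are all assumptions chosen for illustration. Each output pixel is a weighted average of the pixels in its search window, with weights that decay with the squared distance between the surrounding patches.

```python
import math


def nlm_denoise(img, patch_radius=1, search_radius=2, h=0.1):
    """Naive Non-Local Means on a grayscale image given as a 2D list of floats.

    Illustrative sketch only: parameter names and border handling are
    assumptions, not taken from the paper.
    """
    rows, cols = len(img), len(img[0])

    def px(r, c):
        # Clamp coordinates to the image border (replicate padding).
        r = min(max(r, 0), rows - 1)
        c = min(max(c, 0), cols - 1)
        return img[r][c]

    def patch_dist2(r0, c0, r1, c1):
        # Mean squared difference between the patches centred on two pixels.
        total = 0.0
        for dr in range(-patch_radius, patch_radius + 1):
            for dc in range(-patch_radius, patch_radius + 1):
                d = px(r0 + dr, c0 + dc) - px(r1 + dr, c1 + dc)
                total += d * d
        return total / (2 * patch_radius + 1) ** 2

    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            weight_sum = 0.0
            acc = 0.0
            # Visit every pixel of the search window that lies inside the image.
            for sr in range(r - search_radius, r + search_radius + 1):
                for sc in range(c - search_radius, c + search_radius + 1):
                    if not (0 <= sr < rows and 0 <= sc < cols):
                        continue
                    w = math.exp(-patch_dist2(r, c, sr, sc) / (h * h))
                    weight_sum += w
                    acc += w * img[sr][sc]
            # The centre pixel always contributes weight 1, so weight_sum > 0.
            out[r][c] = acc / weight_sum
    return out
```

The cost per pixel is proportional to the search-window area times the patch area, which is why reduced-complexity variants matter. Each output pixel depends only on the input image, so a naive GPU port can assign one thread per pixel.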


Keywords: Image denoising, Non-Local Means, GPU, CUDA



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Adrián Márques (1)
  • Alvaro Pardo (1)
  1. Universidad Catolica del Uruguay, Montevideo, Uruguay
