GPU acceleration of NL-means, BM3D and VBM3D


Denoising is an essential part of any image- or video-processing pipeline. Unfortunately, because of processing-time constraints, many pipelines do not use modern denoisers: these algorithms have only CPU implementations or suboptimal GPU implementations. We propose a new, efficient GPU implementation of NL-means and BM3D and, to our knowledge, the first GPU implementation of the video-denoising algorithm VBM3D. The performance of these implementations enables their use in real-time scenarios.




The authors gratefully thank Jean-Michel Morel for his valuable feedback. This work was partly financed by IDEX Paris-Saclay IDI 2016, ANR-11-IDEX-0003-02, Office of Naval Research grant N00014-17-1-2552, DGA Astrid project «filmer la Terre» no. ANR-17-ASTR-0013-01, MENRT and Fondation Mathématique Jacques Hadamard.

Author information



Corresponding author

Correspondence to Axel Davy.


Cite this article

Davy, A., Ehret, T. GPU acceleration of NL-means, BM3D and VBM3D. J Real-Time Image Proc 18, 57–74 (2021).

Keywords


  • Image denoising
  • Video denoising
  • OpenCL
  • GPU
  • NL-means
  • BM3D
  • VBM3D