Non-smooth Non-convex Bregman Minimization: Unification and New Algorithms

  • Peter Ochs
  • Jalal Fadili
  • Thomas Brox


We propose a unifying algorithm for non-smooth non-convex optimization. The algorithm approximates the objective function by a convex model function and finds an approximate (Bregman) proximal point of the convex model. This approximate minimizer of the model function yields a descent direction, along which the next iterate is found. Complementing this with an Armijo-like line search strategy, we obtain a flexible algorithm for which we prove (subsequential) convergence to a stationary point under weak assumptions on the growth of the model function error. With the Euclidean distance function, special instances of the algorithm include gradient descent, forward–backward splitting, and ProxDescent, without the common requirement of a “Lipschitz continuous gradient”. In addition, we consider a broad class of Bregman distance functions (generated by Legendre functions) to replace the Euclidean distance. The algorithm has a wide range of applications, including many linear and nonlinear inverse problems in signal/image processing and machine learning.
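The Euclidean instance described in the abstract can be illustrated with a minimal Python sketch, assuming the objective has the form f(x) = g(x) + lam * ||x||_1 with g smooth but possibly non-convex. The model (linearize g, keep the l1 term exactly), the fixed parameter tau, the Armijo parameters gamma and beta, the function names and the test function are all illustrative assumptions rather than the paper's notation. The general algorithm also admits other convex models, inexact proximal points and Bregman distances, and its precise line-search condition may differ from the one used here.

```python
import math


def soft_threshold(v, t):
    """Proximal map of t * ||.||_1: componentwise soft-thresholding."""
    return [math.copysign(max(abs(vi) - t, 0.0), vi) for vi in v]


def model_descent(g, grad_g, lam, x0, tau=1.0, gamma=1e-4, beta=0.5,
                  max_iter=500, tol=1e-10):
    """Hypothetical sketch: minimise f(x) = g(x) + lam * ||x||_1.

    Per iteration (Euclidean distance, forward-backward model):
      1. convex model  f_x(y) = g(x) + <grad g(x), y - x> + lam * ||y||_1,
      2. proximal point y = argmin_y f_x(y) + ||y - x||^2 / (2 * tau),
      3. Armijo-like backtracking along the descent direction d = y - x.
    """
    h = lambda z: lam * sum(abs(zi) for zi in z)
    f = lambda z: g(z) + h(z)
    x = list(x0)
    for _ in range(max_iter):
        gx = grad_g(x)
        # Step 2 has the closed form prox_{tau*h}(x - tau * grad g(x)).
        y = soft_threshold([xi - tau * gi for xi, gi in zip(x, gx)], tau * lam)
        d = [yi - xi for yi, xi in zip(y, x)]
        if sum(di * di for di in d) <= tol:
            break  # x is (numerically) a fixed point of the model step
        fx = f(x)
        # Predicted model decrease f_x(y) - f(x); it is <= -||d||^2 / (2 * tau) < 0.
        delta_model = g(x) + sum(gi * di for gi, di in zip(gx, d)) + h(y) - fx
        eta = 1.0
        # Shrink eta until the sufficient-decrease (Armijo-like) test holds.
        while eta > 1e-12 and (
            f([xi + eta * di for xi, di in zip(x, d)]) > fx + gamma * eta * delta_model
        ):
            eta *= beta
        x = [xi + eta * di for xi, di in zip(x, d)]
    return x


# Hypothetical test problem: g(x) = 0.25 * (||x||^2 - 1)^2 is smooth and non-convex,
# and its gradient (||x||^2 - 1) * x is NOT globally Lipschitz continuous.
def g(x):
    return 0.25 * (sum(xi * xi for xi in x) - 1.0) ** 2


def grad_g(x):
    s = sum(xi * xi for xi in x) - 1.0
    return [s * xi for xi in x]


x0 = [2.0, 0.5]
x_star = model_descent(g, grad_g, lam=0.1, x0=x0)
print(x_star)
```

With this particular model the iteration is forward–backward splitting followed by a backtracking step along d. Because the step size eta is chosen by the line search rather than from a Lipschitz constant, the sketch runs on the test function above even though its gradient grows cubically.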


Keywords

Bregman minimization · Legendre function · Model function · Growth function · Non-convex non-smooth · Abstract algorithm

Mathematics Subject Classification

49J52 65K05 65K10 90C26 



P. Ochs acknowledges funding by the German Research Foundation (DFG Grant OC 150/1-1).


Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Saarland University, Saarbrücken, Germany
  2. Normandie Université, ENSICAEN, CNRS, GREYC, Caen, France
  3. University of Freiburg, Freiburg, Germany
