Non-smooth Non-convex Bregman Minimization: Unification and New Algorithms

  • Peter Ochs
  • Jalal Fadili
  • Thomas Brox


We propose a unifying algorithm for non-smooth non-convex optimization. The algorithm approximates the objective function by a convex model function and finds an approximate (Bregman) proximal point of the convex model. This approximate minimizer of the model function yields a descent direction, along which the next iterate is found. Complemented with an Armijo-like line search strategy, we obtain a flexible algorithm for which we prove (subsequential) convergence to a stationary point under weak assumptions on the growth of the model function error. Special instances of the algorithm with a Euclidean distance function are, for example, gradient descent, forward–backward splitting, ProxDescent, without the common requirement of a “Lipschitz continuous gradient”. In addition, we consider a broad class of Bregman distance functions (generated by Legendre functions), replacing the Euclidean distance. The algorithm has a wide range of applications including many linear and nonlinear inverse problems in signal/image processing and machine learning.
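As a rough illustration of the scheme described in the abstract, the following sketch covers only the Euclidean special case that corresponds to forward–backward splitting. It assumes an objective of the form g(x) + λ‖x‖₁ with smooth g, takes the convex model to be the linearization of g plus the ℓ₁ term, and uses the model decrease as the quantity in the Armijo-like test. The function names (`soft_threshold`, `model_descent`), the parameters (`tau`, `gamma`, `delta`), and the stopping rule are illustrative assumptions, not the authors' notation or implementation.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal map of t * ||.||_1 (component-wise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def model_descent(g, grad_g, lam, x0, tau=1.0, gamma=0.1, delta=0.5,
                  max_iter=200, tol=1e-10):
    """Sketch: minimize g(x) + lam * ||x||_1 via model step + line search.

    Assumed convex model at x:  g(x) + <grad_g(x), y - x> + lam * ||y||_1.
    Distance: Euclidean, i.e. ||y - x||^2 / (2 * tau).
    """
    def f(x):
        return g(x) + lam * np.sum(np.abs(x))

    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        gx = grad_g(x)
        # Proximal point of the model: closed form via soft-thresholding.
        y = soft_threshold(x - tau * gx, tau * lam)
        d = y - x  # descent direction
        # Model decrease: model(y) - f(x); non-positive by construction.
        dec = gx @ d + lam * (np.sum(np.abs(y)) - np.sum(np.abs(x)))
        if dec > -tol:
            break  # x is (approximately) stationary for the model
        # Armijo-like backtracking along d, scaled by the model decrease.
        eta, fx = 1.0, f(x)
        while f(x + eta * d) > fx + gamma * eta * dec and eta > 1e-12:
            eta *= delta
        x = x + eta * d
    return x
```

In this sketch no Lipschitz constant of the gradient of g is needed to choose a step size: the backtracking loop shrinks the step until sufficient decrease holds. Replacing the squared Euclidean distance by a Bregman distance would change only the proximal step, which then generally has no soft-thresholding closed form.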


Keywords

Bregman minimization · Legendre function · Model function · Growth function · Non-convex non-smooth · Abstract algorithm

Mathematics Subject Classification

49J52 65K05 65K10 90C26 



P. Ochs acknowledges funding by the German Research Foundation (DFG Grant OC 150/1-1).



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Saarland University, Saarbrücken, Germany
  2. Normandie Université, ENSICAEN, CNRS, GREYC, Caen, France
  3. University of Freiburg, Freiburg, Germany
