
Faster subgradient methods for functions with Hölderian growth

  • Patrick R. Johnstone
  • Pierre Moulin

Full Length Paper, Series A

Abstract

The purpose of this manuscript is to derive new convergence results for several subgradient methods applied to minimizing nonsmooth convex functions with Hölderian growth. The growth condition is satisfied in many applications and includes functions with quadratic growth and weakly sharp minima as special cases. To this end, we make three main contributions. First, for a constant and sufficiently small stepsize, we show that the subgradient method achieves linear convergence up to a certain region including the optimal set, with error of the order of the stepsize. Second, if appropriate problem parameters are known, we derive a decaying stepsize that obtains a much faster convergence rate than is suggested by the classical \(O(1/\sqrt{k})\) result for the subgradient method. Third, we develop a novel “descending stairs” stepsize which attains this faster rate and also achieves linear convergence for the special case of weakly sharp functions. We also develop an adaptive variant of the “descending stairs” stepsize which achieves the same convergence rate without requiring an error-bound constant that is difficult to estimate in practice.
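For orientation, the sketch below illustrates the basic subgradient iteration \(x_{k+1} = x_k - \alpha_k g_k\), \(g_k \in \partial f(x_k)\), with a constant stepsize and with a generic stagewise (stairs-like) decaying stepsize. It is a minimal illustration only: the test function, the stage length, the initial stepsize, and the halving factor are assumptions for the example and are not the tuned schedules derived in the paper.

```python
import numpy as np

def subgradient_method(subgrad, x0, step_schedule, num_iters):
    """Run x_{k+1} = x_k - alpha_k * g_k with g_k an arbitrary subgradient.

    subgrad(x) returns any element of the subdifferential of f at x;
    step_schedule(k) returns the stepsize alpha_k at iteration k.
    """
    x = np.asarray(x0, dtype=float)
    for k in range(num_iters):
        g = subgrad(x)
        x = x - step_schedule(k) * g
    return x

# Illustrative problem: f(x) = ||x - c||_1, a nonsmooth function with a sharp minimum at c.
c = np.array([1.0, -2.0, 0.5])
subgrad = lambda x: np.sign(x - c)        # componentwise sign is a valid subgradient

# Constant stepsize: converges to within a region of size on the order of the stepsize.
constant_step = lambda k: 1e-3
x_const = subgradient_method(subgrad, np.zeros(3), constant_step, 5000)

# Stagewise ("stairs"-like) stepsize: constant within each stage, halved between stages.
# The stage length and halving factor below are placeholders, not the paper's schedule.
stage_len, alpha0 = 500, 1e-1
stairs_step = lambda k: alpha0 * 0.5 ** (k // stage_len)
x_stairs = subgradient_method(subgrad, np.zeros(3), stairs_step, 5000)
```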

Mathematics Subject Classification

65K05 · 65K10 · 90C25 · 90C30


Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature and Mathematical Optimization Society 2019

Authors and Affiliations

  1. Department of Management Sciences and Information Systems, Rutgers Business School-Newark and New Brunswick, Rutgers University, New Brunswick, USA
  2. Coordinated Science Laboratory, University of Illinois, Urbana, USA