
Inexact successive quadratic approximation for regularized optimization

  • Ching-pei Lee
  • Stephen J. Wright
Article

Abstract

Successive quadratic approximations, or second-order proximal methods, are useful for minimizing functions that are a sum of a smooth part and a convex, possibly nonsmooth part that promotes regularization. Most analyses of iteration complexity focus on the special case of the proximal gradient method, or accelerated variants thereof. There have been only a few studies of methods that use a second-order approximation to the smooth part, due in part to the difficulty of obtaining closed-form solutions to the subproblems at each iteration. In fact, iterative algorithms may need to be used to find inexact solutions to these subproblems. In this work, we present a global analysis of the iteration complexity of inexact successive quadratic approximation methods, showing that an inexact solution of the subproblem that is within a fixed multiplicative precision of optimality suffices to guarantee the same order of convergence rate as the exact version, with complexity related in an intuitive way to the measure of inexactness. Our result allows flexible choices of the second-order term, including Newton and quasi-Newton choices, and does not necessarily require increasing precision of the subproblem solution on later iterations. For problems exhibiting a property related to strong convexity, the algorithms converge at global linear rates. For general convex problems, the convergence rate is linear in early stages, while the overall rate is \(O(1/k)\). For nonconvex problems, a first-order optimality criterion converges to zero at a rate of \(O(1/\sqrt{k})\).
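A minimal, self-contained sketch of this type of scheme (not the authors' implementation) is given below for the concrete case of L1-regularized least squares: the second-order term is taken to be the exact Hessian of the smooth part, the quadratic subproblem is solved only inexactly by a fixed number of proximal-gradient (ISTA) steps on the model, and a simple backtracking line search enforces sufficient decrease. All function names, parameter values, and the stopping rule are illustrative assumptions, not the precise acceptance criteria analyzed in the paper.

    import numpy as np

    def soft_threshold(z, tau):
        # Proximal operator of tau * ||.||_1
        return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

    def inexact_sqa(A, b, lam, max_outer=100, inner_steps=10, beta=0.5, sigma=1e-4):
        # Minimize F(x) = 0.5*||Ax - b||^2 + lam*||x||_1 by successive quadratic
        # approximation with inexactly solved subproblems.
        n = A.shape[1]
        x = np.zeros(n)
        H = A.T @ A                      # Newton choice of the second-order term
        L_H = np.linalg.norm(H, 2)       # inner solver uses step size 1/L_H
        f = lambda v: 0.5 * np.linalg.norm(A @ v - b) ** 2
        F = lambda v: f(v) + lam * np.linalg.norm(v, 1)
        for _ in range(max_outer):
            g = A.T @ (A @ x - b)        # gradient of the smooth part at x
            # Inexact subproblem solve: a few proximal-gradient steps on the model
            # q(d) = g'd + 0.5 d'Hd + lam*||x + d||_1, instead of an exact minimizer.
            d = np.zeros(n)
            for _ in range(inner_steps):
                grad_q = g + H @ d
                d = soft_threshold(x + d - grad_q / L_H, lam / L_H) - x
            model_dec = (g @ d + 0.5 * d @ (H @ d)
                         + lam * (np.linalg.norm(x + d, 1) - np.linalg.norm(x, 1)))
            if model_dec > -1e-12:       # model predicts no further decrease: stop
                break
            # Backtracking line search enforcing sufficient decrease of F.
            alpha = 1.0
            while F(x + alpha * d) > F(x) + sigma * alpha * model_dec and alpha > 1e-10:
                alpha *= beta
            x = x + alpha * d
        return x

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        A = rng.standard_normal((100, 20))
        x_true = np.zeros(20)
        x_true[:3] = [1.0, -2.0, 0.5]
        b = A @ x_true + 0.01 * rng.standard_normal(100)
        x_hat = inexact_sqa(A, b, lam=0.1)
        print("recovered support:", np.nonzero(np.abs(x_hat) > 1e-3)[0])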

Keywords

Convex optimization · Nonconvex optimization · Regularized optimization · Variable metric · Proximal method · Second-order approximation · Inexact method

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Computer Sciences Department and Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, USA
