Abstract
We consider the problem of optimizing a convex function of a vector parameter. Many quasi-Newton optimization methods must construct and store an approximation of the Hessian matrix or its inverse in order to take the function curvature into account, thus imposing high computational and memory requirements. We propose four quasi-Newton methods based on consecutive projective approximation. The idea of these methods is to approximate the product of the inverse Hessian and the function gradient in a low-dimensional space using an appropriate projection, and then to reconstruct it in the original space as the search direction for the next estimate. By exploiting Hessian rank deficiency in a special way, this approach requires storing neither the Hessian matrix nor its inverse, thereby reducing memory requirements. We give a theoretical motivation for the proposed algorithms and prove several properties of the corresponding estimates. Finally, we compare the proposed methods with several existing ones on simulated data. Although the proposed algorithms turned out to be inferior to the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) method, they have the important advantage of being easy to extend and improve. Moreover, two of them do not require knowledge of the function gradient.
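To make the projective idea concrete, here is a minimal Python sketch of a generic projected Newton step on a synthetic quadratic. It illustrates the general principle only, not any of the four proposed methods; in particular, it forms the Hessian explicitly for clarity, which the proposed methods are specifically designed to avoid. The test function, the random choice of the projection \(\mathbf{P}\) with orthonormal rows, and all variable names are assumptions made for this example.

import numpy as np

# Illustrative projected-Newton sketch (not the paper's algorithms):
# approximate H^{-1} g in a q-dimensional subspace via a projection P
# with orthonormal rows, then lift the result back to R^n.

rng = np.random.default_rng(0)
n, q = 100, 10

# Synthetic convex quadratic f(x) = 0.5 x^T H x + b^T x.
A = rng.standard_normal((n, n))
H = A @ A.T / n + np.eye(n)        # well-conditioned SPD Hessian
b = rng.standard_normal(n)

x = rng.standard_normal(n)
print("initial gradient norm:", np.linalg.norm(H @ x + b))
for _ in range(1000):
    g = H @ x + b                  # gradient of the quadratic
    # Fresh random projection with orthonormal rows (P P^T = I_q).
    P = np.linalg.qr(rng.standard_normal((n, q)))[0].T
    Q = P @ H @ P.T                # q x q curvature model in the subspace
    z = np.linalg.solve(Q, P @ g)  # low-dimensional Newton step
    x = x - P.T @ z                # lift back to R^n and update
print("final gradient norm:", np.linalg.norm(H @ x + b))

Resampling \(\mathbf{P}\) at every iteration lets the successive low-dimensional steps explore the full space; the choice of projection is precisely where the proposed methods differ from this naive sketch.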
References
Box, G.E., Draper, N.R., et al.: Empirical Model-Building and Response Surfaces. Wiley, New York (1987)
Boyd, S.: Global optimization in control system analysis and design. In: Control and Dynamic Systems V53: High Performance Systems Techniques and Applications: Advances in Theory and Applications, vol. 53, p. 1 (2012)
Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)
Broyden, C.G.: The convergence of a class of double-rank minimization algorithms 1. General considerations. IMA J. Appl. Math. 6(1), 76–90 (1970)
Conn, A.R., Gould, N.I., Toint, P.L.: Convergence of quasi-Newton matrices generated by the symmetric rank one update. Math. Program. 50(1–3), 177–195 (1991)
Davidon, W.C.: Variable metric method for minimization. SIAM J. Optim. 1(1), 1–17 (1991)
Donoho, D.L.: Compressed sensing. IEEE Trans. Inf. Theor. 52(4), 1289–1306 (2006)
Fletcher, R.: Practical Methods of Optimization, 2nd edn. Wiley, New York (1987)
Ford, J., Moghrabi, I.: Multi-step quasi-Newton methods for optimization. J. Comput. Appl. Math. 50(1–3), 305–323 (1994)
Forrester, A., Keane, A.: Recent advances in surrogate-based optimization. Prog. Aerosp. Sci. 45(1), 50–79 (2009)
Granichin, O., Volkovich, Z.V., Toledano-Kitai, D.: Randomized Algorithms in Automatic Control and Data Mining, vol. 67. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-642-54786-7
Granichin, O.N.: Stochastic approximation search algorithms with randomization at the input. Autom. Remote Control 76(5), 762–775 (2015)
Hoffmann, W.: Iterative algorithms for Gram-Schmidt orthogonalization. Computing 41(4), 335–348 (1989)
Krause, A.: SFO: a toolbox for submodular function optimization. J. Mach. Learn. Res. 11, 1141–1144 (2010)
Nesterov, Y.: Introductory Lectures on Convex Programming Volume I: Basic Course. Citeseer (1998)
Nocedal, J.: Updating quasi-Newton matrices with limited storage. Math. Comput. 35(151), 773–782 (1980)
Polyak, B.T.: Introduction to Optimization. Translations Series in Mathematics and Engineering. Optimization Software Inc., Publications Division, New York (1987)
Senov, A.: Accelerating gradient descent with projective response surface methodology. In: Battiti, R., Kvasov, D.E., Sergeyev, Y.D. (eds.) LION 2017. LNCS, vol. 10556, pp. 376–382. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-69404-7_34
Acknowledgments
This work was supported by the Russian Science Foundation (project 16-19-00057).
A Proofs
Proof
(of Proposition 1)
Since \((\mathbf {I}-\mathbf {P}^\top \mathbf {P})\) is not invertible, the above equation has an infinite number of solutions. Hence, we are free to choose any one of them, e.g. \(\mathbf {x}=\frac{1}{K}\sum _{t=1}^{K} \mathbf {x}_t\). \(\square \)
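A note on the singularity claim: if, as is standard for such projections (the surviving text does not restate it), \(\mathbf {P}\in \mathbb {R}^{q\times n}\) with \(q<n\) has orthonormal rows, so that \(\mathbf {P}\mathbf {P}^\top =\mathbf {I}\), then for every \(\mathbf {v}\in \mathbb {R}^q\)

\( (\mathbf {I}-\mathbf {P}^\top \mathbf {P})\mathbf {P}^\top \mathbf {v} = \mathbf {P}^\top \mathbf {v}-\mathbf {P}^\top (\mathbf {P}\mathbf {P}^\top )\mathbf {v} = \mathbf {0}, \)

i.e. \(\mathbf {I}-\mathbf {P}^\top \mathbf {P}\) annihilates the \(q\)-dimensional range of \(\mathbf {P}^\top \), so its rank is at most \(n-q<n\) and it is indeed not invertible.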
Proof
(of Proposition 2). From Proposition 1 and the fact that \(\mathop {\arg \!\min }\limits _{\mathbf {z}\in \mathbb {R}^q}\, f(\mathbf {P}^\top \mathbf {z} + \mathbf {v}) = -\mathbf {P} \mathbf {H}^{-1} \mathbf {b}\), it follows that \(\widehat{\mathbf {x}} = \left( \mathbf {I} - \mathbf {P}^\top \mathbf {P}\right) \overline{\mathbf {x}} - \mathbf {P}^\top \mathbf {P} \mathbf {H}^{-1}\mathbf {b}\). Hence
\(\square \)
Proof
(of Proposition 3). First,
One can note that the \(\widehat{\mathbf {Q}}_{i,j}\) are normally distributed random variables such that \(\mathrm {E}\left[ \widehat{\mathbf {Q}}\right] = \mathbf {P}\mathbf {H}\mathbf {P}^\top \) (see, for example, [1]). Moreover, consider the vectorization \(\widehat{\mathbf {\theta }}\) of the upper triangle of the matrix \(\widehat{\mathbf {Q}}\):
Its covariance matrix is equal to \(\mathbf {\Sigma }_\theta = \frac{\sigma _\varepsilon }{m}\ddot{\mathbf {Z}}\ddot{\mathbf {Z}}^\top \), where the row \(\ddot{\mathbf {Z}}_{i,\cdot }\) consists of the quadratic terms of \(\mathbf {z}_i = \mathbf {P}\mathbf {x}_i\):
Next, let \(\mathbf {\theta }\) denote the vectorization of the upper triangle of the matrix \(\mathbf {P} \mathbf {H}\mathbf {P}^\top \) and consider the eigendecomposition \(\mathbf {\Sigma }_\theta = \mathbf {U} \mathbf {\Lambda } \mathbf {U}^\top \). Then the vector \(\widehat{\mathbf {\beta }} = \mathbf {U}^\top \widehat{\mathbf {\theta }}\) has a Gaussian distribution with covariance matrix \(\mathbf {\Lambda }\), and
Thus, \(C(\mathbf {X}\mathbf {P}^\top ) = \max \limits _i \lambda _i^2\).
\(\square \)
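The decorrelation step above is easy to verify numerically. The following Python sketch uses an arbitrary covariance matrix and illustrative dimensions (not the quantities \(\mathbf {\Sigma }_\theta \) and \(\ddot{\mathbf {Z}}\) from the proof): it samples Gaussian vectors with covariance \(\mathbf {\Sigma } = \mathbf {U}\mathbf {\Lambda }\mathbf {U}^\top \) and confirms that rotating them by \(\mathbf {U}^\top \) gives an approximately diagonal empirical covariance with the eigenvalues on the diagonal.

import numpy as np

# Numerical check of the decorrelation step: if theta ~ N(0, Sigma) with
# Sigma = U diag(lam) U^T, then U^T theta has diagonal covariance diag(lam).
# Sizes and the covariance matrix are illustrative, not from the proof.

rng = np.random.default_rng(1)
p = 6

M = rng.standard_normal((p, p))
Sigma = M @ M.T + np.eye(p)               # arbitrary SPD covariance
lam, U = np.linalg.eigh(Sigma)            # Sigma = U @ np.diag(lam) @ U.T

theta = rng.multivariate_normal(np.zeros(p), Sigma, size=200_000)
beta = theta @ U                          # each row is U^T theta

emp = np.cov(beta, rowvar=False)
print("largest off-diagonal entry:", np.abs(emp - np.diag(np.diag(emp))).max())
print("diagonal:   ", np.round(np.diag(emp), 2))
print("eigenvalues:", np.round(lam, 2))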