
Projective Approximation Based Quasi-Newton Methods

  • Conference paper
  • Machine Learning, Optimization, and Big Data (MOD 2017)
  • Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10710)

Abstract

We consider the problem of optimizing a convex function of a vector parameter. Many quasi-Newton optimization methods require constructing and storing an approximation of the Hessian matrix or its inverse to take the function curvature into account, which imposes high computational and memory requirements. We propose four quasi-Newton methods based on consecutive projective approximation. The idea of these methods is to approximate the product of the inverse Hessian and the gradient in a low-dimensional space using an appropriate projection, and then to reconstruct it back to the original space as the search direction for the next estimate. By exploiting Hessian rank deficiency in a special way, the proposed approach requires storing neither the Hessian matrix nor its inverse, thus reducing memory requirements. We give a theoretical motivation for the proposed algorithms and prove several properties of the corresponding estimates. Finally, we compare the proposed methods with several existing ones on modelled data. Although the proposed algorithms turned out to be inferior to the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) method, they have the important advantage of being easy to extend and improve. Moreover, two of them do not require knowledge of the function gradient.
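For illustration, the following is a minimal Python sketch of a projective quasi-Newton-style step in the spirit described above: recent iterates span a low-dimensional subspace, the projected Hessian is approximated from gradient differences, and the resulting Newton-like direction is lifted back to the original space. The function name `projective_qn_direction`, the subspace dimension `q`, the QR-based projection, the secant-style least-squares fit and the regularization `reg` are illustrative assumptions, not the exact construction used in the paper.

```python
import numpy as np

def projective_qn_direction(xs, gs, q=3, reg=1e-8):
    """Illustrative sketch (not the paper's exact algorithm): approximate the
    product of the inverse Hessian and the gradient in a low-dimensional
    subspace spanned by recent steps, then lift the direction back to R^d."""
    X = np.asarray(xs, dtype=float)   # K x d recent iterates
    G = np.asarray(gs, dtype=float)   # K x d corresponding gradients
    S = np.diff(X, axis=0)[-q:]       # last q step differences
    Y = np.diff(G, axis=0)[-q:]       # last q gradient differences
    # Orthonormal basis of the step subspace: P is q x d with P P^T = I_q.
    Pt, _ = np.linalg.qr(S.T)         # d x q, orthonormal columns
    P = Pt.T
    # Secant-style fit of the projected Hessian Q ~ P H P^T from H s ~ y.
    Sp, Yp = S @ Pt, Y @ Pt           # project steps and gradient differences
    Q = np.linalg.lstsq(Sp, Yp, rcond=None)[0]
    Q = 0.5 * (Q + Q.T) + reg * np.eye(Q.shape[0])   # symmetrize, regularize
    # Newton-like direction in the subspace, reconstructed in R^d.
    return -Pt @ np.linalg.solve(Q, P @ G[-1])
```

A line search along the returned direction (for example, backtracking on the Armijo condition) would then produce the next iterate; without a positive-definiteness safeguard on the small matrix, descent is not guaranteed in this sketch.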


References

  1. Box, G.E., Draper, N.R., et al.: Empirical Model-Building and Response Surfaces. Wiley, New York (1987)
  2. Boyd, S.: Global optimization in control system analysis and design. In: Control and Dynamic Systems V53: High Performance Systems Techniques and Applications: Advances in Theory and Applications, vol. 53, p. 1 (2012)
  3. Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)
  4. Broyden, C.G.: The convergence of a class of double-rank minimization algorithms 1. General considerations. IMA J. Appl. Math. 6(1), 76–90 (1970)
  5. Conn, A.R., Gould, N.I., Toint, P.L.: Convergence of quasi-Newton matrices generated by the symmetric rank one update. Math. Program. 50(1–3), 177–195 (1991)
  6. Davidon, W.C.: Variable metric method for minimization. SIAM J. Optim. 1(1), 1–17 (1991)
  7. Donoho, D.L.: Compressed sensing. IEEE Trans. Inf. Theor. 52(4), 1289–1306 (2006)
  8. Fletcher, R.: Practical Methods of Optimization, 2nd edn. Wiley, New York (1987)
  9. Ford, J., Moghrabi, I.: Multi-step quasi-Newton methods for optimization. J. Comput. Appl. Math. 50(1–3), 305–323 (1994)
  10. Forrester, A., Keane, A.: Recent advances in surrogate-based optimization. Prog. Aerosp. Sci. 45(1), 50–79 (2009)
  11. Granichin, O., Volkovich, Z.V., Toledano-Kitai, D.: Randomized Algorithms in Automatic Control and Data Mining, vol. 67. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-642-54786-7
  12. Granichin, O.N.: Stochastic approximation search algorithms with randomization at the input. Autom. Remote Control 76(5), 762–775 (2015)
  13. Hoffmann, W.: Iterative algorithms for Gram-Schmidt orthogonalization. Computing 41(4), 335–348 (1989)
  14. Krause, A.: SFO: a toolbox for submodular function optimization. J. Mach. Learn. Res. 11, 1141–1144 (2010)
  15. Nesterov, Y.: Introductory Lectures on Convex Programming, Volume I: Basic Course. Citeseer (1998)
  16. Nocedal, J.: Updating quasi-Newton matrices with limited storage. Math. Comput. 35(151), 773–782 (1980)
  17. Polyak, B.T.: Introduction to Optimization. Translations Series in Mathematics and Engineering. Optimization Software Inc., Publications Division, New York (1987)
  18. Senov, A.: Accelerating gradient descent with projective response surface methodology. In: Battiti, R., Kvasov, D.E., Sergeyev, Y.D. (eds.) LION 2017. LNCS, vol. 10556, pp. 376–382. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-69404-7_34


Acknowledgments

This work was supported by the Russian Science Foundation (project 16-19-00057).

Author information

Correspondence to Alexander Senov.


A Proofs

Proof (of Proposition 1).

$$\begin{aligned}&\partial _{\mathbf {x}} \mathop {\arg \!\min }\limits _{ \{\mathbf {x}\in \mathbb {R}^d\,:\,\mathbf {P}\mathbf {x} = \widehat{\mathbf {z}}\} } \sum \limits _{t=1}^{K} \left\| \mathbf {x}_t - \mathbf {x} \right\| _2^2 \\&=\partial _{\mathbf {x}} \sum \limits _{t=1}^{K} \left\| \mathbf {P}^\top \mathbf {P}\mathbf {x}_t + \left( \mathbf {I} - \mathbf {P}^\top \mathbf {P}\right) \mathbf {x}_t - \left( \mathbf {P}^\top \widehat{\mathbf {z}} + \left( \mathbf {I} - \mathbf {P}^\top \mathbf {P}\right) \mathbf {x}\right) \right\| _2^2 \\&=\partial _{\mathbf {x}} \sum \limits _{t=1}^{K} \left\| \mathbf {P}^\top \mathbf {P}\mathbf {x}_t - \mathbf {P}^\top \widehat{\mathbf {z}} \right\| _2^2 + \partial _{\mathbf {x}} \sum \limits _{t=1}^{K} \left\| \left( \mathbf {I} - \mathbf {P}^\top \mathbf {P}\right) \left( \mathbf {x}_t - \mathbf {x} \right) \right\| _2^2 \\&= \sum \limits _{t=1}^{K} 2 (\mathbf {I} - \mathbf {P}^\top \mathbf {P})\left( (\mathbf {I} - \mathbf {P}^\top \mathbf {P}) \mathbf {x}_t - \left( \mathbf {I} - \mathbf {P}^\top \mathbf {P}\right) \mathbf {x} \right) \\&=2 (\mathbf {I} - \mathbf {P}^\top \mathbf {P}) \sum \limits _{t=1}^{K} \mathbf {x}_t - 2 K \left( \mathbf {I} - \mathbf {P}^\top \mathbf {P}\right) \mathbf {x} = 0. \end{aligned}$$

Since \((\mathbf {I}-\mathbf {P}^\top \mathbf {P})\) is not invertible, the above equation has an infinite number of solutions. Hence, we are free to choose any one of them, e.g. \(\mathbf {x}=\frac{1}{K}\sum _{t=1}^{K} \mathbf {x}_t\).    \(\square \)
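As a quick sanity check of this argument, the following Python snippet (randomly generated data, purely illustrative and not part of the paper) verifies numerically that \(\widehat{\mathbf {x}} = \mathbf {P}^\top \widehat{\mathbf {z}} + (\mathbf {I}-\mathbf {P}^\top \mathbf {P})\overline{\mathbf {x}}\) minimizes \(\sum _t \Vert \mathbf {x}_t-\mathbf {x}\Vert _2^2\) over \(\{\mathbf {x}: \mathbf {P}\mathbf {x}=\widehat{\mathbf {z}}\}\), by comparing it with the closed-form solution obtained through a null-space parametrization of the constraint.

```python
import numpy as np

rng = np.random.default_rng(0)
d, q, K = 8, 3, 20
P = np.linalg.qr(rng.standard_normal((d, q)))[0].T   # q x d, rows orthonormal: P P^T = I_q
X = rng.standard_normal((K, d))                      # points x_1, ..., x_K
z_hat = rng.standard_normal(q)

x_bar = X.mean(axis=0)
x_hat = P.T @ z_hat + (np.eye(d) - P.T @ P) @ x_bar  # candidate from Proposition 1

# Brute-force check: parametrize the feasible set {x : P x = z_hat} as
# x = P^T z_hat + N w, with N an orthonormal basis of the null space of P,
# and minimize the quadratic objective over w in closed form.
N = np.linalg.svd(P)[2][q:].T                        # d x (d - q) null-space basis
w_opt = N.T @ (x_bar - P.T @ z_hat)
x_opt = P.T @ z_hat + N @ w_opt

assert np.allclose(P @ x_hat, z_hat)                 # feasibility of the candidate
assert np.allclose(x_hat, x_opt)                     # same constrained minimizer
```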

Proof (of Proposition 2). From Proposition 1 and the fact that \(\mathop {\arg \!\min }\limits _{\mathbf {z}\in \mathbb {R}^q}\, f(\mathbf {P}^\top \mathbf {z} + \mathbf {v}) = -\mathbf {P} \mathbf {H}^{-1} \mathbf {b}\) it follows that \(\widehat{\mathbf {x}} = \left( \mathbf {I} - \mathbf {P}^\top \mathbf {P}\right) \overline{\mathbf {x}} - \mathbf {P}^\top \mathbf {P} \mathbf {H}^{-1}\mathbf {b} \). Hence

$$\begin{aligned}&\left\| \arg \!\min \, f - \widehat{\mathbf {x}} \right\| _2^2 = \left\| -\mathbf {H}^{-1}\mathbf {b} - \widehat{\mathbf {x}} \right\| _2^2 \\&\qquad = \left\| - \left( \mathbf {I} - \mathbf {P}^\top \mathbf {P}\right) \overline{\mathbf {x}} + \mathbf {P}^\top \mathbf {P} \mathbf {H}^{-1}\mathbf {b} - \mathbf {H}^{-1} \mathbf {b} \right\| _2^2 \\&\qquad = \left\| (\mathbf {I} - \mathbf {P}^\top \mathbf {P})\left( \mathbf {H}^{-1} \mathbf {b} - \overline{\mathbf {x}} \right) \right\| _2^2. \end{aligned}$$

   \(\square \)
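The algebraic core of this step is the identity \(\mathbf {u} - \big ((\mathbf {I}-\mathbf {P}^\top \mathbf {P})\overline{\mathbf {x}} + \mathbf {P}^\top \mathbf {P}\mathbf {u}\big ) = (\mathbf {I}-\mathbf {P}^\top \mathbf {P})(\mathbf {u}-\overline{\mathbf {x}})\), with \(\mathbf {u}\) in the role of the unconstrained minimizer \(\arg \!\min f\) (signs depend on the convention used for \(\mathbf {H}^{-1}\mathbf {b}\)). The short Python check below (random data, purely illustrative) confirms this identity numerically.

```python
import numpy as np

rng = np.random.default_rng(1)
d, q = 10, 4
P = np.linalg.qr(rng.standard_normal((d, q)))[0].T   # q x d, P P^T = I_q
u, x_bar = rng.standard_normal(d), rng.standard_normal(d)

proj_c = np.eye(d) - P.T @ P                         # projector onto the complement of row(P)
x_hat = proj_c @ x_bar + P.T @ P @ u                 # subspace estimate built around x_bar
# The estimation error equals the complement projection of (u - x_bar).
assert np.allclose(u - x_hat, proj_c @ (u - x_bar))
```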

Proof (of Proposition 3). First,

$$\begin{aligned} \Vert \mathbf {H} - \widehat{\mathbf {H}}\Vert _F^2 = \Vert (\mathbf {I} - \mathbf {P}^\top \mathbf {P})\mathbf {H} (\mathbf {I} - \mathbf {P}^\top \mathbf {P})\Vert _F ^2 + \Vert \mathbf {P}^\top (\mathbf {P} \mathbf {H}\mathbf {P}^\top - \widehat{\mathbf {Q}}) \mathbf {P}\Vert _F^2 \end{aligned}$$

One can note that the entries \(\widehat{\mathbf {Q}}_{i,j}\) are normally distributed random variables such that \(\mathrm {E}\left[ \widehat{\mathbf {Q}}\right] = \mathbf {P}\mathbf {H}\mathbf {P}^\top \) (see, for example, [1]). Moreover, consider the vectorization \(\widehat{\mathbf {\theta }}\) of the upper triangle of the matrix \(\widehat{\mathbf {Q}}\):

$$\begin{aligned} \widehat{\mathbf {\theta }} = \left( \widehat{\mathbf {Q}}_{1,1}, \widehat{\mathbf {Q}}_{1,2}, \ldots , \widehat{\mathbf {Q}}_{q-1,q}, \widehat{\mathbf {Q}}_{q,q}\right) , \end{aligned}$$

Its covariance matrix is equal to \(\mathbf {\Sigma }_\theta = \frac{\sigma _\varepsilon }{m}\ddot{\mathbf {Z}}\ddot{\mathbf {Z}}^\top \), where the rows \(\ddot{\mathbf {Z}}_{i,\cdot }\) consist of the quadratic elements of \(\mathbf {z}_i = \mathbf {P}\mathbf {x}_i\):

$$\begin{aligned} \ddot{\mathbf {Z}}_{i,\cdot } = \left( \mathbf {z}_{i}^{(1)} \mathbf {z}_{i}^{(1)}, \mathbf {z}_{i}^{(1)} \mathbf {z}_{i}^{(2)}, \ldots , \mathbf {z}_{i}^{(q - 1)} \mathbf {z}_{i}^{(q)}, \mathbf {z}_{i}^{(q)} \mathbf {z}_{i}^{(q)}\right) ^\top . \end{aligned}$$

Next, denote by \(\mathbf {\theta }\) the vectorization of the upper triangle of the matrix \(\mathbf {P} \mathbf {H}\mathbf {P}^\top \) and consider the eigendecomposition \(\mathbf {\Sigma }_\theta = \mathbf {U} \mathbf {\Lambda } \mathbf {U}^\top \). Then the vector \(\widehat{\mathbf {\beta }} = \mathbf {U}\widehat{\mathbf {\theta }}\) has a Gaussian distribution with covariance matrix \(\mathbf {\Lambda }\), and

$$\begin{aligned} \Vert \mathbf {P}^\top (\mathbf {P} \mathbf {H}\mathbf {P}^\top - \widehat{\mathbf {Q}}) \mathbf {P}\Vert _F^2 \le \Vert \mathbf {P}^\top \Vert _F^2 \Vert \mathbf {P} \mathbf {H}\mathbf {P}^\top - \widehat{\mathbf {Q}} \Vert _F^2 \Vert \mathbf {P}\Vert _F^2 = q^2 \Vert \mathbf {P} \mathbf {H}\mathbf {P}^\top - \widehat{\mathbf {Q}} \Vert _F^2 . \\ \Vert \mathbf {P} \mathbf {H}\mathbf {P}^\top - \widehat{\mathbf {Q}} \Vert _F^2 = \Vert \widehat{\mathbf {\theta }} - \mathbf {\theta } \Vert _2^2 = \left( \mathbf {U}(\widehat{\mathbf {\theta }} - \mathbf {\theta })\right) ^\top \left( \mathbf {U}(\widehat{\mathbf {\theta }} - \mathbf {\theta })\right) = \sum _{i=1}^{q^2} \xi _i^2 \sim \sum _{i=1}^{q^2} \lambda _i^2 \chi ^2(1). \end{aligned}$$

Thus, \(C(\mathbf {X}\mathbf {P}^\top ) = \max \limits _i \lambda _i^2\).

   \(\square \)
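The first inequality above only uses submultiplicativity of the Frobenius norm together with \(\Vert \mathbf {P}\Vert _F^2 = \Vert \mathbf {P}^\top \Vert _F^2 = q\) for a projection with orthonormal rows. A small Python check of that bound (random matrices, illustrative only; the matrix A stands in for \(\mathbf {P}\mathbf {H}\mathbf {P}^\top - \widehat{\mathbf {Q}}\)):

```python
import numpy as np

rng = np.random.default_rng(2)
d, q = 12, 3
P = np.linalg.qr(rng.standard_normal((d, q)))[0].T   # q x d with P P^T = I_q
A = rng.standard_normal((q, q))                      # stand-in for P H P^T - Q_hat

lhs = np.linalg.norm(P.T @ A @ P, "fro") ** 2
rhs = q ** 2 * np.linalg.norm(A, "fro") ** 2
assert np.isclose(np.linalg.norm(P, "fro") ** 2, q)  # ||P||_F^2 = q
assert lhs <= rhs + 1e-12                            # q^2 bound holds
# Since P has orthonormal rows, ||P^T A P||_F actually equals ||A||_F here,
# so the q^2 factor is a conservative (but valid) bound.
assert np.isclose(lhs, np.linalg.norm(A, "fro") ** 2)
```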


Copyright information

© 2018 Springer International Publishing AG

About this paper


Cite this paper

Senov, A. (2018). Projective Approximation Based Quasi-Newton Methods. In: Nicosia, G., Pardalos, P., Giuffrida, G., Umeton, R. (eds) Machine Learning, Optimization, and Big Data. MOD 2017. Lecture Notes in Computer Science, vol. 10710. Springer, Cham. https://doi.org/10.1007/978-3-319-72926-8_3


  • DOI: https://doi.org/10.1007/978-3-319-72926-8_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-72925-1

  • Online ISBN: 978-3-319-72926-8

