Abstract
Over the past few years, there has been increasing interest in understanding the geometry of the non-convex programs that arise naturally in machine learning. It is particularly interesting to identify additional properties of the non-convex objective under which popular optimization methods, such as gradient descent, escape saddle points and converge to a local minimum. The strict saddle property (Definition 5.6) is one such property, and it has been shown to hold in a broad range of applications.
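The escape behavior described above can be illustrated with a minimal sketch of perturbed gradient descent on a toy strict saddle function. The function, step size, noise level, and iteration count below are illustrative choices, not taken from the chapter: f(x, y) = x⁴/4 − x²/2 + y²/2 has a strict saddle at the origin (its Hessian there is diag(−1, 1), so the smallest eigenvalue is strictly negative) and local minima at (±1, 0).

```python
import random

def grad(x, y):
    # Gradient of f(x, y) = x**4/4 - x**2/2 + y**2/2.
    # (0, 0) is a strict saddle: the Hessian there is diag(-1, 1),
    # giving a strict negative-curvature direction along x.
    return x**3 - x, y

def perturbed_gd(x, y, step=0.1, iters=500, noise=1e-3, seed=0):
    rng = random.Random(seed)
    for _ in range(iters):
        gx, gy = grad(x, y)
        # A small random perturbation pushes the iterate off the saddle
        # along the negative-curvature direction, after which plain
        # gradient descent carries it to a local minimum.
        x = x - step * gx + noise * rng.uniform(-1, 1)
        y = y - step * gy + noise * rng.uniform(-1, 1)
    return x, y

x, y = perturbed_gd(0.0, 0.0)  # start exactly at the saddle
print(f"converged near ({x:.2f}, {y:.2f})")  # |x| close to 1: a local minimum
```

Started exactly at the saddle, unperturbed gradient descent would stay there forever; the perturbation is what makes escape happen, which is the mechanism formalized in the escape-time analyses this chapter surveys.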
Copyright information
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Shi, B., Iyengar, S.S. (2020). Related Work on Geometry of Non-Convex Programs. In: Mathematical Theories of Machine Learning - Theory and Applications. Springer, Cham. https://doi.org/10.1007/978-3-030-17076-9_6
Print ISBN: 978-3-030-17075-2
Online ISBN: 978-3-030-17076-9