Gradient descent with random initialization: fast global convergence for nonconvex phase retrieval

  • Yuxin Chen
  • Yuejie Chi
  • Jianqing Fan
  • Cong Ma
Full Length Paper Series B

Abstract

This paper considers the problem of solving systems of quadratic equations, namely, recovering an object of interest \(\varvec{x}^{\natural }\in {\mathbb {R}}^{n}\) from \(m\) quadratic equations/samples \(y_{i}=(\varvec{a}_{i}^{\top }\varvec{x}^{\natural })^{2}, 1\le i\le m\). This problem, also known as phase retrieval, spans multiple domains including physical sciences and machine learning. We investigate the efficacy of gradient descent (or Wirtinger flow) designed for the nonconvex least squares problem. We prove that under Gaussian designs, gradient descent, when randomly initialized, yields an \(\epsilon \)-accurate solution in \(O\big (\log n+\log (1/\epsilon )\big )\) iterations given nearly minimal samples, thus achieving near-optimal computational and sample complexities at once. This provides the first global convergence guarantee concerning vanilla gradient descent for phase retrieval, without the need for (i) carefully designed initialization, (ii) sample splitting, or (iii) sophisticated saddle-point escaping schemes. All of these are achieved by exploiting the statistical models in analyzing optimization algorithms, via a leave-one-out approach that enables the decoupling of certain statistical dependencies between the gradient descent iterates and the data.
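
As an illustration of the setting described in the abstract, the following is a minimal sketch of randomly initialized gradient descent on a nonconvex least-squares loss under a Gaussian design. The abstract does not spell out the loss scaling, step size, initialization variance, or problem sizes, so the choices here are assumptions made for illustration only: the loss \(f(\varvec{x})=\frac{1}{4m}\sum_{i}\big((\varvec{a}_{i}^{\top }\varvec{x})^{2}-y_{i}\big)^{2}\), a constant step size of 0.1, an initial point with i.i.d. \(N(0,1/n)\) entries, a unit-norm \(\varvec{x}^{\natural }\), and \(m=20n\). The function names are hypothetical.

```python
import numpy as np


def phase_retrieval_gd(A, y, step=0.1, iters=500, seed=0):
    """Vanilla gradient descent on f(x) = (1/4m) * sum(((a_i^T x)^2 - y_i)^2).

    Assumed for illustration: constant step size and a random start
    with i.i.d. N(0, 1/n) entries (no spectral initialization).
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    x = rng.standard_normal(n) / np.sqrt(n)  # random initialization
    for _ in range(iters):
        Ax = A @ x
        # Gradient of f: (1/m) * sum(((a_i^T x)^2 - y_i) * (a_i^T x) * a_i)
        grad = A.T @ ((Ax ** 2 - y) * Ax) / m
        x = x - step * grad
    return x


def sign_invariant_error(x, x_true):
    """Distance to x_true up to the global sign, which y cannot determine."""
    return min(np.linalg.norm(x - x_true), np.linalg.norm(x + x_true))


# Synthetic instance (sizes are assumptions): Gaussian design, unit-norm signal.
rng = np.random.default_rng(1)
n, m = 50, 1000
x_true = rng.standard_normal(n)
x_true /= np.linalg.norm(x_true)
A = rng.standard_normal((m, n))
y = (A @ x_true) ** 2

x_hat = phase_retrieval_gd(A, y)
err = sign_invariant_error(x_hat, x_true)
print(f"estimation error up to sign: {err:.2e}")
```

The error is measured up to a global sign because \(\varvec{x}^{\natural }\) and \(-\varvec{x}^{\natural }\) produce identical samples \(y_{i}\). Each iteration costs two matrix-vector products with the design matrix, so the total cost scales with the number of iterations times \(mn\).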

Mathematics Subject Classification

90C26 

Notes

Acknowledgements

Y. Chen is supported in part by the AFOSR YIP award FA9550-19-1-0030, by the ARO grant W911NF-18-1-0303, by the ONR grant N00014-19-1-2120, and by the Princeton SEAS innovation award. Y. Chi is supported in part by AFOSR under the grant FA9550-15-1-0205, by ONR under the grant N00014-18-1-2142, by ARO under the grant W911NF-18-1-0303, and by NSF under the grants CAREER ECCS-1818571 and CCF-1806154. J. Fan is supported in part by NSF grants DMS-1662139 and DMS-1712591 and NIH grant 2R01-GM072611-13.

Supplementary material

Supplementary material 1: 10107_2019_1363_MOESM1_ESM.pdf (PDF, 746 KB)

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature and Mathematical Optimization Society 2019

Authors and Affiliations

  1. Department of Electrical Engineering, Princeton University, Princeton, USA
  2. Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, USA
  3. Department of Operations Research and Financial Engineering, Princeton University, Princeton, USA