MASAGA: A Linearly-Convergent Stochastic First-Order Method for Optimization on Manifolds

  • Reza Babanezhad
  • Issam H. Laradji
  • Alireza Shafaei
  • Mark Schmidt
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11052)

Abstract

We consider the stochastic optimization of finite sums over a Riemannian manifold where the functions are smooth and geodesically convex. We present MASAGA, an extension of SAGA (a variant of the stochastic average gradient method) to Riemannian manifolds. SAGA is a variance-reduction technique that typically outperforms methods relying on expensive full-gradient calculations, such as the stochastic variance-reduced gradient method. We show that MASAGA achieves a linear convergence rate with uniform sampling, and that it converges faster with non-uniform sampling. Our experiments show that MASAGA is faster than the recent Riemannian stochastic gradient descent algorithm for the classic problem of finding the leading eigenvector (the eigenvector corresponding to the largest eigenvalue). Code related to this paper is available at: https://github.com/IssamLaradji/MASAGA.
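To illustrate how a SAGA-style update operates on a manifold, consider the leading-eigenvector problem mentioned above. The following is a minimal Python sketch, not the authors' released code (see the repository linked above for that): it assumes component functions f_i(w) = -(x_i^T w)^2 on the unit sphere, stores past stochastic gradients in the ambient space, substitutes tangent-space projection for the parallel transport used by MASAGA, and retracts by renormalizing. Function and variable names are illustrative only.

    import numpy as np

    def saga_sphere_sketch(X, steps=5000, lr=0.02, seed=0):
        """Hypothetical SAGA-style solver for the leading eigenvector of (1/n) X^T X."""
        rng = np.random.default_rng(seed)
        n, d = X.shape
        w = rng.standard_normal(d)
        w /= np.linalg.norm(w)                          # start on the unit sphere

        def grad(i, w):
            # Euclidean gradient of f_i(w) = -(x_i^T w)^2
            return -2.0 * X[i] * (X[i] @ w)

        memory = np.array([grad(i, w) for i in range(n)])   # stored stochastic gradients
        mean_mem = memory.mean(axis=0)

        for _ in range(steps):
            i = rng.integers(n)                         # uniform sampling
            g = grad(i, w)
            v = g - memory[i] + mean_mem                # SAGA variance-reduced direction
            mean_mem += (g - memory[i]) / n             # maintain the running average
            memory[i] = g
            v_tan = v - (w @ v) * w                     # project onto the tangent space at w
            w -= lr * v_tan                             # gradient step in the tangent space
            w /= np.linalg.norm(w)                      # retraction: renormalize onto the sphere
        return w

    # Example usage: compare against the exact leading eigenvector.
    rng = np.random.default_rng(1)
    X = rng.standard_normal((500, 20))
    X[:, 0] *= 3.0                                      # plant a dominant direction
    w = saga_sphere_sketch(X)
    exact = np.linalg.eigh(X.T @ X / len(X))[1][:, -1]
    print(abs(w @ exact))                               # close to 1 if the sketch converged

The projection-plus-renormalization used here is a common simplification for the sphere; MASAGA itself is stated with the manifold's exponential map (or a retraction) and parallel transport of the stored gradients, and its non-uniform sampling scheme is not shown in this sketch.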

Keywords

Variance-reduced stochastic optimization · Riemannian manifold

Supplementary material

Supplementary material 1: 478890_1_En_21_MOESM1_ESM.pdf (142 KB)


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Reza Babanezhad (1)
  • Issam H. Laradji (1)
  • Alireza Shafaei (1)
  • Mark Schmidt (1)

  1. Department of Computer Science, University of British Columbia, Vancouver, Canada