Abstract
This paper studies the complexity of the stochastic gradient algorithm for PCA when the data are observed in a streaming setting. We also propose an online approach for selecting the learning rate. Simulation experiments confirm the practical relevance of the plain stochastic gradient approach and show that drastic improvements can be achieved by learning the learning rate.
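To make the setting concrete, here is a minimal sketch of a plain Oja-type stochastic-gradient update for the leading principal component of streaming data. It is only an illustration of the kind of algorithm analysed in the paper: the exact update rule, the learning-rate selection scheme, and all names and parameter values below are assumptions made for the example.

```python
import numpy as np

def streaming_pca_sgd(stream, dim, eta=0.01, seed=0):
    """Plain Oja-type stochastic-gradient update for the leading
    principal component of a stream of observations (illustrative
    sketch only; not the paper's exact algorithm)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=dim)
    w /= np.linalg.norm(w)                # random unit-norm start
    for x in stream:                      # one pass over the stream
        w += eta * x * (x @ w)            # stochastic gradient step on w'Cw
        w /= np.linalg.norm(w)            # project back onto the unit sphere
    return w

# Toy usage: data whose dominant direction is the first coordinate axis.
rng = np.random.default_rng(1)
data = rng.multivariate_normal(np.zeros(3), np.diag([5.0, 1.0, 0.5]), size=2000)
w_hat = streaming_pca_sgd(iter(data), dim=3, eta=0.01)
print(np.abs(w_hat))                      # expected to be close to (1, 0, 0)
```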
A Technical lemmæ
Recall that
Lemma 2
In the case of matrix completion, given a matrix X, we have
Proof
The resulting matrix can be written as
Therefore, the expected matrix can be written as
Using the symmetry of A gives the result.
Our next goal is to see how \(\mathrm {diag}\left( A^{\top } \mathrm {diag}(\mathbb E[B_{T-1}])A\right) \) evolves with the iterations. To this end, take the diagonal of (12), multiply from the left by \(A^{\top }\) and from the right by \(A\), and take the diagonal of the resulting expression.
Lemma 3
We have that
Proof
Expanding the recurrence relationship (12) gives
For any diagonal matrix \(\varDelta \) and symmetric matrix A, we have
Therefore, by taking the operator norm on both sides of the equality, we have
We conclude using \(\Vert \mathrm {diag}(A^{\top } \mathbb E[B_{T-1}])\Vert \le \Vert A\Vert _{1\rightarrow 2}\Vert \mathbb E[B_{T-1}]\Vert _{1\rightarrow 2}\) and \(\Vert A\Vert _{1\rightarrow 2 }\le 1\).
We also have to understand how the \(\ell _{1\rightarrow 2}\) norm evolves.
Lemma 4
We have
Proof
Expanding the recurrence relationship gives
For a diagonal matrix \(\varDelta \), we have \(\Vert \varDelta \Vert _{1\rightarrow 2}=\Vert \varDelta \Vert \). This leads to
Finally, using \(\Vert A\Vert _{1\rightarrow 2}\le 1\) concludes the proof.
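The proofs above repeatedly use the norm \(\Vert \cdot \Vert _{1\rightarrow 2}\). Assuming it denotes the \(\ell _1\rightarrow \ell _2\) operator norm, i.e. the largest Euclidean norm of a column, the following short numerical sketch illustrates the two facts just used: the norm coincides with the operator norm for diagonal matrices, and a matrix with unit-norm columns satisfies \(\Vert A\Vert _{1\rightarrow 2}\le 1\). Names and values are purely illustrative.

```python
import numpy as np

def norm_1_to_2(A):
    """l1 -> l2 operator norm: largest Euclidean norm of a column
    (assumed interpretation of the ||.||_{1->2} norm used in the text)."""
    return np.linalg.norm(A, axis=0).max()

rng = np.random.default_rng(0)

# For a diagonal matrix, the 1->2 norm equals the operator (spectral) norm.
D = np.diag(rng.uniform(-1.0, 1.0, size=5))
assert np.isclose(norm_1_to_2(D), np.linalg.norm(D, 2))

# A matrix with unit-norm columns has 1->2 norm equal to 1 (hence <= 1).
A = rng.normal(size=(5, 5))
A /= np.linalg.norm(A, axis=0)            # normalise each column
print(norm_1_to_2(A))                     # prints 1.0
```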
We then have to understand how the operator norm of \(\mathbb E [B_T]\) evolves.
Lemma 5
We have
Proof
Expanding the recurrence relationship (12) gives
Then, using inequalities similar to those in the proofs of the lemmas above, we obtain the result.
Lemma 6
Assume \(\Vert A\Vert =1\). Then we have
where
Proof
Expanding the recurrence and using Eqs. (27), (30), and (31) yields the following system
To obtain the result, we unroll the inequality recursively. We are therefore interested in computing the T-th power of the matrix in inequality (33). We have
After computing the matrix powers, it follows that
We conclude after computing the sums and bounding \(\Vert \mathbb {E}[B_0]\Vert \) from above by \(\max _{j}(1-\varepsilon -s_j)\).
Lemma 7
For \(\eta <1\) and \(\varepsilon >0\), we have
Proof
Denote \(f(s)=(1+2\eta \ s)^{T}(1-\varepsilon -s)\). Differentiating f and setting the derivative to zero, we obtain
$$f'(s)=(1+2\eta s)^{T-1}\Big (2\eta T(1-\varepsilon -s)-(1+2\eta s)\Big )=0.$$
Let \(s_c=\frac{T(1-\varepsilon )-1/2\eta }{T+1}\) denote this critical point. Consider the following two cases:
- if \(s_c \notin [0,1]\), then f has no critical point in the domain and is therefore maximised at one of the endpoints, i.e. $$\max _{s\in [0,1]} f(s)=\max \{ f(0)=1-\varepsilon ,\; f(1)=-\varepsilon (1+2\eta )^T\}\le 1;$$
- if \(s_c \in [0,1]\), then f is maximised at \(s_c\), and the value of f at \(s_c\) is
$$\begin{aligned}&\Bigg ( 1+2\eta \frac{T(1-\varepsilon )-1/2\eta }{T+1}\Bigg )^{T} \Bigg (1-\varepsilon -\frac{T(1-\varepsilon )-1/2\eta }{T+1}\Bigg )\\&=\Bigg (1+\frac{2\eta T(1-\varepsilon )-1}{T+1}\Bigg )^{T} \Bigg (\frac{1-\varepsilon + 1/2\eta }{T+1}\Bigg )\\&\le (1+2\eta (1-\varepsilon ))^{T} \Bigg (\frac{1+1/2\eta }{T+1}\Bigg ) \le \frac{(1+2\eta (1-\varepsilon ))^{T}}{\eta (T+1)}. \end{aligned}$$
This analysis proves that the maximum value f can achieve is at most \(\max \{1, \frac{(1+2\eta (1-\varepsilon ))^{T}}{\eta (T+1)}\}\le 1+ \frac{(1+2\eta (1-\varepsilon ))^{T}}{\eta (T+1)}\). Hence the result.
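As a sanity check on the conclusion of Lemma 7, the hedged sketch below evaluates f on a fine grid of [0,1] for arbitrarily chosen values of \(\eta \), \(\varepsilon \) and T (assumptions made only for this example) and compares its maximum with the bound \(1+\frac{(1+2\eta (1-\varepsilon ))^{T}}{\eta (T+1)}\).

```python
import numpy as np

# Hypothetical parameter values, chosen only for this check.
eta, eps, T = 0.3, 0.1, 20

s = np.linspace(0.0, 1.0, 100001)
f = (1 + 2 * eta * s) ** T * (1 - eps - s)

s_c = (T * (1 - eps) - 1 / (2 * eta)) / (T + 1)      # critical point of f
bound = 1 + (1 + 2 * eta * (1 - eps)) ** T / (eta * (T + 1))

print(f"grid argmax ~ {s[f.argmax()]:.4f}, critical point s_c = {s_c:.4f}")
print(f"max f = {f.max():.2f} <= bound = {bound:.2f}: {bool(f.max() <= bound)}")
```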