Abstract
It is well known that the classical exploratory factor analysis (EFA) of data with more observations than variables has several types of indeterminacy. We study the factor indeterminacy and show some new aspects of this problem by considering EFA as a specific data matrix decomposition. We adopt a new approach to the EFA estimation and achieve a new characterization of the factor indeterminacy problem. A new alternative model is proposed, which gives determinate factors and can be seen as a semi-sparse principal component analysis (PCA). An alternating algorithm is developed, where in each step a Procrustes problem is solved. It is demonstrated that the new model/algorithm can act as a specific sparse PCA and as a low-rank-plus-sparse matrix decomposition. Numerical examples with several large data sets illustrate the versatility of the new model, and the performance and behaviour of its algorithmic implementation.
Notes
The PCA concept is very specific and well defined: it is equivalent to a low-rank approximation of the data matrix using the singular value decomposition. Our proposed method is similar to PCA in the sense that it gives orthogonal factors, but it is not a PCA in the strict sense. In view of the fact that there are other related concepts, such as sparse PCA or robust PCA, which are not PCAs in the strict sense either, we decided to name our method semi-sparse PCA (SSPCA).
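The footnote's first claim can be checked directly: the rank-\(k\) PCA reconstruction of a centred data matrix coincides with its truncated SVD (the Eckart–Young optimum). A minimal NumPy sketch, with variable names of our own choosing:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 8))
Xc = X - X.mean(axis=0)  # column-centre the data, as PCA assumes

# Rank-k approximation via the truncated SVD (the Eckart-Young optimum).
k = 3
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
X_svd = (U[:, :k] * s[:k]) @ Vt[:k, :]

# PCA route: loadings are the top-k right singular vectors,
# scores are the projections of the centred data onto them.
loadings = Vt[:k, :].T       # p x k
scores = Xc @ loadings       # n x k
X_pca = scores @ loadings.T

assert np.allclose(X_svd, X_pca)
```

The assertion passes because \(X_c V_k = U_k \Sigma_k\), so both routes produce the same rank-\(k\) matrix.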
Available from http://datam.i2r.a-star.edu.sg/datasets/krbd/Leukemia/MLL.html.
References
Absil, P.-A., Mahony, R., & Sepulchre, R. (2008). Optimization Algorithms on Matrix Manifolds. Princeton: Princeton University Press.
Adachi, K., & Trendafilov, N. (2017). Sparsest factor analysis for clustering variables: A matrix decomposition approach. Advances in Data Analysis and Classification, 12, 778–794.
Aravkin, A., Becker, S., Cevher, V., & Olsen, P. (2014). A variational approach to stable principal component pursuit. In Conference on uncertainty in artificial intelligence (UAI).
Armstrong, S. A., Staunton, J. E., Silverman, L. B., Pieters, R., den Boer, M. L., Minden, M. D., et al. (2002). MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia. Nature Genetics, 30, 41–47.
Cai, J.-F., Candès, E. J., & Shen, Z. (2008). A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization, 20, 1956–1982.
Candès, E. J., Li, X., Ma, Y., & Wright, J. (2009). Robust principal component analysis? Journal of the ACM, 58, 1–37.
De Leeuw, J. (2004). Least squares optimal scaling of partially observed linear systems. In K. van Montfort, J. Oud, & A. Satorra (Eds.), Recent developments on structural equation models: Theory and applications (pp. 121–134). Dordrecht, NL: Kluwer Academic Publishers.
Edelman, A., Arias, T. A., & Smith, S. T. (1998). The geometry of algorithms with orthogonality constraints. SIAM Journal on Matrix Analysis and Applications, 20, 303–353.
Eldén, L. (2007). Matrix methods in data mining and pattern recognition. Philadelphia: SIAM.
Golub, G. H., & Van Loan, C. F. (2013). Matrix computations (4th ed.). Baltimore, MD: Johns Hopkins University Press.
Harman, H. H. (1976). Modern factor analysis (3rd ed.). Chicago, IL: University of Chicago Press.
Jolliffe, I. T., Trendafilov, N. T., & Uddin, M. (2003). A modified principal component technique based on the LASSO. Journal of Computational and Graphical Statistics, 12, 531–547.
Journée, M., Nesterov, Y., Richtárik, P., & Sepulchre, R. (2010). Generalized power method for sparse principal component analysis. Journal of Machine Learning Research, 11, 517–553.
Lin, Z., Chen, M., Wu, L., & Ma, Y. (2009). The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices. UIUC Technical Report, UILU-ENG-09-2215, November.
Lin, Z., Ganesh, A., Wright, J., Wu, L., Chen, M., & Ma, Y. (2009). Fast convex optimization algorithms for exact recovery of a corrupted low-rank matrix. UIUC Technical Report, UILU-ENG-09-2214, August.
Mulaik, S. A. (2005). Looking back on the factor indeterminacy controversies in factor analysis. In A. Maydeu-Olivares & J. J. McArdle (Eds.), Contemporary psychometrics (pp. 174–206). Mahwah, NJ: Lawrence Erlbaum Associates Inc.
Mulaik, S. A. (2010). The foundations of factor analysis (2nd ed.). Boca Raton, FL: Chapman and Hall/CRC.
Shen, H., & Huang, J. Z. (2008). Sparse principal component analysis via regularized low-rank matrix approximation. Journal of Multivariate Analysis, 99, 1015–1034.
Steiger, J. H. (1979). Factor indeterminacy in the 1930’s and the 1970’s: Some interesting parallels. Psychometrika, 44, 157–166.
Steiger, J. H., & Schönemann, P. H. (1978). A history of factor indeterminacy (pp. 136–178). Chicago, IL: University of Chicago Press.
Trendafilov, N., Fontanella, S., & Adachi, K. (2017). Sparse exploratory factor analysis. Psychometrika, 82, 778–794.
Trendafilov, N. T., & Unkel, S. (2011). Exploratory factor analysis of data matrices with more variables than observations. Journal of Computational and Graphical Statistics, 20, 874–891.
Unkel, S., & Trendafilov, N. T. (2010). Simultaneous parameter estimation in exploratory factor analysis: An expository review. International Statistical Review, 78, 363–382.
Witten, D. M., Tibshirani, R., & Hastie, T. (2009). A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics, 10, 515–534.
Yuan, X., & Yang, J. (2013). Sparse and low-rank matrix decomposition via alternating direction methods. Pacific Journal of Optimization, 9, 167–180.
Proof of Lemma 2.1
Proof
In the proof we will, for convenience, denote \(\mathcal {D}= \mathcal {D}(U_1^\top R)\). Without loss of generality, we assume that the diagonal elements of \(\mathcal {D}\) are ordered, \(| u_1^\top r_1 | \ge | u_2^\top r_2 | \ge \cdots \ge | u_p^\top r_p |\). Here \(r_i\) and \(u_i\) denote the i’th column of R and \(U_1\), respectively.
Let the CS decomposition (Golub and Van Loan, 2013, Sect. 2.5.4) of U be
\[ U = \begin{pmatrix} U_1 \\ U_2 \end{pmatrix} = \begin{pmatrix} Q_1 & 0 \\ 0 & Q_2 \end{pmatrix} \begin{pmatrix} C \\ S \end{pmatrix} V^\top , \]
where \(Q_1\), \(Q_2\) and V are orthogonal and C and S are diagonal. The diagonal elements of C satisfy \(c_1 \ge c_2 \ge \cdots \ge c_p \ge 0\). Clearly, the statement of the lemma is equivalent to \(C=I_p\) and \(S=0\).
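The CS decomposition relations used here can be illustrated numerically. In the following sketch (ours, not the paper's), \(Q_1\), \(C\) and \(V\) are read off the SVD of the top block \(U_1\), and the defining identity \(C^2 + S^2 = I\) is verified:

```python
import numpy as np

rng = np.random.default_rng(1)
p = 4
# U: 2p x p with orthonormal columns, partitioned into U1 (top) and U2 (bottom).
U, _ = np.linalg.qr(rng.standard_normal((2 * p, p)))
U1, U2 = U[:p, :], U[p:, :]

# CS decomposition: U1 = Q1 C V^T and U2 = Q2 S V^T, with C^2 + S^2 = I.
# Q1, C and V can be read off the SVD of U1; the diagonal of S then follows
# from U2 V, whose i-th column has norm sqrt(1 - c_i^2).
Q1, c, Vt = np.linalg.svd(U1)
s = np.linalg.norm(U2 @ Vt.T, axis=0)

assert np.all(c <= 1 + 1e-12)         # diagonal of C lies in [0, 1]
assert np.allclose(c**2 + s**2, 1.0)  # the Pythagorean identity C^2 + S^2 = I
```

Note that `np.linalg.svd` returns the singular values in decreasing order, matching the ordering convention \(c_1 \ge \cdots \ge c_p \ge 0\) assumed above.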
We insert the CS decomposition in (15), set \(\nabla \Gamma = 0\) and multiply by
from the left and by V from the right. We get
With \( B = Q_1^\top R \mathcal {D}V \in \mathbb {R}^{p \times p}\), we thus have \(B = C B^\top C\), or, equivalently, elementwise, \(b_{ij} = c_i b_{ji} c_j\), and hence \(b_{ij} = c_i^2 b_{ij} c_j^2\). Since \(0 \le c_i \le 1\) for all i, clearly, for any \(b_{ij} \ne 0\) we must have \(c_i = c_j = 1\). Assume that
and that, for some \(1 \le i \le p\), \(b_{is} \ne 0,\) or, for some \(1 \le j \le p\), \(b_{sj} \ne 0\) (we allow \(s=p\), in which case \(B_s=B\)). Then, since \(b_{is} = c_i^2 b_{is} c_s^2\), or \(b_{sj} = c_s^2 b_{sj} c_j^2\), and since the \(c_i\)’s are ordered, we must have \(c_1 = \cdots = c_s = 1\). Thus, if \(s=p\), then \(C=I_p\), and the lemma is true.
If \(s<p\), C has the structure
\[ C = \begin{pmatrix} I_s & 0 \\ 0 & C_2 \end{pmatrix} , \tag{27} \]
where \(C_2=0\) or its diagonal elements are nonnegative.
We now assume that
\[ c_{t+1} = \cdots = c_p = 0 , \tag{28} \]
with \(c_t>0\), i.e. \(U_1\) has rank \(t<p\). We will show that then the stationary point does not correspond to a global maximum.
Due to (27), we can write
Consider the last row of \(BV^\top \):
where \(r_i\) denotes the i’th column of R. Since R is nonsingular, there must exist at least one nonzero element in \(c^\top \), say \(q_p^\top r_k\). Then the corresponding element in \(\mathcal {D}\) must be equal to zero, \(u_k^\top r_k=0\).
Under the assumption (28), \(U_1\) has rank t: using the CS decomposition we can write \(u_j = \sum _{i=1}^t c_i v_{ji} q_i\), for \(j=1,2,\ldots ,p\). Clearly \(q_p\) is orthogonal to \(\{u_1 ,\; u_2,\; \ldots , u_p\}\). Thus, we can replace the column \(u_k\) in \(U_1\) by \(q_p\) and make the objective function larger. It follows that the assumption that \({{\mathrm{rank}}}(U_1)=t<p\) cannot be valid at the global maximum.
It remains to consider the case when all the diagonal elements of C are positive, and \(U_1\) is nonsingular. Due to the structure (27), we have
\[ B = \begin{pmatrix} B_s & 0 \\ 0 & 0 \end{pmatrix} . \]
With the corresponding blocking \(V = (V_1 \; V_2)\), we have \(\mathcal {D}V_2 = 0\), i.e. \(\mathcal {D}\) has a null space of dimension \(p-s\). Since the diagonal elements of \(\mathcal {D}\) are ordered, it follows that \(\mathcal {D}\) has the structure
\[ \mathcal {D} = \begin{pmatrix} \mathcal {D}_s & 0 \\ 0 & 0 \end{pmatrix} , \]
where \(\mathcal {D}_s\) is nonsingular. Put
\[ V = \begin{pmatrix} V_{11} & V_{12} \\ V_{21} & V_{22} \end{pmatrix} . \]
From the identity \(\mathcal {D}V_2 =0\), we then get \(V_{12}=0\); consequently, \(V_{22}\) is an orthogonal matrix, and it follows that
\[ V = \begin{pmatrix} V_{11} & 0 \\ 0 & V_{22} \end{pmatrix} . \]
From the CS decomposition, we then have
It follows that \(q_p\) is orthogonal to \(u_j\) for \(j=1,2,\ldots ,s\), and as in the cases above we can now replace \(u_p\) by \(q_p\) and increase the value of the objective function.
Thus, we have shown that for \(C \ne I_p\) the stationary point does not correspond to the global maximum, which proves the lemma. \(\square \)
Cite this article
Eldén, L., Trendafilov, N. Semi-sparse PCA. Psychometrika 84, 164–185 (2019). https://doi.org/10.1007/s11336-018-9650-9