Semi-sparse PCA


Abstract

It is well known that classical exploratory factor analysis (EFA) of data with more observations than variables suffers from several types of indeterminacy. We study factor indeterminacy and reveal new aspects of this problem by treating EFA as a specific data matrix decomposition. We adopt a new approach to the EFA estimation and achieve a new characterization of the factor indeterminacy problem. A new alternative model is proposed, which gives determinate factors and can be seen as a semi-sparse principal component analysis (PCA). An alternating algorithm is developed, in which each step solves a Procrustes problem. It is demonstrated that the new model/algorithm can act as a specific sparse PCA and as a low-rank-plus-sparse matrix decomposition. Numerical examples with several large data sets illustrate the versatility of the new model and the performance and behaviour of its algorithmic implementation.
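The building block of the alternating algorithm mentioned in the abstract, the orthogonal Procrustes problem, has a well-known closed-form solution via the SVD. The following is a minimal sketch of that standard solution, not the authors' implementation; the function name and test data are ours:

```python
import numpy as np

def solve_procrustes(A, B):
    """Closed-form solution of the orthogonal Procrustes problem
    min ||A Q - B||_F over orthogonal Q: if A^T B = U S V^T (SVD),
    then Q = U V^T (a standard result)."""
    U, _, Vt = np.linalg.svd(A.T @ B)
    return U @ Vt

# Sanity check: recover a known orthogonal matrix from noiseless data.
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 5))
Q_true, _ = np.linalg.qr(rng.standard_normal((5, 5)))
assert np.allclose(solve_procrustes(A, A @ Q_true), Q_true)
```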


Notes

  1. The PCA concept is very specific and well defined: it is equivalent to a low-rank approximation of the data matrix using the singular value decomposition (see the short numerical illustration after these notes). Our proposed method is similar to PCA in the sense that it gives orthogonal factors, but it is not a PCA in the strict sense. In view of the fact that other related concepts, such as sparse PCA or robust PCA, are not PCAs in the strict sense either, we decided to name our method semi-sparse PCA, SSPCA.

  2. https://se.mathworks.com/help/stats/select-data-and-validation-for-classification-problem.html?s_tid=srchtitle.

  3. Available from http://datam.i2r.a-star.edu.sg/datasets/krbd/Leukemia/MLL.html.
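To make Note 1 concrete, the equivalence between rank-k PCA and the truncated singular value decomposition is easy to check numerically. A minimal sketch (synthetic data; variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 30))  # data matrix with n > p; centering omitted for brevity
k = 5

# Truncated SVD gives the best rank-k approximation of X in the
# Frobenius norm (Eckart-Young), which is exactly what PCA retains.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
X_k = (U[:, :k] * s[:k]) @ Vt[:k]

# The approximation error equals the norm of the discarded singular values.
assert np.isclose(np.linalg.norm(X - X_k), np.sqrt(np.sum(s[k:] ** 2)))
```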

References

  • Absil, P.-A., Mahony, R., & Sepulchre, R. (2008). Optimization Algorithms on Matrix Manifolds. Princeton: Princeton University Press.

  • Adachi, K., & Trendafilov, N. (2017). Sparsest factor analysis for clustering variables: A matrix decomposition approach. Advances in Data Analysis and Classification, 12, 778–794.

  • Aravkin, A., Becker, S., Cevher, V., & Olsen, P. (2014). A variational approach to stable principal component pursuit. In Conference on uncertainty in artificial intelligence (UAI).

  • Armstrong, S. A., Staunton, J. E., Silverman, L. B., Pieters, R., den Boer, M. L., Minden, M. D., et al. (2002). MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia. Nature Genetics, 30, 41–47.

  • Cai, J.-F., Candès, E. J., & Shen, Z. (2008). A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization, 20, 1956–1982.

  • Candès, E. J., Li, X., Ma, Y., & Wright, J. (2009). Robust principal component analysis? Journal of the ACM, 58, 1–37.

  • De Leeuw, J. (2004). Least squares optimal scaling of partially observed linear systems. In K. van Montfort, J. Oud, & A. Satorra (Eds.), Recent developments on structural equation models: Theory and applications (pp. 121–134). Dordrecht, NL: Kluwer Academic Publishers.

  • Edelman, A., Arias, T. A., & Smith, S. T. (1998). The geometry of algorithms with orthogonality constraints. SIAM Journal on Matrix Analysis and Applications, 20, 303–353.

  • Eldén, L. (2007). Matrix methods in data mining and pattern recognition. Philadelphia: SIAM.

  • Golub, G. H., & Van Loan, C. F. (2013). Matrix computations (4th ed.). Baltimore, MD: Johns Hopkins University Press.

  • Harman, H. H. (1976). Modern factor analysis (3rd ed.). Chicago, IL: University of Chicago Press.

  • Jolliffe, I. T., Trendafilov, N. T., & Uddin, M. (2003). A modified principal component technique based on the LASSO. Journal of Computational and Graphical Statistics, 12, 531–547.

  • Journée, M., Nesterov, Y., Richtárik, P., & Sepulchre, R. (2010). Generalized power method for sparse principal component analysis. Journal of Machine Learning Research, 11, 517–553.

  • Lin, Z., Chen, M., Wu, L., & Ma, Y. (2009). The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices. UIUC Technical Report, UILU-ENG-09-2215, November.

  • Lin, Z., Ganesh, A., Wright, J., Wu, L., Chen, M., & Ma, Y. (2009). Fast convex optimization algorithms for exact recovery of a corrupted low-rank matrix. UIUC Technical Report, UILU-ENG-09-2214, August.

  • Mulaik, S. A. (2005). Looking back on the factor indeterminacy controversies in factor analysis. In A. Maydeu-Olivares & J. J. McArdle (Eds.), Contemporary psychometrics (pp. 174–206). Mahwah, NJ: Lawrence Erlbaum Associates Inc.

  • Mulaik, S. A. (2010). The foundations of factor analysis (2nd ed.). Boca Raton, FL: Chapman and Hall/CRC.

  • Shen, H., & Huang, J. Z. (2008). Sparse principal component analysis via regularized low-rank matrix approximation. Journal of Multivariate Analysis, 99, 1015–1034.

  • Steiger, J. H. (1979). Factor indeterminacy in the 1930’s and the 1970’s: Some interesting parallels. Psychometrika, 44, 157–166.

  • Steiger, J. H., & Schönemann, P. H. (1978). A history of factor indeterminacy (pp. 136–178). Chicago, IL: University of Chicago Press.

  • Trendafilov, N., Fontanella, S., & Adachi, K. (2017). Sparse exploratory factor analysis. Psychometrika, 82, 778–794.

  • Trendafilov, N. T., & Unkel, S. (2011). Exploratory factor analysis of data matrices with more variables than observations. Journal of Computational and Graphical Statistics, 20, 874–891.

  • Unkel, S., & Trendafilov, N. T. (2010). Simultaneous parameter estimation in exploratory factor analysis: An expository review. International Statistical Review, 78, 363–382.

  • Witten, D. M., Tibshirani, R., & Hastie, T. (2009). A penalized matrix decomposition, with applications to sparse principal components and canonical correlation. Biostatistics, 10, 515–534.

  • Yuan, X., & Yang, J. (2013). Sparse and low-rank matrix decomposition via alternating direction methods. Pacific Journal of Optimization, 9, 167–180.

Author information

Corresponding author

Correspondence to Nickolay Trendafilov.

Proof of Lemma 2.1

Proof

In the proof we will, for convenience, denote \(\mathcal {D}= \mathcal {D}(U_1^\top R)\). Without loss of generality, we assume that the diagonal elements are ordered, \(| u_1^\top r_1 | \ge | u_2^\top r_2 | \ge \cdots \ge | u_p^\top r_p |\). Here \(r_i\) and \(u_i\) denote the \(i\)th columns of \(R\) and \(U_1\), respectively.

Let the CS decomposition (Golub and Van Loan, 2013, Sect. 2.5.4) of U be

$$\begin{aligned} \begin{pmatrix} U_1 \\ U_2 \end{pmatrix} = \begin{pmatrix} Q_1 & 0 \\ 0 & Q_2 \end{pmatrix} \begin{pmatrix} C \\ S \end{pmatrix} V^\top , \quad C^2 + S^2 = I_p, \end{aligned}$$

where \(Q_1\), \(Q_2\) and V are orthogonal and C and S are diagonal. The diagonal elements of C satisfy \(c_1 \ge c_2 \ge \cdots \ge c_p \ge 0.\) Clearly, the statement of the lemma is equivalent to \(C=I_p\) and \(S=0\).
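The coupling \(C^2 + S^2 = I_p\) between the singular values of \(U_1\) and \(U_2\) is easy to verify numerically. A minimal sketch (synthetic \(U\) with orthonormal columns; not part of the paper):

```python
import numpy as np

rng = np.random.default_rng(2)
p = 4

# U = [U1; U2] with orthonormal columns, split into two p x p blocks.
U, _ = np.linalg.qr(rng.standard_normal((2 * p, p)))
U1, U2 = U[:p], U[p:]

# In the CS decomposition U1 = Q1 C V^T and U2 = Q2 S V^T with
# C^2 + S^2 = I, so the squared singular values of U1 and U2
# (one list reversed, to align the descending orderings) sum to one.
c = np.linalg.svd(U1, compute_uv=False)
s = np.linalg.svd(U2, compute_uv=False)
assert np.allclose(c**2 + s[::-1]**2, 1.0)
```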

We insert the CS decomposition in (15), set \(\nabla \Gamma = 0\) and multiply by

$$\begin{aligned} \begin{pmatrix} Q_1^\top & 0 \\ 0 & Q_2^\top \end{pmatrix} \end{aligned}$$

from the left and by V from the right. We get

$$\begin{aligned} \begin{pmatrix} Q_1^\top R \mathcal {D}V \\ 0 \end{pmatrix} = \begin{pmatrix} C V^\top \mathcal {D}R^\top Q_1 C \\ S V^\top \mathcal {D}R^\top Q_1 C \end{pmatrix}. \end{aligned}$$

With \( B = Q_1^\top R \mathcal {D}V \in \mathbb {R}^{p \times p}\), we thus have \(B = C B^\top C\); substituting the transposed identity \(B^\top = C B C\) back into this relation gives, equivalently,

$$\begin{aligned} B = C^2 B C^2. \end{aligned}$$
(26)

By (26), \(b_{ij} = c_i^2 b_{ij} c_j^2\) for all \(i,j\); since \(0 \le c_i \le 1\), any \(b_{ij} \ne 0\) forces \(c_i^2 c_j^2 = 1\), i.e. \(c_i = c_j = 1\). Assume that

$$\begin{aligned} B = \begin{pmatrix} B_s & 0 \\ 0 & 0 \end{pmatrix}, \quad B_s \in \mathbb {R}^{s \times s}, \end{aligned}$$
(27)

and that, for some \(1 \le i \le p\), \(b_{is} \ne 0,\) or, for some \(1 \le j \le p\), \(b_{sj} \ne 0\) (we allow \(s=p\), in which case \(B_s=B\)). Then, since \(b_{is} = c_i^2 b_{is} c_s^2\), or \(b_{sj} = c_s^2 b_{sj} c_j^2\), and since the \(c_i\)’s are ordered, we must have \(c_1 = \cdots = c_s = 1\). Thus, if \(s=p\), then \(C=I_p\), and the lemma is true.

If \(s<p\), C has the structure

$$\begin{aligned} C = \begin{pmatrix} I_s & 0 \\ 0 & C_2 \end{pmatrix}, \end{aligned}$$

where \(C_2=0\) or its diagonal elements are nonnegative.

We now assume that

$$\begin{aligned} C_2=\mathrm{diag}(c_{s+1},\ldots ,c_t,0,\ldots ,0) \end{aligned}$$
(28)

with \(c_t>0\), i.e. \(U_1\) has rank \(t<p\). We will show that then the stationary point does not correspond to a global maximum.

Due to (27), we can write

$$\begin{aligned} B V^\top = Q_1^\top R \mathcal {D}= \begin{pmatrix} \bar{B}_{s} & \bar{B}_{1s} \\ 0 & 0 \end{pmatrix}. \end{aligned}$$

Consider the last row of \(BV^\top \):

$$\begin{aligned} 0&= \begin{pmatrix} q_{p}^\top r_1 & q_{p}^\top r_2 & \cdots & q_{p}^\top r_p \end{pmatrix} \begin{pmatrix} u_1^\top r_1 & & & \\ & u_2^\top r_2 & & \\ & & \ddots & \\ & & & u_p^\top r_p \end{pmatrix} \\&=: c^\top \mathcal {D}, \end{aligned}$$

Since \(R\) is nonsingular, there must exist at least one nonzero element in \(c^\top \), say \(q_p^\top r_k\). Then the corresponding element in \(\mathcal {D}\) must be equal to zero, \(u_k^\top r_k=0\).

Under the assumption (28), \(U_1\) has rank \(t\): using the CS decomposition we can write \(u_j = \sum _{i=1}^t c_i v_{ji} q_i\), for \(j=1,2,\ldots ,p\). Clearly \(q_p\) is orthogonal to \(\{u_1 ,\; u_2,\; \ldots , u_p\}\). Thus, since \(q_p^\top r_k \ne 0\) while \(u_k^\top r_k = 0\), we can replace the column \(u_k\) in \(U_1\) by \(q_p\) and make the objective function larger. It follows that the assumption \(\mathrm{rank}(U_1)=t<p\) cannot hold at the global maximum.

It remains to consider the case when all the diagonal elements of \(C\) are positive, i.e. \(U_1\) is nonsingular. Due to the structure (27), we have

$$\begin{aligned} (Q_1^\top R)^{-1} B = \mathcal {D}V = \begin{pmatrix} \tilde{B}_{s} & 0 \\ \tilde{B}_{s1} & 0 \end{pmatrix}. \end{aligned}$$

With the corresponding blocking \(V = (V_1 \; V_2)\), we have \(\mathcal {D}V_2 = 0\), i.e. \(\mathcal {D}\) has a null space of dimension \(p-s\). Since the diagonal elements of \(\mathcal {D}\) are ordered, it follows that \(\mathcal {D}\) has the structure

$$\begin{aligned} \mathcal {D}= \begin{pmatrix} \mathcal {D}_s & 0 \\ 0 & 0 \end{pmatrix}, \quad \mathcal {D}_{s} \in \mathbb {R}^{s \times s}, \end{aligned}$$

where \(\mathcal {D}_s\) is nonsingular. Put

$$\begin{aligned} V_2 = \begin{pmatrix} V_{12} \\ V_{22} \end{pmatrix}, \quad V_{22} \in \mathbb {R}^{(p-s) \times (p-s)}. \end{aligned}$$

From the identity \(\mathcal {D}V_2 =0\) and the nonsingularity of \(\mathcal {D}_s\), we get \(V_{12}=0\); consequently, \(V_{22}\) is an orthogonal matrix, and it follows that

$$\begin{aligned} V = \begin{pmatrix} V_{11} & 0 \\ 0 & V_{22} \end{pmatrix}. \end{aligned}$$

From the CS decomposition, we then have

$$\begin{aligned} u_j = {\left\{ \begin{array}{ll} \sum _{i=1}^s v_{ji} q_i, & j=1,2,\ldots ,s,\\ \sum _{i=s+1}^p c_i v_{ji} q_i, & j=s+1,\ldots ,p. \end{array}\right. } \end{aligned}$$

It follows that \(q_p\) is orthogonal to \(u_j\) for \(j=1,2,\ldots ,s\), and as in the cases above we can now replace \(u_p\) by \(q_p\) and increase the value of the objective function.

Thus, we have shown that for \(C \ne I_p\) the stationary point does not correspond to the global maximum, which proves the lemma. \(\square \)


Cite this article

Eldén, L., Trendafilov, N. Semi-sparse PCA. Psychometrika 84, 164–185 (2019). https://doi.org/10.1007/s11336-018-9650-9
