Some inequalities contrasting principal component and factor analyses solutions


Abstract

Principal component analysis (PCA) and factor analysis (FA) are two time-honored dimension reduction methods. In this paper, some inequalities are presented to contrast the parameter estimates in PCA and FA. For this purpose, we take advantage of the recently established matrix decomposition (MD) formulation of FA. In summary, the resulting inequalities show that (1) FA gives a better fit to a data set than PCA, (2) PCA extracts a larger amount of common “information” than FA, and (3) for each variable, its unique variance in FA is larger than its residual variance in PCA minus the one in FA. The resulting inequalities can be useful to suggest whether PCA or FA should be used for a particular data set. The answers can also be valid for the classic FA formulation not relying on the MD-FA definition, as both types of FA provide almost equal solutions. Additionally, the inequalities give a theoretical explanation of some empirically observed tendencies in PCA and FA solutions, e.g., that the absolute values of PCA loadings tend to be larger than those of FA loadings and that the unique variances in FA tend to be larger than the residual variances of PCA.

1 Introduction

Principal component analysis (PCA) was conceived by Pearson (1901) and formulated by Hotelling (1933) who named the procedure PCA. On the other hand, factor analysis (FA) was proposed by Spearman (1904) and further developed to its modern form as known today by Thurstone (1935). Both procedures are time-honored dimension reduction methods for an n-observations × q-variables column-centered data matrix X = [x1, …, xq]. Thus, PCA and FA are often applied on identical data sets (e.g., Adachi 2016; Jolliffe 2002). The resulting solutions are compared mathematically and numerically in this paper. Throughout the paper, n ≥ rank(X) = q is supposed with rank(X) denoting the rank of X.

PCA can be formulated in a number of different ways (Okamoto 1969; ten Berge and Kiers 1996). One of them is to define PCA as “composing scores by variables”, i.e., summing weighted observed variables to provide composite scores (Hotelling 1933). This formulation of PCA is rather opposite to the FA assumption of “composing variables by scores”, i.e., summing the weighted unobserved (factor) scores to provide observed variables. A formulation, which allows PCA to be comparable with FA, is to model PCA as
$${\mathbf{X}} = {\mathbf{PC^{\prime}}} + {\mathbf{E}}_{\text{PC}} ,$$
(1)
where P = [p1, …, pm] is an n-observations × m-components PC score matrix, C = (cjk) is a q × m component loading matrix, and EPC (n × q) contains errors, with m < q (e.g., Adachi 2016). The implication of (1) can be illustrated as in Fig. 1a: in (1) the variables x1, …, xq are commonly explained by the PC scores p1, …, pm, which are weighted by their loadings cjk, while the errors in EPC remain unexplained. In this sense, we call PC′ a common part. The matrices P and C that minimize the least squares error $$||{\mathbf{E}}_{\text{PC}} ||^{2} = {\text{tr}}\,{\mathbf{E}}_{\text{PC}}^{\prime } {\mathbf{E}}_{\text{PC}} = ||{\mathbf{X}} - {\mathbf{PC^{\prime}}}||^{2}$$ are given through the singular value decomposition (SVD) of X (Eckart and Young 1936). Thus, PCA can be regarded as a matrix decomposition problem for approximating X by a lower rank matrix PC′ with rank(PC′) = m < q, also known as the truncated SVD.

Fig. 1 Graphical representation of PCA as reduced rank approximation (RRA) and FA with q = 5 and m = 2

In a similar manner as PCA, FA can be formulated as a matrix decomposition problem. It was first proposed by Henk A. L. Kiers as described in Sočan (2003, pp. 19–20) and recently established (Unkel and Trendafilov 2010; Stegeman 2016; Adachi and Trendafilov 2018). In this formulation, FA is modeled as
$${\mathbf{X}} = {\mathbf{FA^{\prime}}} + {\mathbf{U}}{\varvec{\Psi}} + {\mathbf{E}}_{\text{FA}} .$$
(2)
Here, F = [f1, …, fm] is the n × m matrix containing common factor scores, U = [u1, …, uq] is the n × q matrix of unique factor scores, A = (ajk) is a q × m factor loading matrix, Ψ is the q × q diagonal matrix whose diagonal elements are ψ1, …, ψq, and EFA (n × q) contains unsystematic errors, with $$\psi_{1}^{2} , \ldots ,\psi_{q}^{2}$$ called unique variances. The factor score matrices are constrained as
$$\frac{1}{n}{\mathbf{F^{\prime}F}} = {\mathbf{I}}_{m} ,\quad\frac{1}{n}{\mathbf{U^{\prime}U}} = {\mathbf{I}}_{q} ,{\text{ and }}\quad{\mathbf{F^{\prime}U}} ={{}_{m}}{\mathbf{O}}_{q}$$
(3)
with mOq being the m × q matrix of zeros. The implications of (2) can be illustrated in Fig. 1b: FA′ is the common part as PC′ in (1), while a unique part is added in FA with the jth unique factor uj being weighted by ψj, which affects only (uniquely) the corresponding variable xj.
In (2), all of F, A, U, and Ψ are treated as fixed unknown matrices. In contrast, the classic formulation of FA treats the elements of F and U as random variables following distributional assumptions associated with (3). Then, the covariance matrix among the columns of FA′ + UΨ can be expressed as $${\mathbf{AA^{\prime}}} + {\varvec{\Psi}}^{2}$$, which is supposed to approximate the sample counterpart $${\mathbf{S}}_{\text{XX}} = n^{-1} {\mathbf{X^{\prime}X}}$$:
$${\mathbf{S}}_{\text{XX}} \cong {\mathbf{AA^{\prime}}} + {\varvec{\Psi}}^{ 2} .$$
(4)
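The covariance structure behind (4) can be checked numerically. The sketch below is illustrative only: the sizes `n`, `q`, `m`, the random seed, and the randomly drawn scores, loadings, and unique standard deviations are assumptions, not quantities from the paper. It builds scores that satisfy (3) exactly and confirms that the second-moment matrix of FA′ + UΨ equals AA′ + Ψ2.

```python
import numpy as np

rng = np.random.default_rng(0)
n, q, m = 50, 5, 2

# Hypothetical scores Z = [F, U]: orthonormalise centred random columns and
# rescale, so that Z'Z / n = I, i.e. all three constraints in (3) hold.
M = rng.standard_normal((n, m + q))
M -= M.mean(axis=0)
Q, _ = np.linalg.qr(M)
Z = np.sqrt(n) * Q
F, U = Z[:, :m], Z[:, m:]

A = rng.standard_normal((q, m))       # hypothetical loading matrix
psi = rng.uniform(0.3, 0.9, size=q)   # hypothetical diagonal of Psi

# FA' + U Psi; Psi is diagonal, so U Psi is a column-wise scaling of U.
model_part = F @ A.T + U * psi
cov_model = model_part.T @ model_part / n
cov_formula = A @ A.T + np.diag(psi ** 2)

print(np.allclose(cov_model, cov_formula))
```

The cross terms vanish only because F′U is a zero matrix; with correlated common and unique scores the two matrices would differ.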
In the classic FA, the discrepancy between SXX and $${\mathbf{AA^{\prime}}} + {\varvec{\Psi}}^{2}$$ is minimized over A and $${\varvec{\Psi}}^{2}$$ (e.g., Harman 1976; Mulaik 2010). It is known that this approach and the one based on (2) provide almost equivalent solutions (Adachi 2012, 2015; Stegeman 2016). However, it is difficult to compare (4) with (1), as the former concerns moments (i.e., covariances), while the latter directly fits the data. Nevertheless, several studies exist comparing the classic FA (4) with PCA (e.g., Bentler and Kano 1990; Ogasawara 2000; Sato 1990). In contrast, comparing (2) with (1) is straightforward, as they both fit the data. The only difference is that (2) also involves a unique part. In this paper, we compare the properties of the PCA solutions with those obtained by the FA procedure based on (2). We refer to the latter simply as FA and to the classic procedure based on (4) as random FA (RFA).
The main goal of this paper is to quantify the illustration in Fig. 2. It depicts how $$||{\mathbf{X}}||^{2}$$, i.e., the total sum of squares (SS) of a data set, is decomposed into different SS’s in PCA and FA, and thus quantifies the critical distinction between them. There, the areas of the common part and residuals for PCA stand for $$||{\mathbf{\hat{P}\hat{C}^{\prime}}}||^{2}$$ and $$||{\hat{\mathbf{E}}}_{\text{PC}} ||^{2}$$, respectively. On the other hand, the areas of the common part, unique part, and residuals for FA correspond to $$||{\mathbf{\hat{F}\hat{A}^{\prime}}}||^{2}$$, $$||{\hat{\mathbf{U}}}\widehat{{\varvec{\Psi}}}||^{2}$$, and $$||{\hat{\mathbf{E}}}_{{\text{FA}}} ||^{2}$$, respectively. Here, $${\mathbf{\hat{P}\hat{C^{\prime}}}}$$ denotes the estimate of PC′ in PCA with $${\hat{\mathbf{E}}}_{\text{PC}}$$ containing the resulting residuals, while $${\mathbf{\hat{F}\hat{A}^{\prime}}}$$ and $${\hat{\mathbf{U}}}\widehat{{\varvec{\Psi}}}$$ are the estimates of FA′ and UΨ in FA, respectively, with $${\hat{\mathbf{E}}}_{{\text{FA}}}$$ the resulting residual matrix.

Fig. 2 Relative largeness of common parts, unique part, and residuals in PCA and FA solutions

The relative largeness of the areas in Fig. 2 shows the following:

1. The common part for PCA is larger than that for FA;

2. The residual part for FA is smaller than that for PCA;

3. The unique part for FA is larger than the residual one for PCA.

Here, items 1 and 2 always hold, while item 3 is suggested to hold often. Those assertions are proved in Sect. 3 in the form of several inequalities, after preliminary results are presented in Sect. 2. The theoretical results obtained in Sect. 3 are illustrated in Sects. 4 and 5. Throughout the paper, we distinguish a model parameter from its estimate by putting a “hat” on the former to express the latter, as done above.

2 Preliminary notes

In this section, the solutions of PCA and FA are described, followed by their rotational indeterminacy. This serves as preparation for the next section.

2.1 PCA solution

As described in the last section, PCA can be formulated as minimizing
$$f_{\text{PC}} ({\mathbf{P}},{\mathbf{C}}|{\mathbf{X}}) \, = ||{\mathbf{E}}_{\text{PC}} ||^{ 2} = ||{\mathbf{X}}{\mathbf{ - }}{\mathbf{PC^{\prime}}}||^{ 2}$$
(5)
over P and C for a given data matrix X. It is attained through the SVD of X defined as X = VΘW′ with V′V = W′W = Iq and Θ the q × q diagonal matrix whose diagonal elements are arranged in decreasing order. The solution satisfies $${\mathbf{\hat{P}\hat{C^{\prime}}}}$$ = VmΘmWm′. Here, Vm (n × m) and Wm (q × m) contain the first m columns of V and W, respectively, and Θm is the first m × m diagonal block of Θ. For identification purposes, the condition
$$\frac{1}{n}{\mathbf{P^{\prime}P}} = {\mathbf{I}}_{m}$$
(6)
is introduced. Then, we can choose $${\hat{\mathbf{P}}}$$ and $${\hat{\mathbf{C}}}$$ as
$${\hat{\mathbf{P}}} = n^{ 1/ 2} {\mathbf{V}}_{m} = n^{ 1/ 2} {\mathbf{XW}}_{m} {\varvec{\Theta}}_{m}^{{-1}} \;{\text{and}}\;\,{\hat{\mathbf{C}}} = n^{{- 1/ 2}} {\mathbf{W}}_{m} {\varvec{\Theta}}_{m}$$
(7)
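since $$n^{1/2} {\mathbf{V}}_{m} = n^{1/2} {\mathbf{V}}{\varvec{\Theta}}{\mathbf{W^{\prime}W}}_{m} {\varvec{\Theta}}_{m}^{-1} = n^{1/2} {\mathbf{XW}}_{m} {\varvec{\Theta}}_{m}^{-1}$$ can be substituted for P in (6), and (7) leads to $${\mathbf{\hat{P}\hat{C}^{\prime}}}$$ = VmΘmWm′ (e.g., Adachi 2016), though rotational indeterminacy remains as explained in Sect. 2.3.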
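The construction in (7) can be sketched with NumPy's SVD. The data matrix below is random and purely illustrative (the sizes and seed are assumptions); the variable names mirror the symbols of this section.

```python
import numpy as np

rng = np.random.default_rng(1)
n, q, m = 30, 5, 2
X = rng.standard_normal((n, q))
X -= X.mean(axis=0)                      # column-centre the data

# Thin SVD: X = V diag(theta) W', singular values in decreasing order.
V, theta, Wt = np.linalg.svd(X, full_matrices=False)
Vm, theta_m, Wm = V[:, :m], theta[:m], Wt[:m].T

P_hat = np.sqrt(n) * Vm                  # PC scores, eq. (7)
C_hat = Wm * theta_m / np.sqrt(n)        # loadings W_m Theta_m / sqrt(n), eq. (7)

# Attained value of the loss (5).
loss = np.linalg.norm(X - P_hat @ C_hat.T) ** 2

print(np.allclose(P_hat.T @ P_hat / n, np.eye(m)))    # constraint (6)
print(np.isclose(loss, np.sum(theta[m:] ** 2)))       # discarded singular values
```

The second check reflects the Eckart and Young (1936) result: the attained loss equals the sum of the squared singular values that are not retained.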

2.2 FA solution

The FA model (2) leads to the minimization of the following least squares function
$$f_{\text{FA}} ({\mathbf{F}},{\mathbf{A}},{\mathbf{U}},{\varvec{\Psi}}{\mathbf{|}}{\mathbf{X}}) = ||{\mathbf{E}}_{\text{FA}} ||^{ 2} = ||{\mathbf{X}}-({\mathbf{FA^{\prime}}} + {\mathbf{U}}{\varvec{\Psi}})||^{ 2}$$
(8)
over F, A, U, and Ψ subject to (3). Though the solution cannot be given explicitly and must be obtained through iterative algorithms, the optimal A and Ψ are known to satisfy
$${\hat{\mathbf{A}}} = {\mathbf{S}}_{{\text{X}\hat{\text{F}}}} \;{\text{and}}\;\widehat{{\varvec{\Psi}}} = {\text{ diag}}({\mathbf{S}}_{{\text{X}\hat{\text{U}}}} ),$$
(9)
with $${\mathbf{S}}_{{\text{X}\hat{\text{F}}}} = n^{-1} {\mathbf{X^{\prime}}}{\hat{\mathbf{F}}}$$, $${\mathbf{S}}_{{\text{X}\hat{\text{U}}}} = n^{-1} {\mathbf{X^{\prime}}}{\hat{\mathbf{U}}}$$, and diag($${\mathbf{S}}_{{\text{X}\hat{\text{U}}}}$$) denoting the diagonal matrix containing the main diagonal of $${\mathbf{S}}_{{\text{X}\hat{\text{U}}}}$$ (e.g., Adachi and Trendafilov 2018; Stegeman 2016). The optimal factor score matrices $${\hat{\mathbf{F}}}$$ and $${\hat{\mathbf{U}}}$$ are undetermined, but $${\mathbf{S}}_{{\text{X}\hat{\text{F}}}}$$ and $${\mathbf{S}}_{{\text{X}\hat{\text{U}}}}$$ can be uniquely determined (Adachi and Trendafilov 2018). Using (3), the function (8), in which $${\hat{\mathbf{F}}}$$ and $${\hat{\mathbf{U}}}$$ are substituted, can be expressed as $$f_{\text{FA}} ({\hat{\mathbf{F}}},{\mathbf{A}},{\hat{\mathbf{U}}},{\varvec{\Psi}}|{\mathbf{X}}) = n{\text{tr}}({\mathbf{S}}_{\text{XX}} + {\mathbf{AA^{\prime}}} + {\varvec{\Psi}}^{2} - 2{\mathbf{S}}_{{\text{X}\hat{\text{F}}}} {\mathbf{A^{\prime}}} - 2{\mathbf{S}}_{{\text{X}\hat{\text{U}}}} {\varvec{\Psi}})$$. Further, we can substitute (9) into this expression to have $$f_{\text{FA}} ({\hat{\mathbf{F}}},{\hat{\mathbf{A}}},{\hat{\mathbf{U}}},\widehat{{\varvec{\Psi}}}|{\mathbf{X}}) = n{\text{tr}}({\mathbf{S}}_{\text{XX}} + {\mathbf{\hat{A}\hat{A}^{\prime}}} + \widehat{{\varvec{\Psi}}}^{2} - 2{\mathbf{\hat{A}\hat{A}^{\prime}}} - 2{\mathbf{S}}_{{\text{X}\hat{\text{U}}}} \widehat{{\varvec{\Psi}}})$$. Here, $${\text{tr}}\,{\mathbf{S}}_{{\text{X}\hat{\text{U}}}} \widehat{{\varvec{\Psi}}} = {\text{tr}}\,{\text{diag}}({\mathbf{S}}_{{\text{X}\hat{\text{U}}}} )\widehat{{\varvec{\Psi}}} = {\text{tr}}\,\widehat{{\varvec{\Psi}}}^{2}$$, because $$\widehat{{\varvec{\Psi}}}$$ is diagonal and (9) holds. Thus, we have
$$f_{\text{FA}} ({\hat{\mathbf{F}}},{\hat{\mathbf{A}}},{\hat{\mathbf{U}}},\widehat{{\varvec{\Psi}}}|{\mathbf{X}}) = n{\text{tr}}({\mathbf{S}}_{\text{XX}}- {\mathbf{\hat{A}\hat{A}^{\prime}}}-\widehat{{\varvec{\Psi}}}^{2} ).$$
(10)
It is irrelevant to this paper how $${\hat{\mathbf{F}}}$$, $${\hat{\mathbf{U}}}$$, $${\mathbf{S}}_{{\text{X}\hat{\text{F}}}}$$, and $${\mathbf{S}}_{{\text{X}\hat{\text{U}}}}$$ are expressed. Also, although (8) is a data-fitting problem, the solution {$${\hat{\mathbf{A}}}$$, $$\widehat{{\varvec{\Psi}}}$$} can be obtained even if only the covariance matrix SXX is given and the original data set X is unavailable. This property is detailed in Adachi (2012, Sect. 2) and Adachi and Trendafilov (2018, Sect. 2).
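The algebra leading to (10) uses only the constraints (3) and the conditions (9), so the identity can be checked with any scores satisfying (3), not only the optimal ones. The sketch below does this with randomly generated, deliberately non-optimal scores; the data, sizes, and seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n, q, m = 40, 6, 2
X = rng.standard_normal((n, q))
X -= X.mean(axis=0)
S_xx = X.T @ X / n

# Any Z = [F, U] with Z'Z = n I satisfies (3); this Z is random, not optimal.
Q, _ = np.linalg.qr(rng.standard_normal((n, m + q)))
F, U = np.sqrt(n) * Q[:, :m], np.sqrt(n) * Q[:, m:]

A = X.T @ F / n                # A = S_XF, first condition in (9)
psi = np.diag(X.T @ U / n)     # diagonal of S_XU, second condition in (9)

# Loss (8) evaluated directly and through the closed form of (10).
direct = np.linalg.norm(X - (F @ A.T + U * psi)) ** 2
closed_form = n * (np.trace(S_xx) - np.trace(A @ A.T) - np.sum(psi ** 2))

print(np.isclose(direct, closed_form))
```

With the optimal scores the same identity yields (10); the random scores here merely give a larger loss value.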

2.3 Rotational indeterminacy

The rotational indeterminacy of the loading matrices in PCA and FA affects some of the results to be presented in the next section.

Let TP and TF be m × m arbitrary orthogonal matrices, with
$${\mathbf{T}}_{\text{P}}^{\prime } {\mathbf{T}}_{\text{P}} = {\mathbf{T}}_{\text{P}} {\mathbf{T}}_{\text{P}}^{\prime } = {\mathbf{T}}_{\text{F}}^{\prime } {\mathbf{T}}_{\text{F}} = {\mathbf{T}}_{\text{F}} {\mathbf{T}}_{\text{F}}^{\prime } = {\mathbf{I}}_{m} .$$
(11)
Since PTP and FTF can be substituted for P in (6) and for F in (3), respectively, with $${\mathbf{PC^{\prime}}} = {\mathbf{PT}}_{\text{P}} {\mathbf{T}}_{\text{P}}^{\prime } {\mathbf{C^{\prime}}}$$ and $${\mathbf{FA^{\prime}}} = {\mathbf{FT}}_{\text{F}} {\mathbf{T}}_{\text{F}}^{\prime } {\mathbf{A^{\prime}}}$$, PCA and FA solutions have rotational indeterminacy: score and loading matrices can be rotated without affecting the PCA and FA fit. This property is exploited to obtain the orthogonal TP or TF that makes the rotated loading matrix $${\hat{\mathbf{C}}}$$TP or $${\hat{\mathbf{A}}}$$TF most interpretable in some sense. This procedure is called orthogonal rotation (e.g., Adachi 2016; Mulaik 2010). As the orthogonal matrices TP and TF obtained by rotation in the contexts of PCA and FA are generally different, we distinguish between them by attaching subscripts to T. For the same reason, subscripts are attached to N appearing below.
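Under an oblique solution, the constraint (6) and $$n^{-1} {\mathbf{F^{\prime}F}} = {\mathbf{I}}_{m}$$ in (3) are relaxed to $$n^{-1} {\text{diag}}({\mathbf{P^{\prime}P}}) = {\mathbf{I}}_{m}$$ and $$n^{-1} {\text{diag}}({\mathbf{F^{\prime}F}}) = {\mathbf{I}}_{m}$$, respectively. Then, PCA and FA have the following rotational indeterminacy: if the m × m nonsingular matrices NP and NF satisfy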
$$\frac{1}{n}{\text{diag}}({\mathbf{N}}_{\text{P}}^{\prime } {\mathbf{P^{\prime}PN}}_{\text{P}} ) \, = {\mathbf{I}}_{m} \quad{\text{and}}\quad\frac{1}{n}{\text{diag}}({\mathbf{N}}_{\text{F}}^{\prime } {\mathbf{F^{\prime}FN}}_{\text{F}} ) = {\mathbf{I}}_{m} ,$$
(12)
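then PNP and FNF can be substituted for P and F in the above relaxed constraints, with $${\mathbf{PC^{\prime}}} = {\mathbf{PN}}_{\text{P}} {\mathbf{N}}_{\text{P}}^{ - 1} {\mathbf{C^{\prime}}}$$ and $${\mathbf{FA^{\prime}}} = {\mathbf{FN}}_{\text{F}} {\mathbf{N}}_{\text{F}}^{ - 1} {\mathbf{A^{\prime}}}$$. These transformations are called oblique rotations and are used to obtain NP or NF satisfying (12) such that $${\mathbf{\hat{C}N^{\prime}}}_{{\text{P}}}^{ - 1}$$ and $${\mathbf{\hat{A}N^{\prime}}}_{{\text{F}}}^{ - 1}$$ are most interpretable (e.g., Adachi 2016; Mulaik 2010).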
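The difference between the two kinds of rotation can be illustrated numerically for PCA. In the sketch below, the data, the rotation angle, and the particular matrix `N` are arbitrary choices made for illustration; `N` has unit-length columns so that (12) holds when P satisfies (6).

```python
import numpy as np

rng = np.random.default_rng(3)
n, q, m = 30, 5, 2
X = rng.standard_normal((n, q))
X -= X.mean(axis=0)
V, theta, Wt = np.linalg.svd(X, full_matrices=False)
P = np.sqrt(n) * V[:, :m]                 # scores satisfying (6)
C = Wt[:m].T * theta[:m] / np.sqrt(n)     # loadings

# Orthogonal rotation: T'T = TT' = I as in (11).
angle = 0.7
T = np.array([[np.cos(angle), -np.sin(angle)],
              [np.sin(angle), np.cos(angle)]])
P_rot, C_rot = P @ T, C @ T

# Oblique rotation: N is nonsingular with unit-length, non-orthogonal columns,
# so diag(N'P'PN) / n = diag(N'N) = I as required by (12).
N = np.array([[1.0, 0.6],
              [0.0, 0.8]])
P_obl = P @ N
C_obl = C @ np.linalg.inv(N).T            # loadings C N'^{-1}

print(np.allclose(P_rot @ C_rot.T, P @ C.T), np.allclose(P_obl @ C_obl.T, P @ C.T))
print(np.sum(C ** 2), np.sum(C_rot ** 2), np.sum(C_obl ** 2))
```

Both rotations leave the common part PC′ unchanged, but only the orthogonal one preserves the sum of squared loadings; the oblique one generally changes it.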

3 Results

In this section, we present four theorems, which help to contrast the PCA and FA solutions minimizing (5) subject to (6) and minimizing (8) under (3), respectively.

We start with introducing Trendafilov et al.’s (2013) FA-like PCA which is utilized in the proof for the following theorems. In the FA-like PCA, the PCA solution $${\mathbf{\hat{P}\hat{C^{\prime}}}}$$ minimizing (5) is substituted for FA′ in the FA loss function (8) as
$$f_{\text{FA}} ({\hat{\mathbf{P}}},{\hat{\mathbf{C}}},{\mathbf{U}}^{*} ,{\varvec{\Psi}}^{*} |{\mathbf{X}}) \, = ||{\mathbf{X}}-({\mathbf{\hat{P}\hat{C^{\prime}}}} + {\mathbf{U}}^{*} {\varvec{\Psi}}^{*} )||^{ 2} .$$
(13)
Then, it is minimized over U* and Ψ* subject to $${\mathbf{\hat{P}^{\prime}U}}^{*} = {{}_{m}}{\mathbf{O}}_{q}$$ and $$n^{-1} {\mathbf{U}}^{*\prime } {\mathbf{U}}^{*} = {\mathbf{I}}_{q}$$, with Ψ* being diagonal.

Theorem 1

For a given data matrix X (n × q), the FA model (2) always fits at least as well as the PCA model (1):
$$||{\hat{\mathbf{E}}}_{\text{PC}} ||^{ 2} \ge ||{\hat{\mathbf{E}}}_{{\text{FA}}} ||^{ 2}$$
(14)

Proof

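Obviously, $$||{\mathbf{X}} - {\mathbf{\hat{P}\hat{C^{\prime}}}}||^{2} \ge ||{\mathbf{X}} - {\mathbf{\hat{P}\hat{C^{\prime}}}}||^{2} - ||{\mathbf{U}}^{*} {\varvec{\Psi}}^{*} ||^{2}$$ holds true. Further, the FA-like PCA optimality condition $${\varvec{\Psi}}^{*} = n^{-1} {\text{diag}}({\mathbf{U}}^{*\prime } {\mathbf{X}})$$ (Trendafilov et al. 2013, 5.2.1) shows $$||{\mathbf{X}} - {\mathbf{\hat{P}\hat{C^{\prime}}}}||^{2} - ||{\mathbf{U}}^{*} {\varvec{\Psi}}^{*} ||^{2} = ||{\mathbf{X}} - ({\mathbf{\hat{P}\hat{C^{\prime}}}} + {\mathbf{U}}^{*} {\varvec{\Psi}}^{*} )||^{2}$$. Thus, the value of the PCA loss function (5) cannot be less than the value of (13):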
$$||{\mathbf{X}} - {\mathbf{\hat{P}\hat{C^{\prime}}}}||^{ 2} \ge ||{\mathbf{X}}-({\mathbf{\hat{P}\hat{C^{\prime}}}} + {\mathbf{U}}^{*} {\varvec{\Psi}}^{*} )||^{ 2}$$
(15)
Now, $${\mathbf{\hat{P}\hat{C^{\prime}}}}$$, U*, and Ψ* in the right-hand side of (15) can be replaced by the corresponding FA solutions $${\mathbf{\hat{F}\hat{A}^{\prime}}}$$,$${\hat{\mathbf{U}}}$$, and $$\hat{\varvec{\Psi} }$$. The function value after this substitution cannot exceed the right-hand side of (15):
$$||{\mathbf{X}}-({\mathbf{\hat{P}\hat{C^{\prime}}}} + {\mathbf{U}}^{*} {\varvec{\Psi}}^{*} )||^{ 2} \ge ||{\mathbf{X}}-({\mathbf{\hat{F}\hat{A}^{\prime}}} + {\hat{\mathbf{U}}}\widehat{{\varvec{\Psi}}})||^{2} ,$$
(16)
because the FA loss function $$||{\mathbf{X}} - ({\mathbf{FA^{\prime}}} + {\mathbf{U}}{\varvec{\Psi}})||^{2}$$ is minimized for FA′ + UΨ = $${\mathbf{\hat{F}\hat{A}^{\prime}}}$$ + $${\hat{\mathbf{U}}}\widehat{{\varvec{\Psi}}}$$. Thus, the inequalities (15) and (16) lead to (14).$$\square$$
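The equality invoked in the first step of the proof can be spelled out from the constraints attached to (13). The following expansion is a reconstruction under those constraints and the stated optimality condition, not a quotation from Trendafilov et al. (2013):

$$||{\mathbf{X}} - ({\mathbf{\hat{P}\hat{C}^{\prime}}} + {\mathbf{U}}^{*} {\varvec{\Psi}}^{*} )||^{2} = ||{\mathbf{X}} - {\mathbf{\hat{P}\hat{C}^{\prime}}}||^{2} - 2{\text{tr}}\,{\mathbf{X^{\prime}U}}^{*} {\varvec{\Psi}}^{*} + 2{\text{tr}}\,{\mathbf{\hat{C}\hat{P}^{\prime}U}}^{*} {\varvec{\Psi}}^{*} + ||{\mathbf{U}}^{*} {\varvec{\Psi}}^{*} ||^{2} .$$

The third term on the right-hand side vanishes because $${\mathbf{\hat{P}^{\prime}U}}^{*}$$ is a zero matrix. Since Ψ* is diagonal and equals $$n^{-1} {\text{diag}}({\mathbf{U}}^{*\prime } {\mathbf{X}})$$, we have $${\text{tr}}\,{\mathbf{X^{\prime}U}}^{*} {\varvec{\Psi}}^{*} = {\text{tr}}\,{\text{diag}}({\mathbf{X^{\prime}U}}^{*} ){\varvec{\Psi}}^{*} = n{\text{tr}}\,{\varvec{\Psi}}^{*2}$$, and $$n^{-1} {\mathbf{U}}^{*\prime } {\mathbf{U}}^{*} = {\mathbf{I}}_{q}$$ gives $$||{\mathbf{U}}^{*} {\varvec{\Psi}}^{*} ||^{2} = n{\text{tr}}\,{\varvec{\Psi}}^{*2}$$. Hence

$$||{\mathbf{X}} - ({\mathbf{\hat{P}\hat{C}^{\prime}}} + {\mathbf{U}}^{*} {\varvec{\Psi}}^{*} )||^{2} = ||{\mathbf{X}} - {\mathbf{\hat{P}\hat{C}^{\prime}}}||^{2} - n{\text{tr}}\,{\varvec{\Psi}}^{*2} = ||{\mathbf{X}} - {\mathbf{\hat{P}\hat{C}^{\prime}}}||^{2} - ||{\mathbf{U}}^{*} {\varvec{\Psi}}^{*} ||^{2} .$$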

The theorem suggests that for better fit to the data, FA should be preferred over PCA.

The next theorem concerns the largeness of squared loadings and common parts:

Theorem 2

For a given X, the sum of squared PCA loadings is always equal to or larger than the sum of squared FA ones [under constraints (3) and (6)]:
$$||{\hat{\mathbf{C}}}||^{ 2} \ge ||{\hat{\mathbf{A}}}||^{ 2} ,$$
(17)
$$||{\hat{\mathbf{C}}\mathbf{T}}_{{\text{P}}} ||^{ 2} \ge ||{\hat{\mathbf{A}}\mathbf{T}}_{{\text{F}}} ||^{ 2},$$
(18)
with TP and TF satisfying (11). This implies that the common part in PCA is always equal to or larger than that in FA:
$$||{\mathbf{\hat{P}\hat{C^{\prime}}}}||^{ 2} \; \ge \;||{\mathbf{\hat{F}\hat{A}^{\prime}}}||^{ 2} ,$$
(19)
$$||{\hat{\mathbf{P}}\mathbf{N}}_{{\text{P}}} {\mathbf{N}}_{{\text{P}}}^{ - 1} {\mathbf{\hat{C^{\prime}}}}||^{ 2} \; \ge \;||{\hat{\mathbf{F}}\mathbf{N}}_{{\text{F}}} {\mathbf{N}}_{{\text{F}}}^{ - 1} {\mathbf{\hat{A}^{\prime}}}||^{ 2} ,$$
(20)
with NP and NF arbitrary nonsingular m × m matrices.

Proof

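The PCA loss function (5) is expanded as $$f_{\text{PC}} ({\mathbf{P}},{\mathbf{C}}|{\mathbf{X}}) = ||{\mathbf{X}}||^{2} - 2{\text{tr}}\,{\mathbf{X^{\prime}PC^{\prime}}} + ||{\mathbf{PC^{\prime}}}||^{2}$$. By substituting $${\mathbf{\hat{P}\hat{C^{\prime}}}}$$ for PC′ in fPC(P,C|X) and using (6) and (7), we have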
$$f_{\text{PC}} ({\hat{\mathbf{P}}},{\hat{\mathbf{C}}}|{\mathbf{X}}) \, = n({\text{tr}}{\mathbf{S}}_{\text{XX}} -{\text{tr}}{\mathbf{\hat{C}\hat{C}^{\prime}}}).$$
(21)
Now, let us consider $$f_{\text{PC}} ({\hat{\mathbf{F}}},{\hat{\mathbf{A}}}|{\mathbf{X}}) = ||{\mathbf{X}} - {\mathbf{\hat{F}\hat{A}^{\prime}}}||^{2}$$, i.e., the PCA function (5) with the FA solution $${\mathbf{\hat{F}\hat{A}^{\prime}}}$$ substituted for PC′. Using (3) and (9), this function can be rewritten as $$n({\text{tr}}{\mathbf{S}}_{\text{XX}} - 2{\text{tr}}\,{\mathbf{S}}_{{\text{X}\hat{\text{F}}}} {\mathbf{\hat{A}^{\prime}}} + {\text{tr}}\,{\mathbf{\hat{A}\hat{A}^{\prime}}}) = n({\text{tr}}{\mathbf{S}}_{\text{XX}} - {\text{tr}}\,{\mathbf{\hat{A}\hat{A}^{\prime}}})$$. Clearly, this value cannot be lower than (21), since the PCA solution is known as the best low-rank approximation (Eckart and Young 1936). Thus, we finally have
$${\text{tr}}{\mathbf{S}}_{\text{XX}} - {\text{tr}}{\mathbf{\hat{C}\hat{C}^{\prime}}} \le {\text{tr}}{\mathbf{S}}_{\text{XX}} - {\text{tr}}{\mathbf{\hat{A}\hat{A}^{\prime}}},$$
(22)
which gives (17). It also implies (18), because of the orthogonality property (11).

Inequality (17) leads to (19), since ||$${\mathbf{\hat{F}\hat{A}^{\prime}}}$$||2 = ntr$${\mathbf{\hat{A}\hat{A}^{\prime}}}$$ = n||$${\hat{\mathbf{A}}}$$||2 and ||$${\mathbf{\hat{P}\hat{C^{\prime}}}}$$||2 = ntr$${\mathbf{\hat{C}\hat{C}^{\prime}}}$$ = n||$${\hat{\mathbf{C}}}$$||2 follow from (3) and (6), respectively. Obviously, (19) leads to (20).$$\square$$

This theorem shows that the common part in PCA is larger than in FA. Inequality (20) shows that the common part is larger in PCA solutions even after oblique rotation. On the other hand, TP and TF in (18) cannot be replaced by NP and NF. That is, after the oblique rotation, ||$${\mathbf{\hat{C}N^{\prime}}}_{{\text{P}}}^{ - 1}$$||2 ≥ ||$${\mathbf{\hat{A}N^{\prime}}}_{{\text{F}}}^{ - 1}$$||2 does not necessarily hold.

Though Theorem 2 discusses the lower limits of the magnitudes of the squared loadings and common part in PCA solutions, the next one shows their upper limits.

Theorem 3

For a given X , the sum of the squared PCA loadings cannot exceed the sum of the squared loadings and unique variances in the FA solution:

$$||{\hat{\mathbf{C}}}||^{ 2} \le ||{\hat{\mathbf{A}}}||^{ 2} + ||\widehat{{\varvec{\Psi}}}||^{ 2} ,$$
(23)
$$||{\hat{\mathbf{C}}\mathbf{T}}_{\text{P}} ||^{ 2} \le ||{\hat{\mathbf{A}}\mathbf{T}}_{{\text{F}}} ||^{ 2} + ||\widehat{{\varvec{\Psi}}}||^{ 2} ,$$
(24)
with TP and TF satisfying (11). It implies that the squared norm of the PCA common part cannot exceed that of the FA model part:
$$||{\mathbf{\hat{P}\hat{C^{\prime}}}}||^{ 2} \le \;||{\mathbf{\hat{F}\hat{A}^{\prime}}} + {\hat{\mathbf{U}}}\widehat{{\varvec{\Psi}}}||^{ 2} ,$$
(25)
$$||{\hat{\mathbf{P}}\mathbf{N}}_{{\text{P}}} {\mathbf{N}}_{{\text{P}}}^{ - 1} {\mathbf{\hat{C^{\prime}}}}||^{ 2} \le \;||{\hat{\mathbf{F}}\mathbf{N}}_{{\text{F}}} {\mathbf{N}}_{{\text{F}}}^{ - 1} {\mathbf{\hat{A}^{\prime}}} + {\hat{\mathbf{U}}}\widehat{{\varvec{\Psi}}}||^{ 2} ,$$
(26)
with NP and NF arbitrary nonsingular m × m matrices.

Proof

(10) and (21) are rewritten as ||$${\hat{\mathbf{E}}}_{{\text{FA}}}$$||2 = n(trSXX − tr$${\mathbf{\hat{A}\hat{A}^{\prime}}}$$ − tr$$\widehat{{\varvec{\Psi}}}^{2}$$) and ||$${\hat{\mathbf{E}}}_{\text{PC}}$$||2 =  n(trSXX − tr$${\mathbf{\hat{C}\hat{C}^{\prime}}}$$), respectively. Using them in (14), we have (23) and it leads to (24), because of (11). Inequality (23) leads to (25) and thus (26), since ||$${\mathbf{\hat{F}\hat{A}^{\prime}}}$$ + $${\hat{\mathbf{U}}}\widehat{{\varvec{\Psi}}}$$||2 = n||$${\hat{\mathbf{A}}}$$||2 + n||$$\widehat{{\varvec{\Psi}}}$$||2 and ||$${\mathbf{\hat{P}\hat{C^{\prime}}}}$$||2 = n||$${\hat{\mathbf{C}}}$$||2 follow from (3) and (6), respectively.$$\square$$

Inequality (26) shows that the model part is larger in FA even after oblique rotation. However, after the rotation, the sum of squared loadings in PCA is not necessarily less than the sum of squared loadings and unique variances in FA, since TP and TF in (24) cannot be replaced by NP and NF.

The following theorem concerns the magnitudes of the unique variances in FA:

Theorem 4

For a given X, the sum of the unique variances in FA is equal to or larger than the average of squared residuals for PCA minus the average for FA:
$$||\widehat{{\varvec{\Psi}}}||^{ 2} \ge \frac{1}{n}||{\hat{\mathbf{E}}}_{{\text{PC}}} ||^{ 2} - \frac{1}{n}||{\hat{\mathbf{E}}}_{{\text{FA}}} ||^{ 2} .$$
(27)

Proof

We can rewrite (10) as $$n^{-1} ||{\hat{\mathbf{E}}}_{{\text{FA}}} ||^{2} = {\text{tr}}({\mathbf{S}}_{\text{XX}} - {\mathbf{\hat{A}\hat{A}^{\prime}}} - \widehat{{\varvec{\Psi}}}^{2} )$$, which leads to $${\text{tr}}{\mathbf{S}}_{\text{XX}} - {\text{tr}}\,{\mathbf{\hat{A}\hat{A}^{\prime}}} = ||\widehat{{\varvec{\Psi}}}||^{2} + n^{-1} ||{\hat{\mathbf{E}}}_{{\text{FA}}} ||^{2}$$. We can also rewrite (21) as $${\text{tr}}{\mathbf{S}}_{\text{XX}} - {\text{tr}}\,{\mathbf{\hat{C}\hat{C}^{\prime}}} = n^{-1} ||{\hat{\mathbf{E}}}_{{\text{PC}}} ||^{2}$$. Using them in (22), we have $$n^{-1} ||{\hat{\mathbf{E}}}_{{\text{PC}}} ||^{2} \le ||\widehat{{\varvec{\Psi}}}||^{2} + n^{-1} ||{\hat{\mathbf{E}}}_{{\text{FA}}} ||^{2}$$, which can be rewritten as (27).$$\square$$
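Inequalities (14), (17), (23), and (27) can be checked numerically. The sketch below is illustrative only: `pca` and `mdfa_als` are hypothetical helper names, and the alternating least squares scheme inside `mdfa_als` is an assumption of this sketch, not necessarily the algorithm used by the authors. It alternates an orthogonal Procrustes step for the scores [F, U] with the updates (9), starting from the PCA loadings with zero unique parts, so the loss never rises above the PCA loss. Such a scheme may stop at a local minimum, and the simulated data are arbitrary.

```python
import numpy as np

def pca(X, m):
    """PCA scores and loadings as in (7)."""
    n = X.shape[0]
    V, theta, Wt = np.linalg.svd(X, full_matrices=False)
    return np.sqrt(n) * V[:, :m], Wt[:m].T * theta[:m] / np.sqrt(n)

def mdfa_als(X, m, n_iter=500):
    """Hypothetical alternating least squares for (8) under (3); needs n >= m + q."""
    n, q = X.shape
    _, C = pca(X, m)
    B = np.hstack([C, np.zeros((q, q))])       # start from [C_hat, 0]
    for _ in range(n_iter):
        # Score step: Z = [F, U] with Z'Z = n I maximising tr (XB)'Z.
        K, _, Lt = np.linalg.svd(X @ B, full_matrices=False)
        Z = np.sqrt(n) * K @ Lt
        F, U = Z[:, :m], Z[:, m:]
        # Parameter step: the conditions in (9).
        A = X.T @ F / n
        psi = np.diag(X.T @ U / n).copy()
        B = np.hstack([A, np.diag(psi)])
    return F, A, U, psi

rng = np.random.default_rng(4)
n, q, m = 100, 6, 2
X = rng.standard_normal((n, m)) @ rng.standard_normal((m, q)) + rng.standard_normal((n, q))
X -= X.mean(axis=0)

P, C = pca(X, m)
F, A, U, psi = mdfa_als(X, m)
e_pc = np.linalg.norm(X - P @ C.T) ** 2
e_fa = np.linalg.norm(X - (F @ A.T + U * psi)) ** 2
c2, a2, psi2 = np.sum(C ** 2), np.sum(A ** 2), np.sum(psi ** 2)

tol = 1e-8
checks = {
    "(14)": e_pc >= e_fa - tol,
    "(17)": c2 >= a2 - tol,
    "(23)": c2 <= a2 + psi2 + tol,
    "(27)": psi2 >= (e_pc - e_fa) / n - tol,
}
print(checks)
```

Because every iterate satisfies (3) and (9), the checks hold for the returned values whether or not the iteration has reached the global minimum.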

The mathematical results presented so far always hold. However, Theorems 2 and 4 also suggest the following tendencies, which are likely but do not necessarily hold on every occasion:
• [S1] The absolute value of each PCA loading before/after orthogonal rotation tends to be greater than that of the corresponding FA loading (though exceptions can exist), which is suggested by (17) and (18).

• [S2] If $$||{\hat{\mathbf{E}}}_{{\text{FA}}} ||^{2}$$ is small enough, $$||\widehat{{\varvec{\Psi}}}||^{2}$$ tends to be larger than $$n^{-1} ||{\hat{\mathbf{E}}}_{{\text{PC}}} ||^{2}$$. The unique variance $$\widehat{{{\varPsi}}}_{j}^{2}$$ for variable j tends to be greater than the corresponding PCA residual variance $$n^{-1} ||{\hat{\mathbf{e}}}_{j}^{\text{PC}} ||^{2}$$, where $${\hat{\mathbf{e}}}_{j}^{\text{PC}}$$ is the j-th column of $${\hat{\mathbf{E}}}_{{\text{PC}}}$$ and $$\widehat{{{\varPsi}}}_{j}^{2}$$ is the j-th diagonal element of $$\widehat{{\varvec{\Psi}}}^{2}$$. Note that $$n^{-1} ||{\hat{\mathbf{e}}}_{j}^{\text{PC}} ||^{2}$$ is a variance, because $${\hat{\mathbf{E}}}_{{\text{PC}}} = {\mathbf{X}} - {\mathbf{\hat{P}\hat{C^{\prime}}}}$$ is column-centered: X is column-centered and so is $${\hat{\mathbf{P}}}$$, as found in (7).

These features are numerically assessed in the following sections.

4 Illustration

In this section, two real data examples are used to illustrate the theorems in the last section as well as [S1] and [S2]. For every data set, we carry out PCA and FA, together with two classic random FA (RFA) procedures. One of the two RFA procedures is the least squares RFA (LS-RFA) with loss function $$||{\mathbf{S}}_{\text{XX}} - ({\mathbf{AA^{\prime}}} + {\varvec{\Psi}}^{2} )||^{2}$$. The other one is the maximum likelihood RFA (ML-RFA), whose loss function is $${\text{tr}}\,{\mathbf{S}}_{\text{XX}} ({\mathbf{AA^{\prime}}} + {\varvec{\Psi}}^{2} )^{-1} + \log |{\mathbf{AA^{\prime}}} + {\varvec{\Psi}}^{2} |$$ following from certain normality assumptions, with |·| denoting the determinant of its argument. As the theorems in the last section are derived from the formulation of FA with (2), they are not guaranteed to hold in RFA with (4). Thus, it is of interest to see to what extent the RFA solutions follow the inequalities in the theorems. Of course, Theorem 1 is not considered because the error matrix $${\hat{\mathbf{E}}}_{{\text{FA}}}$$ is not relevant to RFA with (4). The resulting loadings in LS- and ML-RFA are expressed as $${\hat{\mathbf{A}}}_{\text{L}}$$ and $${\hat{\mathbf{A}}}_{\text{M}}$$, respectively, with the corresponding unique variance matrices denoted as $$\widehat{{\varvec{\Psi}}}_{\text{L}}^{2}$$ and $$\widehat{{\varvec{\Psi}}}_{\text{M}}^{2}$$. The loading matrices in all procedures are rotated by the orthogonal varimax rotation (Kaiser 1958). We denote the rotated PCA, FA, LS-RFA, and ML-RFA loading matrices as $${\hat{\mathbf{C}}\mathbf{T}}_{{\text{P}}}$$, $${\hat{\mathbf{A}}\mathbf{T}}_{{\text{F}}}$$, $${\hat{\mathbf{A}}}_{\text{L}} {\mathbf{T}}_{\text{L}}$$, and $${\hat{\mathbf{A}}}_{\text{M}} {\mathbf{T}}_{\text{M}}$$, respectively.
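The two RFA loss functions can be written as small functions. This is a minimal sketch: the function names are hypothetical, the ML discrepancy is taken in the common form tr S(AA′ + Ψ2)−1 + log|AA′ + Ψ2| (constants omitted), which is an assumption about the sign convention, and the test covariance matrix is constructed to follow (4) exactly.

```python
import numpy as np

def ls_rfa_loss(S, A, psi2):
    """Least squares discrepancy ||S - (AA' + Psi^2)||^2."""
    Sigma = A @ A.T + np.diag(psi2)
    return np.sum((S - Sigma) ** 2)

def ml_rfa_loss(S, A, psi2):
    """ML discrepancy tr(S Sigma^{-1}) + log|Sigma|, constants omitted."""
    Sigma = A @ A.T + np.diag(psi2)
    _, logdet = np.linalg.slogdet(Sigma)
    return np.trace(np.linalg.solve(Sigma, S)) + logdet

rng = np.random.default_rng(5)
q, m = 5, 2
A0 = rng.standard_normal((q, m))            # hypothetical loadings
psi2_0 = rng.uniform(0.3, 0.9, size=q)      # hypothetical unique variances
S = A0 @ A0.T + np.diag(psi2_0)             # covariance matrix fitting (4) exactly

# Losses at the generating parameters and at inflated unique variances.
ls_true, ml_true = ls_rfa_loss(S, A0, psi2_0), ml_rfa_loss(S, A0, psi2_0)
ls_off, ml_off = ls_rfa_loss(S, A0, 1.5 * psi2_0), ml_rfa_loss(S, A0, 1.5 * psi2_0)
print(ls_true, ls_off, ml_true, ml_off)
```

Both losses are smallest at the parameters that reproduce S, and both increase when the unique variances are moved away from them.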

The first example is the standardized version of the test score data with q = 5 courses for n = 20 examinees (Adachi and Trendafilov 2018, Table 1), which is a part of Tanaka and Tarumi’s (1995) data. Table 1 shows the solutions with m = 2. First, let us consider the bottom parts in the left two panels of PCA and FA. Those parts illustrate Theorems 1–4, as listed after the table:
Table 1 The solutions of PCA, FA, LS-RFA, and ML-RFA for a part of Tanaka and Tarumi’s (1995) test score data (Adachi and Trendafilov 2018), with Res standing for residual variances and the PCA loadings boldfaced whose absolute values are larger than the FA counterparts

| Variable | PCA $${\hat{\mathbf{C}}\mathbf{T}}_{{\text{P}}}$$ (col. 1) | PCA $${\hat{\mathbf{C}}\mathbf{T}}_{{\text{P}}}$$ (col. 2) | PCA Res | FA $${\hat{\mathbf{A}}\mathbf{T}}_{{\text{F}}}$$ (col. 1) | FA $${\hat{\mathbf{A}}\mathbf{T}}_{{\text{F}}}$$ (col. 2) | FA $$\widehat{{\varvec{\Psi}}}^{2}$$ | FA Res | LS-RFA $${\hat{\mathbf{A}}}_{\text{L}} {\mathbf{T}}_{\text{L}}$$ (col. 1) | LS-RFA $${\hat{\mathbf{A}}}_{\text{L}} {\mathbf{T}}_{\text{L}}$$ (col. 2) | LS-RFA $$\widehat{{\varvec{\Psi}}}_{\text{L}}^{2}$$ | ML-RFA $$\hat{\mathbf{A}}_{\text{M}}{\mathbf{T}}_{\text{M}}$$ (col. 1) | ML-RFA $$\hat{\mathbf{A}}_{\text{M}}{\mathbf{T}}_{\text{M}}$$ (col. 2) | ML-RFA $$\widehat{{\varvec{\Psi}}}_{\text{M}}^{2}$$ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Japanese | 0.51 | 0.62 | 0.13 | 0.38 | 0.60 | 0.50 | 0.001 | 0.38 | 0.60 | 0.50 | 0.37 | 0.61 | 0.50 |
| English | 0.25 | 0.81 | 0.08 | 0.21 | 0.76 | 0.37 | 0.002 | 0.19 | 0.77 | 0.37 | 0.21 | 0.76 | 0.38 |
| Social^a | −0.02 | 0.86 | 0.07 | 0.03 | 0.65 | 0.58 | 0.002 | 0.03 | 0.64 | 0.59 | 0.02 | 0.65 | 0.58 |
| Mathematics | 0.80 | 0.26 | 0.08 | 0.59 | 0.34 | 0.53 | 0.003 | 0.58 | 0.35 | 0.54 | 0.58 | 0.34 | 0.55 |
| Sciences | 0.90 | 0.02 | 0.03 | 0.89 | 0.10 | 0.19 | 0.001 | 0.91 | 0.10 | 0.17 | 0.90 | 0.11 | 0.17 |
| Sum of squares | 3.62 | | 1.38 | 2.81 | | 2.18 | 0.008 | 2.83 | | 2.17 | 2.82 | | 2.18 |

The entries in the row “Sum of squares” are, from left to right, $$\left\| {{\hat{\mathbf{C}}}} \right\|^{2}$$, $$n^{ - 1} \left\| {{\hat{\mathbf{E}}}_{{\text{PC}}} } \right\|^{2}$$, $$\left\| {{\hat{\mathbf{A}}}} \right\|^{2}$$, $$\left\| {\widehat{{\varvec{\Psi}}}} \right\|^{2}$$, $$n^{ - 1} \left\| {{\hat{\mathbf{E}}}_{{\text{FA}}} } \right\|^{2}$$, $$\left\| {{\hat{\mathbf{A}}}_{\text{L}} } \right\|^{2}$$, $$\left\| {\widehat{{\varvec{\Psi}}}_{\text{L}} } \right\|^{2}$$, $$\left\| {{\hat{\mathbf{A}}}_{\text{M}} } \right\|^{2}$$, and $$\left\| {\widehat{{\varvec{\Psi}}}_{\text{M}} } \right\|^{2}$$.

^a Social studies

• [Theorem 1] $$n^{-1} ||{\hat{\mathbf{E}}}_{{\text{PC}}} ||^{2}$$ = 1.38 > $$n^{-1} ||{\hat{\mathbf{E}}}_{{\text{FA}}} ||^{2}$$ = 0.008;

• [Theorem 2] $$n^{-1} ||{\mathbf{\hat{P}\hat{C^{\prime}}}}||^{2} = ||{\hat{\mathbf{C}}}||^{2}$$ = 3.62 ≥ $$n^{-1} ||{\mathbf{\hat{F}\hat{A}^{\prime}}}||^{2} = ||{\hat{\mathbf{A}}}||^{2}$$ = 2.81;

• [Theorem 3] $$||{\hat{\mathbf{C}}}||^{2}$$ = 3.62 ≤ $$||{\hat{\mathbf{A}}}||^{2} + ||\widehat{{\varvec{\Psi}}}||^{2}$$ = 2.81 + 2.18 = 4.99;

• [Theorem 4] $$||\widehat{{\varvec{\Psi}}}||^{2}$$ = 2.18 ≥ $$n^{-1} ||{\hat{\mathbf{E}}}_{{\text{PC}}} ||^{2} - n^{-1} ||{\hat{\mathbf{E}}}_{{\text{FA}}} ||^{2}$$ = 1.38 − 0.008 = 1.37.
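The arithmetic behind the four items above can be replayed from the sums of squares reported in Table 1. The snippet uses only those printed, rounded values; the variable names are illustrative. Because the data are standardized, the parts of each decomposition should add up to q = 5 up to rounding, as follows from (10) and (21).

```python
# Sums of squares as printed in Table 1 (rounded values).
q = 5
c2, res_pc = 3.62, 1.38                # ||C||^2 and n^-1 ||E_PC||^2
a2, psi2, res_fa = 2.81, 2.18, 0.008   # ||A||^2, ||Psi||^2 and n^-1 ||E_FA||^2

checks = {
    "Theorem 1": res_pc >= res_fa,
    "Theorem 2": c2 >= a2,
    "Theorem 3": c2 <= a2 + psi2,
    "Theorem 4": psi2 >= res_pc - res_fa,
}

# For standardized data tr S_XX = q, so each decomposition should sum to q.
pca_total = c2 + res_pc
fa_total = a2 + psi2 + res_fa

print(checks)
print(round(pca_total, 3), round(fa_total, 3))
```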

Next, we consider the loadings, residuals, and unique variance in the left two panels in Table 1. Seven PCA loadings among all ten are bold-faced. Their absolute values are larger than their FA counterparts, which supports the suggestion by [S1]. We also find that ||$${\hat{\mathbf{E}}}_{{\text{FA}}}$$||2 is close to zero and ||$$\widehat{{\varvec{\Psi}}}$$||2 = 2.18 > n−1||$${\hat{\mathbf{E}}}_{{\text{PC}}}$$||2 = 1.38 with all unique variances in FA larger than the corresponding “Res” (residual variances) in PCA, i.e., as suggested by [S2].

Now, we consider the right three panels. The panels for RFA do not have the column “Res”, since $$\widehat{{\varvec{\Psi}}}_{\text{L}}^{2} = {\text{diag}}({\mathbf{S}}_{\text{XX}} - {\hat{\mathbf{A}}}_{\text{L}} {\mathbf{\hat{A}^{\prime}}}_{\text{L}} )$$ and $$\widehat{{\varvec{\Psi}}}_{\text{M}}^{2} = {\text{diag}}({\mathbf{S}}_{\text{XX}} - {\hat{\mathbf{A}}}_{\text{M}} {\mathbf{\hat{A}^{\prime}}}_{\text{M}} )$$: the residual variances for variables are always estimated as zero. Apart from “Res”, all three FA solutions (loadings and unique variances) are almost identical. Thus, the RFA solutions show the same relationships to the PCA ones as the FA solutions.

The second example is Mullen’s (1939) data set with q = 8 physical variables for n = 305 girls. The inter-variable correlation matrix is also available from Harman (1976, p. 22). Table 2 presents the m = 2 solutions. Again, we can empirically confirm the theoretical results established in the last section. The findings are quite similar to those observed in the first example and are summarized as follows:
• [O1] The absolute values of the PCA loadings tend to be greater than those of FA.

• [O2] ||$$\widehat{{\varvec{\Psi}}}$$||2 tends to be larger than n−1||$${\hat{\mathbf{E}}}_{{\text{PC}}}$$||2.

• [O3] The unique variance $$\widehat{{{\varPsi}}}_{j}^{2}$$ for variable j tends to be greater than the variance of the PCA residuals $$n^{-1} ||{\hat{\mathbf{e}}}_{j}^{\text{PC}} ||^{2}$$ for that variable.

• [O4] FA and RFA solutions are broadly equivalent.

• [O5] The inequalities in Theorems 2–4 also hold in RFA solutions.

Here [O1] corresponds to [S1] from Sect. 3, while [S2] is divided into [O2] and [O3].
Table 2

The solutions of PCA, FA, LS-RFA, and ML-RFA for a part of Mullen’s (1939) physical variables data, with Res standing for residual variances and the PCA loadings boldfaced whose absolute values are larger than the FA counterparts

| Variable | PCA: $$\hat{\mathbf{C}}\mathbf{T}_{\text{P}}$$ (1) | PCA: $$\hat{\mathbf{C}}\mathbf{T}_{\text{P}}$$ (2) | PCA: Res | FA: $$\hat{\mathbf{A}}\mathbf{T}_{\text{F}}$$ (1) | FA: $$\hat{\mathbf{A}}\mathbf{T}_{\text{F}}$$ (2) | FA: $$\widehat{{\varvec{\Psi}}}^{2}$$ | FA: Res | LS-RFA: $$\hat{\mathbf{A}}_{\text{L}}\mathbf{T}_{\text{L}}$$ (1) | LS-RFA: $$\hat{\mathbf{A}}_{\text{L}}\mathbf{T}_{\text{L}}$$ (2) | LS-RFA: $$\widehat{{\varvec{\Psi}}}_{\text{L}}^{2}$$ | ML-RFA: $$\hat{\mathbf{A}}_{\text{M}}\mathbf{T}_{\text{M}}$$ (1) | ML-RFA: $$\hat{\mathbf{A}}_{\text{M}}\mathbf{T}_{\text{M}}$$ (2) | ML-RFA: $$\widehat{{\varvec{\Psi}}}_{\text{M}}^{2}$$ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Height | 0.24 | 0.91 | 0.12 | 0.26 | 0.88 | 0.16 | 0.005 | 0.25 | 0.88 | 0.16 | 0.27 | 0.87 | 0.17 |
| Arm span | 0.18 | 0.93 | 0.10 | 0.17 | 0.93 | 0.10 | 0.006 | 0.18 | 0.93 | 0.11 | 0.16 | 0.93 | 0.11 |
| Forearm (a) | 0.14 | 0.92 | 0.13 | 0.16 | 0.89 | 0.17 | 0.002 | 0.16 | 0.89 | 0.18 | 0.16 | 0.90 | 0.17 |
| Lower leg (a) | 0.21 | 0.90 | 0.14 | 0.23 | 0.87 | 0.19 | 0.005 | 0.22 | 0.87 | 0.19 | 0.23 | 0.86 | 0.20 |
| Weight | 0.88 | 0.27 | 0.15 | 0.91 | 0.26 | 0.10 | 0.002 | 0.91 | 0.26 | 0.11 | 0.92 | 0.25 | 0.09 |
| Bitrochanteric (b) | 0.84 | 0.20 | 0.26 | 0.77 | 0.21 | 0.36 | 0.002 | 0.77 | 0.21 | 0.36 | 0.77 | 0.21 | 0.36 |
| Chest girth | 0.84 | 0.12 | 0.28 | 0.75 | 0.15 | 0.41 | 0.002 | 0.75 | 0.15 | 0.42 | 0.75 | 0.15 | 0.42 |
| Chest width | 0.74 | 0.27 | 0.38 | 0.64 | 0.28 | 0.52 | 0.002 | 0.64 | 0.28 | 0.51 | 0.62 | 0.29 | 0.54 |

Sum of squares

| Solution | Quantity | Value |
|---|---|---|
| PCA | $$\Vert\hat{\mathbf{C}}\Vert^{2}$$ | 6.44 |
| PCA | $$n^{-1}\Vert\hat{\mathbf{E}}_{\text{PC}}\Vert^{2}$$ | 1.56 |
| FA | $$\Vert\hat{\mathbf{A}}\Vert^{2}$$ | 5.97 |
| FA | $$\Vert\widehat{{\varvec{\Psi}}}\Vert^{2}$$ | 2.01 |
| FA | $$n^{-1}\Vert\hat{\mathbf{E}}_{\text{FA}}\Vert^{2}$$ | 0.025 |
| LS-RFA | $$\Vert\hat{\mathbf{A}}_{\text{L}}\Vert^{2}$$ | 5.96 |
| LS-RFA | $$\Vert\widehat{{\varvec{\Psi}}}_{\text{L}}\Vert^{2}$$ | 2.04 |
| ML-RFA | $$\Vert\hat{\mathbf{A}}_{\text{M}}\Vert^{2}$$ | 5.95 |
| ML-RFA | $$\Vert\widehat{{\varvec{\Psi}}}_{\text{M}}\Vert^{2}$$ | 5.95 |

(a) Length

(b) Diameters
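The tendencies [O1]–[O3] can be checked against the values printed in Table 2. The following Python sketch is only an illustration: the array names are ours, and the figures are the rounded two-decimal table entries, so ties such as 0.93 versus 0.93 cannot be resolved. It is not the procedure used to produce the table.

```python
import numpy as np

# Rounded entries of Table 2 (rows: Height, Arm span, Forearm, Lower leg,
# Weight, Bitrochanteric, Chest girth, Chest width). Names are illustrative.
pca_loadings = np.array([[0.24, 0.91], [0.18, 0.93], [0.14, 0.92], [0.21, 0.90],
                         [0.88, 0.27], [0.84, 0.20], [0.84, 0.12], [0.74, 0.27]])
fa_loadings = np.array([[0.26, 0.88], [0.17, 0.93], [0.16, 0.89], [0.23, 0.87],
                        [0.91, 0.26], [0.77, 0.21], [0.75, 0.15], [0.64, 0.28]])
pca_res = np.array([0.12, 0.10, 0.13, 0.14, 0.15, 0.26, 0.28, 0.38])  # PCA "Res"
fa_psi2 = np.array([0.16, 0.10, 0.17, 0.19, 0.10, 0.36, 0.41, 0.52])  # FA unique variances

# [O2]: total unique variance of FA versus total residual variance of PCA.
sum_psi2 = fa_psi2.sum()
sum_res = pca_res.sum()
o2_holds = bool(sum_psi2 > sum_res)

# [O3]: proportion of variables deviating from it, i.e. whose PCA residual
# variance exceeds the FA unique variance.
o3_deviation = float(np.mean(pca_res > fa_psi2))

# [O1]: proportion of PCA loadings smaller in absolute value than the FA ones.
o1_deviation = float(np.mean(np.abs(pca_loadings) < np.abs(fa_loadings)))

print(f"sum of FA unique variances  : {sum_psi2:.2f}")
print(f"sum of PCA residual variances: {sum_res:.2f}")
print(f"[O2] holds: {o2_holds}")
print(f"[O3] deviation proportion: {o3_deviation:.3f}")
print(f"[O1] deviation proportion: {o1_deviation:.3f}")
```

With these rounded entries, the column sums reproduce the reported values 2.01 and 1.56, only Weight deviates from [O3], and the loadings that deviate from [O1] are all among the small loadings of each variable.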

5 Supplementary simulation studies

In this section, we explore whether [O1]–[O5] from the last section are fulfilled for most data sets in practice. It is not efficient to make such assessments with real data sets. We thus resort to using simulated data. The correctness of [O4] was already demonstrated in the simulation studies of Adachi (2012, 2015) and Stegeman (2016). However, Adachi (2012) and Stegeman (2016) showed [O4] only indirectly: rather than comparing the FA and RFA solutions directly, they showed that the true parameters are recovered well by both FA and RFA. Here, we assess [O4] with direct comparisons.

We simulate two types of data sets: one is the PCA-modeled data set synthesized with (1) and (6), while the other type is the FA-modeled data set following (2) and (3). For each of m = 1, …, 5, we synthesize a data set. The steps in the synthesis procedure are described as follows, by abbreviating the uniform distribution as U and its discrete version as DU:
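1. Choose q from DU(4m, 8m) and n from DU(8q, 12q), with DU(4m, 8m) defined for the integers within the range [4m, 8m].

2. Draw each element of $$\mathbf{P}$$, $$\mathbf{F}$$, $$\mathbf{U}$$, and $$\mathbf{E}$$ (n × q) from the standard normal distribution.

3. Draw each element of the q × m matrix $$\mathbf{A}_{0}$$ from U(− 1, 1) and each diagonal element of the q × q diagonal matrix $$\varvec{\Psi}_{0}$$ from U(0.1, 1), with U(− 1, 1) defined for the real values within [− 1, 1].

4. Set $$\mathbf{C} = \alpha\mathbf{A}_{0}$$ and $$\mathbf{E}_{\text{PC}} = \mathbf{E}$$ so that $$\|\mathbf{PC}^{\prime}\|^{2}/(\|\mathbf{PC}^{\prime}\|^{2} + \|\mathbf{E}_{\text{PC}}\|^{2}) = 0.75$$ with α > 0.

5. Set $$\mathbf{A} = \beta\mathbf{A}_{0}$$, $$\varvec{\Psi} = \gamma\varvec{\Psi}_{0}$$, and $$\mathbf{E}_{\text{FA}} = \mathbf{E}$$ so that $$\|\mathbf{FA}^{\prime}\|^{2}/\text{SST} = 0.55$$ and $$\|\mathbf{U}\varvec{\Psi}\|^{2}/\text{SST} = 0.42$$ with $$\text{SST} = \|\mathbf{FA}^{\prime}\|^{2} + \|\mathbf{U}\varvec{\Psi}\|^{2} + \|\mathbf{E}_{\text{FA}}\|^{2}$$, β > 0, and γ > 0.

6. Generate X with (1) and another one with (2), followed by the column standardization.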
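The synthesis steps can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions, not the code used in the study: we assume that model (1) has the form X = PC′ + E_PC and model (2) the form X = FA′ + UΨ + E_FA, that the garbled ratio in step 5 refers to ||UΨ||², and that P and F are n × m while U and E are n × q. The function and variable names are hypothetical. The scaling constants follow from solving the ratio conditions, e.g. α² = 3||E||²/||PA0′||² for the ratio 0.75.

```python
import numpy as np

def sum_sq(M):
    """Squared Frobenius norm ||M||^2."""
    return float(np.sum(M ** 2))

def standardize(X):
    """Column-center X and scale every column to unit variance."""
    Xc = X - X.mean(axis=0)
    return Xc / Xc.std(axis=0)

def synthesize(m, rng):
    """One PCA-modeled and one FA-modeled data set for m components/factors."""
    # Step 1: discrete uniform draws, upper bounds included.
    q = int(rng.integers(4 * m, 8 * m, endpoint=True))
    n = int(rng.integers(8 * q, 12 * q, endpoint=True))
    # Step 2: standard normal scores and errors (assumed sizes, see lead-in).
    P = rng.standard_normal((n, m))
    F = rng.standard_normal((n, m))
    U = rng.standard_normal((n, q))
    E = rng.standard_normal((n, q))
    # Step 3: unscaled loadings and the diagonal of Psi_0.
    A0 = rng.uniform(-1.0, 1.0, size=(q, m))
    psi0 = rng.uniform(0.1, 1.0, size=q)
    # Step 4: alpha such that ||PC'||^2 / (||PC'||^2 + ||E||^2) = 0.75.
    alpha = np.sqrt(3.0 * sum_sq(E) / sum_sq(P @ A0.T))
    common_pca = P @ (alpha * A0).T
    # Step 5: the error share is 1 - 0.55 - 0.42 = 0.03 of SST.
    sst = sum_sq(E) / 0.03
    beta = np.sqrt(0.55 * sst / sum_sq(F @ A0.T))
    gamma = np.sqrt(0.42 * sst / sum_sq(U * psi0))  # U * psi0 equals U @ diag(psi0)
    common_fa = F @ (beta * A0).T
    unique_fa = U * (gamma * psi0)
    # Step 6: assemble both data sets and standardize their columns.
    return {
        "q": q, "n": n, "E": E,
        "common_pca": common_pca, "common_fa": common_fa, "unique_fa": unique_fa,
        "X_pca": standardize(common_pca + E),
        "X_fa": standardize(common_fa + unique_fa + E),
    }

rng = np.random.default_rng(0)
data = synthesize(3, rng)
print(data["n"], data["q"], data["X_pca"].shape, data["X_fa"].shape)
```

Note that SST is computed here exactly as defined in step 5, as the sum of the three squared norms, which differs from ||X||² by the cross-product terms between the parts.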

The procedures were repeated 250 times to provide 250 (replications) × 5 (m) × 2 (PCA-FA) = 2500 data sets. They are analyzed as in the last section.
We first assess how well [O4] (the equivalence of solutions between the FA procedures) is fulfilled, making use of the averaged absolute difference (AAD) of the elements between the resulting matrices. For $$\hat{\mathbf{A}}$$ and $$\hat{\mathbf{A}}_{\text{L}}$$, it is defined as $$\text{AAD}(\hat{\mathbf{A}}, \hat{\mathbf{A}}_{\text{L}}) = (qm)^{-1}\|\hat{\mathbf{A}} - \hat{\mathbf{A}}_{\text{L}}\mathbf{R}\|_{l1}$$, where $$\|\cdot\|_{l1}$$ denotes the l1 matrix norm (the sum of the absolute values of the elements), and $$\mathbf{R}$$ is the m × m orthonormal matrix minimizing $$\|\hat{\mathbf{A}} - \hat{\mathbf{A}}_{\text{L}}\mathbf{R}\|^{2} = \|\hat{\mathbf{A}}\mathbf{R}^{\prime} - \hat{\mathbf{A}}_{\text{L}}\|^{2}$$, i.e., obtained by Procrustes rotation (e.g., Gower and Dijksterhuis 2004). This rotation is required, since the loading matrices have rotational indeterminacy. The averages and 95th percentiles of the AAD values among $$\hat{\mathbf{A}}$$, $$\hat{\mathbf{A}}_{\text{L}}$$, $$\hat{\mathbf{A}}_{\text{M}}$$, and $$\hat{\mathbf{C}}$$ are presented for each of the two types of data in Table 3, where the cells concerning PCA solutions are colored gray. In the other cells, we find that the FA loadings are broadly equivalent to the RFA ones, with the averages less than 0.010 and even the highest 95th percentile being 0.012. These statistics are somewhat higher between the two RFA procedures than between either of them and FA. In contrast to the equivalence between the FA and RFA loadings, the PCA loadings differ from the FA and RFA loadings: the averaged AAD values between FA and PCA are about seven to ten times the values among the FA solutions.

Table 4 shows the statistics of the AAD values for unique variances, with $$\text{AAD}(\widehat{{\varvec{\Psi}}}, \widehat{{\varvec{\Psi}}}_{\text{L}}) = q^{-1}\|\widehat{{\varvec{\Psi}}} - \widehat{{\varvec{\Psi}}}_{\text{L}}\|_{l1}$$. Clearly, the FA and RFA solutions are very close.
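The AAD computation can be sketched as follows in Python with NumPy. This is a minimal sketch, assuming that the differences in the AAD definitions are taken as $$\hat{\mathbf{A}} - \hat{\mathbf{A}}_{\text{L}}\mathbf{R}$$ and that the l1 norm is the sum of absolute element values. The rotation is obtained by the standard SVD solution of the orthogonal Procrustes problem. The function names are ours, not from the paper.

```python
import numpy as np

def procrustes_rotation(A, B):
    """Orthonormal R (m x m) minimizing ||A - B R||^2.

    Minimizing the squared norm is equivalent to maximizing tr(R' B'A);
    with the SVD B'A = U S V', the maximizer is R = U V'.
    """
    U, _, Vt = np.linalg.svd(B.T @ A)
    return U @ Vt

def aad_loadings(A, B):
    """Averaged absolute difference between q x m loading matrices A and B,
    computed after rotating B toward A."""
    q, m = A.shape
    R = procrustes_rotation(A, B)
    return float(np.abs(A - B @ R).sum() / (q * m))

def aad_unique(psi_a, psi_b):
    """Averaged absolute difference between two vectors of diagonal elements."""
    return float(np.mean(np.abs(np.asarray(psi_a) - np.asarray(psi_b))))

# Illustration with made-up numbers: B is A rotated by a random orthonormal Q,
# so the two matrices differ only by rotational indeterminacy.
rng = np.random.default_rng(0)
A = rng.uniform(-1.0, 1.0, size=(6, 2))
Q, _ = np.linalg.qr(rng.standard_normal((2, 2)))
B = A @ Q
print(aad_loadings(A, B))   # numerically zero
print(aad_unique([0.1, 0.2], [0.2, 0.4]))
```

Without the rotation step, two loading matrices that describe the same solution could show a large element-wise difference, which is why the AAD is evaluated only after matching.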
Table 3

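Table 4

Averages and 95 percentiles of AAD values for unique variances

| | | PCA-modeled data: LS-RFA | PCA-modeled data: ML-RFA | FA-modeled data: LS-RFA | FA-modeled data: ML-RFA |
|---|---|---|---|---|---|
| FA | Ave | 0.006 | 0.006 | 0.010 | 0.010 |
| FA | 95% | 0.011 | 0.010 | 0.016 | 0.017 |
| LS-RFA | Ave | | 0.007 | | 0.013 |
| LS-RFA | 95% | | 0.017 | | 0.026 |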

The RFA solutions were found to satisfy the inequalities in Theorems 2–4 for every simulated data set, i.e., [O5] is also verified.

Next, we consider [O2], i.e., that $$\|\widehat{{\varvec{\Psi}}}\|^{2}$$ tends to be larger than $$n^{-1}\|\hat{\mathbf{E}}_{\text{PC}}\|^{2}$$. It turns out that this is fulfilled for every data set, in the FA solutions, and also in the LS- and ML-RFA solutions.

To assess [O3], i.e., that $$\widehat{{{\varPsi}}}_{j}^{2}$$ tends to be greater than $$n^{-1}\|\hat{\mathbf{e}}_{j}^{\text{PC}}\|^{2}$$, we compute U/q for each data set, where U is the number of variables that deviate from [O3], i.e., for which $$n^{-1}\|\hat{\mathbf{e}}_{j}^{\text{PC}}\|^{2} > \widehat{{{\varPsi}}}_{j}^{2}$$. The resulting statistics are presented in Table 5. The averages and 95th percentiles are all substantially smaller than 0.5. This allows us to conclude that $$n^{-1}\|\hat{\mathbf{e}}_{j}^{\text{PC}}\|^{2}$$ tends to be smaller than $$\widehat{{{\varPsi}}}_{j}^{2}$$.
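Table 5

Averages and 95 percentiles of the proportions of the variables for which the squared sums of PCA residuals are greater than the FA unique variances

| | PCA-modeled data: FA | PCA-modeled data: LS-RFA | PCA-modeled data: ML-RFA | FA-modeled data: FA | FA-modeled data: LS-RFA | FA-modeled data: ML-RFA |
|---|---|---|---|---|---|---|
| Ave | 0.10 | 0.07 | 0.07 | 0.21 | 0.18 | 0.19 |
| 95% | 0.29 | 0.25 | 0.25 | 0.35 | 0.30 | 0.33 |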
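The proportion U/q reported in Table 5 can be computed along the following lines. This Python sketch makes several assumptions: PCA is taken as the rank-m least squares decomposition X ≈ PC′ obtained from the truncated SVD, with the scores normalized so that n⁻¹P′P = I_m, and the unique variances are supplied by whatever FA routine is used (the vector `psi2` below is made up for illustration). The function names are hypothetical.

```python
import numpy as np

def pca_decomposition(X, m):
    """Rank-m PCA of a column-centered X as X = P C' + E with P'P / n = I_m."""
    n = X.shape[0]
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    P = np.sqrt(n) * U[:, :m]            # component scores
    C = Vt[:m].T * s[:m] / np.sqrt(n)    # loadings, q x m
    E = X - P @ C.T                      # residuals
    return P, C, E

def o3_deviation_proportion(X, psi2, m):
    """Share of variables whose PCA residual variance exceeds the unique variance."""
    n = X.shape[0]
    _, _, E = pca_decomposition(X, m)
    res_var = np.sum(E ** 2, axis=0) / n
    return float(np.mean(res_var > np.asarray(psi2)))

# Illustration on random standardized data with a made-up psi2 vector.
rng = np.random.default_rng(0)
X = rng.standard_normal((60, 6))
X = (X - X.mean(axis=0)) / X.std(axis=0)
psi2 = np.full(6, 0.5)                   # hypothetical unique variances
print(o3_deviation_proportion(X, psi2, m=2))
```

For standardized variables, the residual variance of variable j equals one minus the sum of squares of its loadings under this normalization, which gives a quick consistency check of the decomposition.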

Now, let us consider [O1]. For each data set, we record the proportion L/(qm), which measures the deviation from [O1], where L is the number of the PCA loadings whose absolute values are less than those of their FA counterparts. The resulting statistics are presented in Table 6A. The averages are substantially less than 0.5: on average, only around 30% of the loadings or fewer deviate from [O1]. However, the 95th percentiles in Table 6 are close to 0.5, suggesting that solutions without feature [O1] are also likely to be observed.
Table 6

Averages and 95 percentiles of the proportions of the PCA loadings whose absolute values are less than the FA counterparts

| | | (A) After orthogonal rotation: FA | (A) LS-RFA | (A) ML-RFA | (B) After oblique rotation: FA | (B) LS-RFA | (B) ML-RFA |
|---|---|---|---|---|---|---|---|
| PCA-modeled data | Ave | 0.27 | 0.27 | 0.28 | 0.26 | 0.25 | 0.27 |
| PCA-modeled data | 95% | 0.44 | 0.44 | 0.44 | 0.45 | 0.45 | 0.46 |
| FA-modeled data | Ave | 0.30 | 0.30 | 0.31 | 0.30 | 0.29 | 0.31 |
| FA-modeled data | 95% | 0.46 | 0.47 | 0.46 | 0.46 | 0.46 | 0.47 |

We further assess whether the relationships (18) and (24) often hold, even if the orthonormal matrices $$\mathbf{T}_{\text{P}}$$ and $$\mathbf{T}_{\text{F}}$$ are replaced by nonsingular matrices subject to (12), i.e., even after oblique rotation. For this assessment, we perform Jennrich’s (2006) oblique rotation, in which $$\|\hat{\mathbf{C}}\mathbf{N}_{\text{P}}\|_{l1}$$ is minimized over $$\mathbf{N}_{\text{P}}$$ and $$\|\hat{\mathbf{A}}\mathbf{N}_{\text{F}}\|_{l1}$$ is minimized over $$\mathbf{N}_{\text{F}}$$ under (12) for PCA and FA solutions, respectively. As a result, it was found for every data set that the sum of squares of the obliquely rotated PCA loadings was greater than the corresponding sum for FA/RFA, but less than that sum plus the sum of the FA/RFA unique variances.

6 Discussion

In this paper, we derive several theorems contrasting PCA and FA solutions, with both PCA and FA formulated as matrix decomposition problems. Next, the conclusions from the theorems are assessed numerically.

Theorems 1 and 2 show that, for a given data set, FA fits better than PCA, but PCA extracts a larger common part than FA. To the best of our knowledge, no research exists suggesting which technique, PCA or FA, should be used for a particular data set X. This might be due to the fact that PCA (1) was originally considered as a transformation of the observed variables (Hotelling 1933), while the classic FA (4) looks for new latent variables. However, the comparison makes perfect sense once the FA formulation (2) is introduced. Then, PCA and FA can both be considered purely as data matrix decompositions and are thus comparable. In this respect, these theorems suggest the following:
• [P] Choose PCA when the aim is to find a large common part in X.

• [F] Choose FA when the aim is to explain X better.

The conclusions from the theorems are numerically assessed in Sects. 4 and 5. The experimental findings are summarized as follows:
1. The absolute values of PCA loadings tend to be greater than the corresponding FA ones, though solutions can also occur in which this is not clearly found.

2. It is a common result that the sum of unique variances in FA is larger than the sum of residual variances of PCA. Further, the unique variance for each variable in FA tends to be greater than the corresponding residual variance in PCA.

Finding 1 can be restated as follows: the relationships of variables to components tend to be estimated as stronger than those to common factors. This suggests that in a number of cases interpreting component loadings may provide more reliable information than interpreting factor loadings. On the other hand, finding 2 underlines how important the role of the unique factors is in FA.

As the inequalities in Sect. 3 are derived from the matrix decomposition formulation of FA with (2), they are not guaranteed to hold in the classic random FA (RFA) formulated as (4). However, as found in Sects. 4 and 5, the matrix decomposition FA solutions are broadly equivalent to the RFA ones. Thus, the inequalities in the theorems are likely to hold for RFA, except for Theorem 1, which does not make sense in RFA.

The above statement “FA fits better than PCA” should be interpreted with care. As found in (1) and (2), the addition of the unique part to the PCA model leads to the FA model. Thus, PCA has fewer parameters than FA and can be viewed as more parsimonious. This suggests that a model selection strategy taking the model’s parsimony into account remains to be studied for prescribing whether PCA or FA is suitable for a particular data set.

References

1. Adachi, K. (2012). Some contributions to data-fitting factor analysis with empirical comparisons to covariance-fitting factor analysis. Journal of the Japanese Society of Computational Statistics, 25, 25–38.
2. Adachi, K. (2015). A matrix-intensive approach to factor analysis. Journal of the Japan Statistical Society, Japanese Issue, 44, 363–382. (in Japanese).
3. Adachi, K. (2016). Matrix-based introduction to multivariate data analysis. Singapore: Springer.
4. Adachi, K., & Trendafilov, N. T. (2018). Some mathematical properties of the matrix decomposition solution in factor analysis. Psychometrika, 83, 407–424.
5. Bentler, P. M., & Kano, Y. (1990). On the equivalence of factors and components. Multivariate Behavioral Research, 25, 67–74.
6. Eckart, C., & Young, G. (1936). The approximation of one matrix by another of lower rank. Psychometrika, 1, 211–218.
7. Gower, J. C., & Dijksterhuis, G. B. (2004). Procrustes problems. Oxford: Oxford University Press.
8. Harman, H. H. (1976). Modern factor analysis (3rd ed.). Chicago: The University of Chicago Press.
9. Hotelling, H. (1933). Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology, 24, 417–441.
10. Jennrich, R. I. (2006). Rotation to simple loadings using component loss function: The oblique case. Psychometrika, 71, 173–191.
11. Jolliffe, I. T. (2002). Principal component analysis (2nd ed.). New York: Springer.
12. Kaiser, H. F. (1958). The varimax criterion for analytic rotation in factor analysis. Psychometrika, 23, 187–200.
13. Mulaik, S. A. (2010). Foundations of factor analysis (2nd ed.). Boca Raton: CRC Press.
14. Mullen, F. (1939). Factors in the growth of girls seven to seventeen years of age. Ph.D. dissertation. University of Chicago, Department of Education.
15. Ogasawara, H. (2000). Some relationships between factors and components. Psychometrika, 65, 167–185.
16. Okamoto, M. (1969). Optimality of principal components. In P. R. Krishnaiah (Ed.), Multivariate analysis (Vol. II, pp. 673–687). New York: Academic Press.
17. Pearson, K. (1901). On lines and planes of closest fit to systems of points in space. Philosophical Magazine, 2, 559–572.
18. Sato, M. (1990). Some remarks on principal component analysis as a substitute for factor analysis in monofactor cases. Journal of the Japan Statistical Society, 20, 23–31.
19. Sočan, G. (2003). The incremental value of minimum rank factor analysis. PhD Thesis, University of Groningen: Groningen.
20. Spearman, C. (1904). “General Intelligence”, objectively determined and measured. American Journal of Psychology, 15, 201–293.
21. Stegeman, A. (2016). A new method for simultaneous estimation of the factor model parameters, factor scores, and unique parts. Computational Statistics and Data Analysis, 99, 189–203.
22. Tanaka, Y., & Tarumi, T. (1995). Handbook for statistical analysis: Multivariate analysis (windows version). Tokyo: Kyoritsu-Shuppan. (in Japanese).
23. ten Berge, J. M. F., & Kiers, H. A. L. (1996). Optimality criteria for principal component analysis and generalizations. British Journal of Mathematical and Statistical Psychology, 49, 335–345.
24. Thurstone, L. L. (1935). The vectors of mind. Chicago: University of Chicago Press.
25. Trendafilov, N. T., Unkel, S., & Krzanowski, W. (2013). Exploratory factor and principal component analyses: Some new aspects. Statistics and Computing, 23, 209–220.
26. Unkel, S., & Trendafilov, N. T. (2010). Simultaneous parameter estimation in exploratory factor analysis: An expository review. International Statistical Review, 78, 363–382. 