Statistical Analysis of Management Data, pp. 217–230
Canonical Correlation Analysis
Abstract
In canonical correlation analysis the objective is to relate a set of dependent or criterion variables to another set of independent or predictor variables. For example, we would like to establish the relationship between socioeconomic status and consumption by households. A set of characteristics determines socioeconomic status: education level, age, income, etc. Another set of variables measures consumption such as purchases of cars, luxury items, or food products.
Keywords
Criterion Variable, Canonical Correlation, Canonical Correlation Analysis, Canonical Variable, Unit Variance
7.1 The Method
In Fig. 7.1, z and w represent two unobserved constructs that are correlated. The Xs are indicators that determine z, and the Ys are indicators that determine w.
Formally, let \( \mathop{\mathbf{X}}\limits_{{N\times p}} \) be the matrix of p predictor variables (centered, i.e., taking the deviations from their means) on N observations and \( \mathop{\mathbf{Y}}\limits_{{N\times q}} \) be the matrix of q criterion variables (also centered) on the same N observations.
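Let \( z=\mathbf{X}\mathbf{u} \) and \( w=\mathbf{Y}\mathbf{v} \) be the canonical variables, where u (p × 1) and v (q × 1) are weight vectors, and let \( {{\mathbf{S}}_{\mathrm{xx}}} \), \( {{\mathbf{S}}_{\mathrm{yy}}} \), and \( {{\mathbf{S}}_{\mathrm{xy}}} \) denote the sample covariance matrices of the x variables, of the y variables, and between the x and y variables. The correlation \( r_{zw}={\mathbf{u}}^{\prime}{{\mathbf{S}}_{\mathrm{xy}}}\mathbf{v}/\sqrt{({\mathbf{u}}^{\prime}{{\mathbf{S}}_{\mathrm{xx}}}\mathbf{u})({\mathbf{v}}^{\prime}{{\mathbf{S}}_{\mathrm{yy}}}\mathbf{v})} \) does not change when u or v is rescaled, so both canonical variables can be normalized to unit variance. Therefore, the problem is to find (u, v) so as to maximize \( {\mathbf{u}}^{\prime}{{\mathbf{S}}_{\mathrm{xy}}}\mathbf{v} \) subject to \( {\mathbf{u}}^{\prime}{{\mathbf{S}}_{\mathrm{xx}}}\mathbf{u}={\mathbf{v}}^{\prime}{{\mathbf{S}}_{\mathrm{yy}}}\mathbf{v}=1 \).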
Solving this constrained maximization leads to an eigenvalue problem whose largest eigenvalue gives the maximum squared correlation \( r_{zw}^{2} \), that is, the proportion of the variance in w explained by z.
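The maximization above can be sketched numerically. This is a minimal sketch, assuming NumPy and sample covariance matrices computed with an N − 1 divisor (the divisor cancels in the correlations). It solves the symmetric form \( \mathbf{S}_{xx}^{-1/2}\mathbf{S}_{xy}\mathbf{S}_{yy}^{-1}\mathbf{S}_{yx}\mathbf{S}_{xx}^{-1/2} \) of the eigenvalue problem, which has the squared canonical correlations as eigenvalues, rather than reproducing the chapter's own equation; the function names `inv_sqrt` and `canonical_correlations` are hypothetical.

```python
import numpy as np

def inv_sqrt(S):
    # Symmetric inverse square root S^{-1/2} from the eigendecomposition of S
    vals, vecs = np.linalg.eigh(S)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

def canonical_correlations(X, Y):
    """Canonical correlations mu and weights U, V for X (N x p) and Y (N x q)."""
    N = X.shape[0]
    Xc = X - X.mean(axis=0)                  # center the predictors
    Yc = Y - Y.mean(axis=0)                  # center the criterion variables
    Sxx = Xc.T @ Xc / (N - 1)
    Syy = Yc.T @ Yc / (N - 1)
    Sxy = Xc.T @ Yc / (N - 1)
    Sxx_ih = inv_sqrt(Sxx)
    # Symmetric matrix whose eigenvalues are the squared canonical correlations
    M = Sxx_ih @ Sxy @ np.linalg.solve(Syy, Sxy.T) @ Sxx_ih
    eigvals, A = np.linalg.eigh(M)
    # Keep the min(p, q) largest eigenvalues, in decreasing order
    order = np.argsort(eigvals)[::-1][: min(X.shape[1], Y.shape[1])]
    mu = np.sqrt(np.clip(eigvals[order], 0, None))
    U = Sxx_ih @ A[:, order]                 # u' S_xx u = 1 for each column
    V = np.linalg.solve(Syy, Sxy.T @ U) / mu # v' S_yy v = 1 and u' S_xy v = mu
    return mu, U, V

# Usage on simulated data (hypothetical, for illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
Y = X[:, :2] @ np.array([[0.8, 0.1], [0.2, 0.5]]) + rng.normal(size=(300, 2))
mu, U, V = canonical_correlations(X, Y)
z, w = X @ U[:, 0], Y @ V[:, 0]              # first pair of canonical variables
```

The correlation between `z` and `w` reproduces `mu[0]`, and `z` has unit sample variance, which is the normalization imposed in the constraint.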
Two additional notions can be helpful in understanding the relationships between the set of x and the set of y variables: canonical loadings and redundancy analysis.
7.1.1 Canonical Loadings
7.1.2 Canonical Redundancy Analysis
Canonical redundancy measures how well the original y variables can be predicted from the canonical variable z formed from the x variables. It therefore reflects the correlation between z and the y variables. Redundancy is the product of the percentage of variance in w explained by z and the percentage of variance in y explained by w. The first component is the squared canonical correlation \( \mu^{2} \). The second component is the average of the squared canonical loadings for y, i.e., their sum of squares divided by q.
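Once the loadings are known, the redundancy computation is simple arithmetic. This is a minimal sketch, assuming standardized y variables so that the mean squared loading is the proportion of y variance extracted by w; the function name `redundancy_index` and the numbers in the usage line are hypothetical.

```python
import numpy as np

def redundancy_index(loadings_y, mu):
    """Redundancy of the y set given the canonical variable z of the x set.

    loadings_y : correlations between each standardized y variable and w
    mu         : canonical correlation between z and w
    """
    loadings_y = np.asarray(loadings_y, dtype=float)
    # Proportion of the total variance of the y's extracted by w:
    # sum of squared loadings divided by q, the number of y variables
    variance_extracted = np.mean(loadings_y ** 2)
    # Variance in w explained by z (mu^2) times variance in y explained by w
    return mu ** 2 * variance_extracted

# Hypothetical loadings and correlation, chosen to keep the arithmetic simple
rd = redundancy_index([1.0, 0.6], mu=0.5)   # 0.25 * (1 + 0.36) / 2 = 0.17
```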
7.2 Testing the Significance of the Canonical Correlations
Based on this expression of Λ, either as a function of the \( \lambda_i \)s or as a function of the \( \mu_i \)s, it is possible to compute Bartlett’s V or Rao’s R, as discussed in Chap. 2. The degrees of freedom are not expressed in terms of the number of groups K, since the notion of group does not fit the canonical correlation model, which is concerned with continuous variables. Instead, K is replaced by (q + 1): the number of variates on the left side, q, plays the role of the K − 1 dummy variables that would be required to represent K groups.
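R is distributed approximately as an F distribution with pq degrees of freedom in the numerator and \( wt-\frac{pq}{2}+1 \) degrees of freedom in the denominator, where \( w=N-\frac{3}{2}-\frac{p+q}{2} \) (a constant here, not the canonical variable w) and \( t=\sqrt{\frac{p^{2}q^{2}-4}{p^{2}+q^{2}-5}} \), with t = 1 when \( p^{2}+q^{2}-5\le 0 \). This last test (Rao’s R) is the one reported in the SAS output (rather than Bartlett’s V).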
These tests are joint tests of the significance of the q canonical correlations. However, each term of the sum over the eigenvalues in Eq. (7.31) or (7.32) is itself distributed approximately as a chi-square with p + q − (2i − 1) degrees of freedom, where i indexes the eigenvalues from i = 1 to q.
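In the usual textbook form, the decomposition reads as follows. This assumes that Λ equals the product of the terms \( 1-\mu_i^{2} \) (equivalently \( 1/(1+\lambda_i) \) if \( \lambda_i=\mu_i^{2}/(1-\mu_i^{2}) \)); the notation may differ from that of Eqs. (7.31) and (7.32):

\[ \Lambda=\prod_{i=1}^{q}\left(1-\mu_i^{2}\right), \qquad V=-\left[N-1-\frac{p+q+1}{2}\right]\ln\Lambda=\sum_{i=1}^{q}V_i, \]
\[ V_i=-\left[N-1-\frac{p+q+1}{2}\right]\ln\left(1-\mu_i^{2}\right), \qquad \sum_{i=1}^{q}\left[p+q-(2i-1)\right]=q(p+q+1)-q(q+1)=pq. \]

The degrees of freedom of the individual terms therefore add up to the pq degrees of freedom of the joint test.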
Consequently, the joint test that the remaining canonical correlations \( \mu_2, \mu_3, \mu_4, \ldots, \mu_q \) are zero is obtained by subtracting \( V_1 \) from V. The difference \( V-V_1 \) is approximately chi-square distributed, with degrees of freedom equal to the difference between the degrees of freedom of V and those of \( V_1 \), i.e., pq − (p + q − 1) = (p − 1)(q − 1). The procedure can be continued up to the qth (last) eigenvalue. The same computations as those detailed above for Bartlett’s V can be performed with Rao’s R.
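The sequential tests can be sketched in a few lines. This is a minimal sketch, assuming the standard forms of Bartlett’s V and Rao’s R, in which row k tests the correlations k through min(p, q) with (p − k + 1)(q − k + 1) degrees of freedom and the multiplier w is kept at its full-model value; the function name `sequential_tests` is hypothetical, and p-values are left out to keep the code dependency-free.

```python
import math

def sequential_tests(mu, N, p, q):
    """Bartlett's V and Rao's R for H0: mu_k, ..., mu_s are all zero (k = 1..s)."""
    s = min(p, q)
    mu = list(mu)[:s]
    # Bartlett multiplier N - 1 - (p + q + 1)/2, identical to Rao's w
    m = N - 1 - (p + q + 1) / 2
    rows = []
    for k in range(s):                       # k = 0 tests all s correlations
        pk, qk = p - k, q - k
        # Wilks' Lambda for the correlations not yet removed
        lam = math.prod(1 - r ** 2 for r in mu[k:])
        # Bartlett's V: approximately chi-square with pk * qk df
        V = -m * math.log(lam)
        # Rao's R: approximately F(df1, df2)
        df1 = pk * qk
        denom = pk ** 2 + qk ** 2 - 5
        t = math.sqrt((pk ** 2 * qk ** 2 - 4) / denom) if denom > 0 else 1.0
        df2 = m * t - df1 / 2 + 1
        root = lam ** (1 / t)
        R = (1 - root) / root * df2 / df1
        rows.append({"first": k + 1, "V": V, "df_V": df1,
                     "R": R, "df1": df1, "df2": df2})
    return rows

# Usage: 0.35131 is the first correlation reported in the example; the other
# two values are hypothetical placeholders, and N = 400 is an assumption
rows = sequential_tests([0.35131, 0.10, 0.05], N=400, p=3, q=3)
for r in rows:
    print(r["first"], round(r["R"], 2), r["df1"], round(r["df2"], 2))
```

With N = 400 and p = q = 3, the denominator degrees of freedom come out at about 959.04 for the joint test and 790 after removing the first correlation, which is consistent with the values reported in the example; N = 400 is inferred from those values, not stated in the text. P-values can be obtained with `scipy.stats.chi2.sf` and `scipy.stats.f.sf` if SciPy is available.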
7.3 Multiple Regression as a Special Case of Canonical Correlation Analysis
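When the criterion set contains a single variable (q = 1), the squared canonical correlation equals the R² of the ordinary least squares regression of that variable on the predictors, and the canonical weights u are proportional to the regression coefficients. A minimal NumPy sketch on simulated data (the sample size and coefficients are arbitrary assumptions) checks both facts numerically:

```python
import numpy as np

# Simulated data (hypothetical): one criterion y and p = 3 predictors
rng = np.random.default_rng(0)
N, p = 200, 3
X = rng.normal(size=(N, p))
y = X @ np.array([0.5, -0.3, 0.2]) + rng.normal(size=N)

Xc = X - X.mean(axis=0)                  # centered predictors
yc = y - y.mean()                        # centered criterion

# Squared canonical correlation with q = 1: S_yx S_xx^{-1} S_xy / s_yy
Sxx = Xc.T @ Xc / (N - 1)
Sxy = Xc.T @ yc / (N - 1)
syy = yc @ yc / (N - 1)
mu2 = Sxy @ np.linalg.solve(Sxx, Sxy) / syy
u_dir = np.linalg.solve(Sxx, Sxy)        # direction of the canonical weights u

# R^2 of the OLS regression of y on X (intercept handled by centering)
beta, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
resid = yc - Xc @ beta
r2 = 1 - (resid @ resid) / (yc @ yc)
```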
7.4 Examples

X1: This new product is hard to understand.

X2: This new product is not really easy to use.

X3: Using this new product is not very compatible with the way I do things.

…

X13: I feel positive about this new product.

X14: I really like this new product.

X15: I am favorably disposed towards this new product.
The SAS procedure “proc cancorr” runs the canonical correlation analysis. The X variables (see Fig. 7.1) are indicated in the list following the key word “VAR” and the Y variables (see Fig. 7.1) are listed after the key word “with.” Titles can be inserted for the output in single quotes after the word “title.”
The command “canon” is used in STATA, with the X and the Y variables listed in their own sets of parentheses. The matrices “canload11” and “canload22” correspond to the canonical loadings of the X and Y variables, respectively. These canonical loadings can also be displayed using the command “estat loadings.” In addition to the canonical loadings, the correlations between the X variables and the w canonical variates, as well as the correlations between the Y variables and the z canonical variates, are displayed. The last line of commands in Fig. 7.3 concerns the test of the significance of the individual canonical correlations. The command “canon” without arguments repeats the output of the prior canonical analysis, and the “test” option is followed by the canonical correlation numbers for which testing is requested: test(1) tests all three canonical correlations jointly, test(2) tests canonical correlations 2 and 3 jointly, and so on.
The list of canonical correlations shows that the first one, 0.35131, appears larger than the other two, so we can concentrate on this larger value. The squared canonical correlations are the eigenvalues that solve Eq. (7.21); equivalently, each canonical correlation is the square root of one of these eigenvalues.
Given the relationship between the \( \lambda_i \)s and the \( \mu_i \)s, these eigenvalues provide the same information as the canonical correlations. The F test corresponding to Rao’s R (highlighted in grey in Fig. 7.4) indicates that the canonical correlations (or eigenvalues) are jointly significantly different from zero (F = 6.21 with 9 and 959.04 degrees of freedom). The next row in that part of the table shows that, after the first canonical correlation is removed, the remaining canonical correlations are not jointly significant at the 0.05 level (F = 0.61 with 4 and 790 degrees of freedom). Therefore, we can concentrate on the results concerning the first canonical variable.
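The conversion between the reported canonical correlation and the eigenvalue scales is a one-line calculation. This sketch assumes that the \( \lambda_i \) are related to the canonical correlations by \( \lambda_i=\mu_i^{2}/(1-\mu_i^{2}) \), the scale that SAS labels CanRsq/(1−CanRsq); only the value 0.35131 comes from the text.

```python
mu1 = 0.35131                    # first canonical correlation reported in the text
mu1_sq = mu1 ** 2                # squared canonical correlation, about 0.1234
lam1 = mu1_sq / (1 - mu1_sq)     # assumed lambda scale mu^2 / (1 - mu^2), about 0.1408
wilks_term = 1 / (1 + lam1)      # equals 1 - mu1^2, this correlation's factor in Wilks' Lambda
```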
The raw (highlighted in grey in Fig. 7.4) and the standardized eigenvectors (canonical coefficients) are then listed in the SAS output. The raw coefficients depend on the measurement unit of each variable and should be interpreted accordingly. Because the canonical variables are normalized to unit variance as per Eq. (7.8), the magnitudes of the coefficients, which are the elements of the eigenvectors u and v, are also affected by the units of the variables. The first eigenvector indicates that innovations that are not complex and that are easy to understand (x1, x2, and x3) are associated with more positive responses (x13, x14, and x15).
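A small sketch shows why the raw coefficients depend on units while the standardized ones do not. It assumes that a standardized coefficient equals the raw coefficient multiplied by the standard deviation of its variable; the data, weights, and function name `standardized_coefficients` are hypothetical.

```python
import numpy as np

def standardized_coefficients(X, u):
    # Standardized coefficient = raw coefficient x standard deviation of its variable
    return u * X.std(axis=0, ddof=1)

# Hypothetical data and raw weights, for illustration only
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
u_raw = np.array([0.4, -0.2, 0.7])

# Express the first variable in a unit 10 times smaller (values multiplied by 10)
X_new = X.copy()
X_new[:, 0] *= 10
u_new = u_raw.copy()
u_new[0] /= 10                   # the raw weight must shrink to keep the same z

z_old = X @ u_raw
z_new = X_new @ u_new            # identical canonical variable
std_old = standardized_coefficients(X, u_raw)
std_new = standardized_coefficients(X_new, u_new)   # unchanged by the rescaling
```

The raw weight of the rescaled variable changes by a factor of 10 even though the canonical variable is the same, whereas its standardized weight does not change.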
The last tables of Fig. 7.4 then give the correlation of each variable with the canonical variables (labeled V and W in the SAS output, corresponding to z and w). These correlations allow us to assess how strongly each variable relates to the composite (unobserved) canonical variable of its own set and to the canonical variable of the other set.
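The four sets of correlations (loadings and cross-loadings) can be computed directly from the data and a pair of weight vectors. This is a minimal NumPy sketch; in practice u and v would be the first raw canonical coefficients, whereas the data and weights in the usage are hypothetical, and the function names are illustrative rather than SAS or STATA names.

```python
import numpy as np

def corr_with(M, s):
    """Correlation of each column of M with the score vector s."""
    Mc = M - M.mean(axis=0)
    sc = s - s.mean()
    return (Mc.T @ sc) / (np.linalg.norm(Mc, axis=0) * np.linalg.norm(sc))

def canonical_structure(X, Y, u, v):
    """Loadings and cross-loadings for one pair of canonical weights (u, v)."""
    z, w = X @ u, Y @ v
    return {
        "x_with_z": corr_with(X, z),   # loadings of the x variables
        "y_with_w": corr_with(Y, w),   # loadings of the y variables
        "x_with_w": corr_with(X, w),   # cross-loadings of the x variables
        "y_with_z": corr_with(Y, z),   # cross-loadings of the y variables
    }

# Hypothetical data and weights, for illustration only
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))
Y = rng.normal(size=(100, 2))
s = canonical_structure(X, Y, np.array([1.0, 0.0, 0.0]), np.array([0.5, 0.5]))
```

When (u, v) are actual canonical weights, each cross-loading of a y variable equals μ times its loading, so the cross-loadings carry the same information as the redundancy analysis.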
In the last section of the STATA output, the heading “Test of significance of canonical correlation 1–3” corresponds to the joint test of all the canonical correlations shown at the top of the output. The “Test of significance of canonical correlation 2–3” is the joint test of canonical correlations 2 and 3. Because it is not significant (F = 0.6138), we conclude that only the first canonical correlation is significant.
7.5 Assignment