Abstract
Correspondence analysis (CA) and nonsymmetric correspondence analysis are based on the generalized singular value decomposition, and, in general, they are not equivalent. Taxicab correspondence analysis (TCA) is an \(L_{1}\) variant of CA, and it is based on the generalized taxicab singular value decomposition (GTSVD). Our aim is to study the taxicab variant of nonsymmetric correspondence analysis. We find that for diagonal metric matrices the GTSVDs of a given data set are equivalent, from which we deduce the equivalence of TCA and taxicab nonsymmetric correspondence analysis. We also attempt to show that TCA stays as close as possible to the original correspondence matrix without calculating a dissimilarity (or similarity) measure between rows or columns. Further, we discuss some new geometric and distance aspects of TCA.
References
Agresti A (2002) Categorical data analysis, 2nd edn. Wiley, New Jersey
Alon N, Naor A (2006) Approximating the cut-norm via Grothendieck’s inequality. SIAM J Comput 35:787–803
Balbi S (1998) Graphical displays in nonsymmetrical correspondence analysis. In: Blasius J, Greenacre M (eds) Visualization of categorical data. Academic Press, London, pp 297–309
Beh E (2012) Simple correspondence analysis using adjusted residuals. J Stat Plan Inference 142:965–973
Benzécri JP (1973) L’Analyse des Données, vol 2: L’Analyse des Correspondances. Dunod, Paris
Benzécri JP (1992) Correspondence analysis handbook. Marcel Dekker, New York
Choulakian V (2003) The optimality of the centroid method. Psychometrika 68:473–475
Choulakian V (2005) Transposition invariant principal component analysis in \(L_{1}\) for long tailed data. Stat Probab Lett 71:23–31
Choulakian V (2006a) Taxicab correspondence analysis. Psychometrika 71:333–345
Choulakian V (2006b) \(L_{1}\) norm projection pursuit principal component analysis. Comput Stat Data Anal 50:1441–1451
Choulakian V (2008a) Taxicab correspondence analysis of contingency tables with one heavyweight column. Psychometrika 73:309–319
Choulakian V (2008b) Multiple taxicab correspondence analysis. Adv Data Anal Classif 2:177–206
Choulakian V, de Tibeiro J (2013) Graph partitioning by correspondence analysis and taxicab correspondence analysis. J Classif 30(3):397–427
Choulakian V, Allard J, Simonetti B (2013) Multiple taxicab correspondence analysis of a survey related to health services. J Data Sci 11(2):205–229
Choulakian V, Kasparian S, Miyake M, Akama H, Makoshi N, Nakagawa M (2006) A statistical analysis of synoptic gospels. In: Viprey JR (ed) Proceedings of the 8th international conference on textual data (JADT 2006). Presses Universitaires de Franche-Comté, pp 281–288
Choulakian V (2013) The simple sum score statistic in taxicab correspondence analysis. In: Brentari E, Carpita M (eds) Advances in latent variables (eBook). Vita e Pensiero, Milan. ISBN 9788834325568
Fichet B (2009) Metrics of \(L_{p}\)-type and distributional equivalence principle. Adv Data Anal Classif 3:305–314
Gabriel KR, Zamir S (1979) Lower rank approximation of matrices by least squares with any choice of weights. Technometrics 21:489–498
Gifi A (1990) Nonlinear multivariate analysis. Wiley, New York
Greenacre M (1984) Theory and applications of correspondence analysis. Academic Press, London
Greenacre M (2010) Correspondence analysis of raw data. Ecology 91(4):958–963
Haberman SJ (1973) The analysis of residuals in cross-classified tables. Biometrics 29:205–220
Kreyszig E (1978) Introduction to functional analysis with applications. Wiley, New York
Lauro NC, D’Ambra L (1984) L’analyse non symétrique des correspondances. In: Diday E et al (eds) Data analysis and informatics. North-Holland, Amsterdam, pp 433–446
Le Roux B, Rouanet H (2004) Geometric data analysis. From correspondence analysis to structured data analysis. Kluwer–Springer, Dordrecht
Murtagh F (2005) Correspondence analysis and data coding with Java and R. Chapman & Hall/CRC, Boca Raton
Nishisato S (1984) Forced classification: a simple application of a quantification method. Psychometrika 49(1):25–36
Nishisato S (1994) Elements of dual scaling: an introduction to practical data analysis. Lawrence Erlbaum, Hillsdale
Takane Y, Jung S (2009) Tests of ignoring and eliminating in nonsymmetric correspondence analysis. Adv Data Anal Classif 3(3):315–340
Wold H (1966) Estimation of principal components and related models by iterative least squares. In: Krishnaiah PR (ed) Multivariate analysis. Academic Press, New York, pp 391–420
Acknowledgments
The authors are grateful to the editor, Prof. A. Cerioli, the associate editor, and the two reviewers for their constructive comments, which improved the presentation of the paper. V. Choulakian’s research is financed by NSERC of Canada.
Appendix
1.1 Singular value decomposition
Let \(\mathbf{X}\) be a data set of dimension \(I\times J\), where \(I\) observations are described by \(J\) variables. The ordinary SVD can be described as successive maximization of the \(L_{2}\)-norm of a linear combination of the columns of \(\mathbf{X}\) subject to a quadratic constraint; that is, it is based on the following optimization problem
\[
\max_{\mathbf{u}}\left\| \mathbf{Xu}\right\| _{2}\quad \text{subject to}\quad \left\| \mathbf{u}\right\| _{2}=1, \qquad (36)
\]
or, equivalently, it can also be described as maximization of the \(L_{2}\)-norm of a linear combination of the rows of \(\mathbf{X}\)
\[
\max_{\mathbf{v}}\left\| \mathbf{X}^{\prime }\mathbf{v}\right\| _{2}\quad \text{subject to}\quad \left\| \mathbf{v}\right\| _{2}=1. \qquad (37)
\]
Equation (36) is the dual of (37), and they can be reexpressed as matrix norms
\[
\lambda _{1}=\max_{\mathbf{u}}\frac{\left\| \mathbf{Xu}\right\| _{2}}{\left\| \mathbf{u}\right\| _{2}}=\max_{\mathbf{v}}\frac{\left\| \mathbf{X}^{\prime }\mathbf{v}\right\| _{2}}{\left\| \mathbf{v}\right\| _{2}}=\max_{\mathbf{u},\mathbf{v}}\frac{\mathbf{v}^{\prime }\mathbf{Xu}}{\left\| \mathbf{u}\right\| _{2}\left\| \mathbf{v}\right\| _{2}}. \qquad (38)
\]
The solution to (38), \(\lambda _{1}\), is the square root of the greatest eigenvalue of the matrix \(\mathbf{X}^{\prime }\mathbf{X}\) or \(\mathbf{XX}^{\prime }\). The first principal axes, \(\mathbf{u}_{1}\) and \(\mathbf{v}_{1}\), can be defined as
\[
\mathbf{u}_{1}=\arg \max_{\left\| \mathbf{u}\right\| _{2}=1}\left\| \mathbf{Xu}\right\| _{2}, \qquad (39)
\]
where \(\mathbf{u}_{1}\) is the eigenvector of the matrix \(\mathbf{X}^{\prime }\mathbf{X}\) associated with the greatest eigenvalue \(\lambda _{1}^{2}\); and
\[
\mathbf{v}_{1}=\arg \max_{\left\| \mathbf{v}\right\| _{2}=1}\left\| \mathbf{X}^{\prime }\mathbf{v}\right\| _{2}, \qquad (40)
\]
where \(\mathbf{v}_{1}\) is the eigenvector of the matrix \(\mathbf{XX}^{\prime }\) associated with the same eigenvalue.
Let \(\mathbf{a}_{1}\) and \(\mathbf{b}_{1}\) be defined as
\[
\mathbf{a}_{1}=\mathbf{Xu}_{1} \qquad (41)
\]
and
\[
\mathbf{b}_{1}=\mathbf{X}^{\prime }\mathbf{v}_{1}; \qquad (42)
\]
then \(\lambda _{1}=\left\| \mathbf{a}_{1}\right\| _{2}=\left\| \mathbf{b}_{1}\right\| _{2}.\)
Equations (41) and (42) are named transition formulas, because \(\mathbf{v}_{1}\) and \(\mathbf{a}_{1}\), and \(\mathbf{u}_{1}\) and \(\mathbf{b}_{1}\), are related by
\[
\mathbf{v}_{1}=\mathbf{a}_{1}/\lambda _{1}\quad \text{and}\quad \mathbf{u}_{1}=\mathbf{b}_{1}/\lambda _{1}. \qquad (43)
\]
To obtain \(\mathbf{a}_{2}\) and \(\mathbf{b}_{2}\), and the axes \(\mathbf{u}_{2}\) and \(\mathbf{v}_{2}\), we repeat the above procedure on the residual data set
\[
\mathbf{X}_{2}=\mathbf{X}_{1}-\mathbf{a}_{1}\mathbf{b}_{1}^{\prime }/\lambda _{1}=\mathbf{X}_{1}-\lambda _{1}\mathbf{v}_{1}\mathbf{u}_{1}^{\prime }, \qquad (44)
\]
where \(\mathbf{X}_{1}=\mathbf{X}\). We note that \(rank(\mathbf{X}_{2})=rank(\mathbf{X}_{1})-1\), because by (41) and (42)
\[
\mathbf{X}_{2}\mathbf{u}_{1}=\mathbf{0}\quad \text{and}\quad \mathbf{v}_{1}^{\prime }\mathbf{X}_{2}=\mathbf{0}^{\prime }. \qquad (45)
\]
Classical SVD can be described as the sequential repetition of the above procedure \(k=rank(\mathbf{X})\) times, until the residual matrix becomes \(\mathbf{0}\); thus, using \(\alpha =1,\ldots ,k\) as indices, the matrix \(\mathbf{X}\) can be written as
\[
\mathbf{X}=\sum_{\alpha =1}^{k}\mathbf{a}_{\alpha }\mathbf{b}_{\alpha }^{\prime }/\lambda _{\alpha }, \qquad (46)
\]
which, by (43), can be rewritten in the much more familiar form
\[
\mathbf{X}=\sum_{\alpha =1}^{k}\lambda _{\alpha }\mathbf{v}_{\alpha }\mathbf{u}_{\alpha }^{\prime }. \qquad (47)
\]
Further, we have
\[
\mathbf{a}_{\alpha }=\mathbf{Xu}_{\alpha }\quad \text{and}\quad \mathbf{b}_{\alpha }=\mathbf{X}^{\prime }\mathbf{v}_{\alpha }\quad \text{for}\ \alpha =1,\ldots ,k, \qquad (48)
\]
and
\[
\left\| \mathbf{X}\right\| _{F}^{2}=\sum_{\alpha =1}^{k}\lambda _{\alpha }^{2}. \qquad (49)
\]
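To make the deflation construction (44)–(47) concrete, here is a minimal numerical sketch in Python/NumPy; it is an assumed illustration, not code from the paper, and note that NumPy's left and right singular vectors correspond to this appendix's \(\mathbf{v}_{\alpha }\) and \(\mathbf{u}_{\alpha }\), respectively.

```python
import numpy as np

def svd_by_deflation(X):
    """Extract the triples (lambda_alpha, v_alpha, u_alpha) one at a time,
    deflating the residual matrix as in (44)."""
    X_alpha = X.astype(float)
    triples = []
    for _ in range(np.linalg.matrix_rank(X)):
        # Leading singular triple of the current residual matrix
        V, s, Ut = np.linalg.svd(X_alpha)
        lam, v, u = s[0], V[:, 0], Ut[0, :]
        triples.append((lam, v, u))
        # Residual step (44): X_{alpha+1} = X_alpha - lambda_alpha v_alpha u_alpha'
        X_alpha = X_alpha - lam * np.outer(v, u)
    return triples

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 4))
triples = svd_by_deflation(X)
# Reconstruction (47): X = sum_alpha lambda_alpha v_alpha u_alpha'
assert np.allclose(X, sum(lam * np.outer(v, u) for lam, v, u in triples))
# Pythagorean identity (49): ||X||_F^2 = sum_alpha lambda_alpha^2
assert np.isclose(np.sum(X**2), sum(lam**2 for lam, _, _ in triples))
```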
1.2 Taxicab singular value decomposition
TSVD consists of maximizing the \(L_{1}\)-norm of a linear combination of the columns of \(\mathbf{X}\) subject to an \(L_{\infty }\)-norm constraint; more precisely, it is based on the following optimization problem
\[
\max_{\mathbf{u}}\left\| \mathbf{Xu}\right\| _{1}\quad \text{subject to}\quad \left\| \mathbf{u}\right\| _{\infty }=1, \qquad (50)
\]
or, equivalently, it can also be described as maximization of the \(L_{1}\)-norm of a linear combination of the rows of the matrix \(\mathbf{X}\)
\[
\max_{\mathbf{v}}\left\| \mathbf{X}^{\prime }\mathbf{v}\right\| _{1}\quad \text{subject to}\quad \left\| \mathbf{v}\right\| _{\infty }=1. \qquad (51)
\]
Equation (50) is the dual of (51), and they can be reexpressed as matrix norms
\[
\lambda _{1}=\max_{\mathbf{u}}\frac{\left\| \mathbf{Xu}\right\| _{1}}{\left\| \mathbf{u}\right\| _{\infty }}=\max_{\mathbf{v}}\frac{\left\| \mathbf{X}^{\prime }\mathbf{v}\right\| _{1}}{\left\| \mathbf{v}\right\| _{\infty }}=\max_{\mathbf{u},\mathbf{v}}\frac{\mathbf{v}^{\prime }\mathbf{Xu}}{\left\| \mathbf{u}\right\| _{\infty }\left\| \mathbf{v}\right\| _{\infty }}, \qquad (52)
\]
which is a well-known and much-discussed matrix norm related to the Grothendieck problem; see, for instance, Alon and Naor (2006). The computation of \(\lambda _{1}\) in (52) is a combinatorial optimization problem given by
\[
\lambda _{1}=\max_{\mathbf{u}\in \{-1,+1\}^{J}}\left\| \mathbf{Xu}\right\| _{1}=\max_{\mathbf{v}\in \{-1,+1\}^{I}}\left\| \mathbf{X}^{\prime }\mathbf{v}\right\| _{1}. \qquad (53)
\]
Equation (53) characterizes the robustness of the method, in the sense that the weights assigned to the columns (and, by duality, to the rows) are uniformly distributed on \(\{-1,+1\}\). The vectors \(\mathbf{u}_{1}\) and \(\mathbf{v}_{1}\) are defined as
\[
\mathbf{u}_{1}=\arg \max_{\mathbf{u}\in \{-1,+1\}^{J}}\left\| \mathbf{Xu}\right\| _{1} \qquad (54)
\]
and
\[
\mathbf{v}_{1}=\arg \max_{\mathbf{v}\in \{-1,+1\}^{I}}\left\| \mathbf{X}^{\prime }\mathbf{v}\right\| _{1}. \qquad (55)
\]
Let \(\mathbf{a}_{1}\) and \(\mathbf{b}_{1}\) be
\[
\mathbf{a}_{1}=\mathbf{Xu}_{1} \qquad (56)
\]
and
\[
\mathbf{b}_{1}=\mathbf{X}^{\prime }\mathbf{v}_{1}; \qquad (57)
\]
then \(\lambda _{1}=\left\| \mathbf{a}_{1}\right\| _{1}=\left\| \mathbf{b}_{1}\right\| _{1}.\)
Equations (56) and (57) are named transition formulas, because \(\mathbf{v}_{1}\) and \(\mathbf{a}_{1}\), and \(\mathbf{u}_{1}\) and \(\mathbf{b}_{1}\), are related by
\[
\mathbf{v}_{1}=sgn(\mathbf{a}_{1})\quad \text{and}\quad \mathbf{u}_{1}=sgn(\mathbf{b}_{1}), \qquad (58)
\]
where \(sgn(\mathbf{g}_{1})=(sgn(g_{1}(1)),\ldots ,sgn(g_{1}(J)))^{\prime }\) is applied coordinatewise, with \(sgn(g_{1}(j))=1\) if \(g_{1}(j)>0\) and \(sgn(g_{1}(j))=-1\) otherwise. Note that (58) is completely different from (43).
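For small tables, the combinatorial problem (53) can be solved by complete enumeration. The following Python sketch computes the first taxicab triple exactly and then applies the transition formulas (56)–(58); the function name tsvd_first_factor_exact is illustrative, and the loop is feasible only for small \(J\).

```python
import itertools
import numpy as np

def tsvd_first_factor_exact(X):
    """Solve (53) by complete enumeration of u in {-1,+1}^J,
    then apply the transition formulas (56)-(58)."""
    J = X.shape[1]
    lam1, u1 = -np.inf, None
    for signs in itertools.product((-1.0, 1.0), repeat=J):
        u = np.array(signs)
        lam = np.abs(X @ u).sum()            # ||Xu||_1
        if lam > lam1:
            lam1, u1 = lam, u
    a1 = X @ u1                              # (56): a_1 = X u_1
    v1 = np.where(a1 > 0, 1.0, -1.0)         # (58): v_1 = sgn(a_1)
    b1 = X.T @ v1                            # (57): b_1 = X' v_1
    return lam1, u1, v1, a1, b1              # lam1 = ||a_1||_1 = ||b_1||_1
```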
To obtain \(\mathbf{a}_{2}\), \(\mathbf{b}_{2}\), and the axes \(\mathbf{u}_{2}\) and \(\mathbf{v}_{2}\), we repeat the above procedure on the residual data set
\[
\mathbf{X}_{2}=\mathbf{X}_{1}-\mathbf{a}_{1}\mathbf{b}_{1}^{\prime }/\lambda _{1}, \qquad (59)
\]
where \(\mathbf{X}_{1}=\mathbf{X}\). We note that \(rank(\mathbf{X}_{2})=rank(\mathbf{X}_{1})-1\), because by (56), (57) and (58)
\[
\mathbf{b}_{1}^{\prime }\mathbf{u}_{1}=\left\| \mathbf{b}_{1}\right\| _{1}=\lambda _{1}=\left\| \mathbf{a}_{1}\right\| _{1}=\mathbf{v}_{1}^{\prime }\mathbf{a}_{1}, \qquad (60)
\]
which implies that
\[
\mathbf{X}_{2}\mathbf{u}_{1}=\mathbf{0}\quad \text{and}\quad \mathbf{v}_{1}^{\prime }\mathbf{X}_{2}=\mathbf{0}^{\prime }. \qquad (61)
\]
TSVD is described as the sequential repetition of the above procedure \(k=rank(\mathbf{X})\) times, until the residual matrix becomes \(\mathbf{0}\); thus the matrix \(\mathbf{X}\) can be written as
\[
\mathbf{X}=\sum_{\alpha =1}^{k}\mathbf{a}_{\alpha }\mathbf{b}_{\alpha }^{\prime }/\lambda _{\alpha }. \qquad (62)
\]
It is important to note that (62) has the same form as (4) and (46), but it cannot be rewritten as (47).
Further, similar to (57), we have
\[
\mathbf{a}_{\alpha }=\mathbf{X}_{\alpha }\mathbf{u}_{\alpha },\quad \mathbf{b}_{\alpha }=\mathbf{X}_{\alpha }^{\prime }\mathbf{v}_{\alpha }\quad \text{and}\quad \lambda _{\alpha }=\left\| \mathbf{a}_{\alpha }\right\| _{1}=\left\| \mathbf{b}_{\alpha }\right\| _{1}\quad \text{for}\ \alpha =1,\ldots ,k. \qquad (63)
\]
But the dispersion measures \(\lambda _{\alpha }\) in (63) will not satisfy (49), because the Pythagorean theorem is not satisfied in \(L_{1}\).
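Reusing the exact first-factor routine sketched after (58), the deflation (59)–(62) can be checked numerically; again, this is an assumed illustration rather than the authors' code.

```python
import numpy as np

def tsvd_by_deflation(X):
    """Full TSVD (62): deflate with (59) until the residual vanishes."""
    X_alpha = X.astype(float)
    factors = []
    for _ in range(np.linalg.matrix_rank(X)):
        lam, u, v, a, b = tsvd_first_factor_exact(X_alpha)
        factors.append((lam, a, b))
        X_alpha = X_alpha - np.outer(a, b) / lam   # (59)
    return factors

rng = np.random.default_rng(1)
X = rng.standard_normal((5, 4))
factors = tsvd_by_deflation(X)
# Reconstruction (62): X = sum_alpha a_alpha b_alpha' / lambda_alpha
assert np.allclose(X, sum(np.outer(a, b) / lam for lam, a, b in factors))
```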
In TSVD, the optimization problem (50), (51) or (52) can be solved by two algorithms. The first one is based on the complete enumeration (53); with the present state of desktop computing power, this can be applied when, say, \(\min (I,J)\simeq 25\). The second one is based on iterating the transition formulas (56), (57) and (58), similar to Wold's (1966) NIPALS (nonlinear iterative partial least squares) algorithm, also named criss-cross regression by Gabriel and Zamir (1979). It is easy to show that this is an ascent algorithm. The criss-cross algorithm is nonlinear and can be summarized in the following way, where \(\mathbf{b}\) is a starting value:
Step 1: \(\mathbf{u}=sgn(\mathbf{b})\), \(\mathbf{a}=\mathbf{Xu}\) and \(\lambda (\mathbf{u})=\left\| \mathbf{Xu}\right\| _{1}\);
Step 2: \(\mathbf{v}=sgn(\mathbf{a})\), \(\mathbf{b}=\mathbf{X}^{\prime }\mathbf{v}\) and \(\lambda (\mathbf{v})=\left\| \mathbf{X}^{\prime }\mathbf{v}\right\| _{1}\);
Step 3: If \(\lambda (\mathbf{v})-\lambda (\mathbf{u})>0\), go to Step 1; otherwise, stop.
This is an ascent algorithm; that is, it increases the value of the objective function \(\lambda \) at each iteration. The convergence of the algorithm is superlinear (very fast, at most two iterations); however, it may converge to a local maximum, so we restart the algorithm \(I\) times, using each row of \(\mathbf{X}\) as a starting value. The iterative algorithm is statistically consistent in the sense that, as the sample size increases, there will be some observations in the direction of the principal axes, so the algorithm will find the optimal solution.
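A minimal sketch of Steps 1–3 with the row-restart strategy just described is given below; the sign convention follows the definition after (58) (\(sgn(t)=-1\) for \(t\le 0\)), and all function names are illustrative.

```python
import numpy as np

def sgn(x):
    # sgn as defined after (58): +1 if positive, -1 otherwise
    return np.where(x > 0, 1.0, -1.0)

def crisscross(X, b):
    """One run of Steps 1-3 from a starting J-vector b."""
    while True:
        u = sgn(b)                       # Step 1: u = sgn(b)
        a = X @ u                        #         a = X u
        lam_u = np.abs(a).sum()          #         lambda(u) = ||X u||_1
        v = sgn(a)                       # Step 2: v = sgn(a)
        b = X.T @ v                      #         b = X' v
        lam_v = np.abs(b).sum()          #         lambda(v) = ||X' v||_1
        if lam_v - lam_u <= 0:           # Step 3: stop when lambda no longer grows
            return lam_u, u, v

def tsvd_first_factor(X):
    """Restart from each of the I rows of X; keep the best local maximum."""
    runs = (crisscross(X, X[i, :]) for i in range(X.shape[0]))
    return max(runs, key=lambda run: run[0])
```

For small \(\min (I,J)\), the value returned here can be checked against the complete enumeration sketched after (58); the two agree whenever one of the \(I\) restarts reaches the global maximum.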