
Some new aspects of taxicab correspondence analysis


Abstract

Correspondence analysis (CA) and nonsymmetric correspondence analysis are based on generalized singular value decomposition, and, in general, they are not equivalent. Taxicab correspondence analysis (TCA) is an \(L_{1}\) variant of CA, and it is based on the generalized taxicab singular value decomposition (GTSVD). Our aim is to study the taxicab variant of nonsymmetric correspondence analysis. We find that for diagonal metric matrices the GTSVDs of a given data set are equivalent, from which we deduce the equivalence of TCA and taxicab nonsymmetric correspondence analysis. We also attempt to show that TCA stays as close as possible to the original correspondence matrix without calculating a dissimilarity (or similarity) measure between rows or columns. Further, we discuss some new geometric and distance aspects of TCA.


References

  • Agresti A (2002) Categorical data analysis, 2nd edn. Wiley, New Jersey


  • Alon N, Naor A (2006) Approximating the cut-norm via Grothendieck’s inequality. SIAM J Comput 35:787–803


  • Balbi S (1998) Graphical displays in nonsymmetrical correspondence analysis. In: Blasius J, Greenacre M (eds) Visualization of categorical data. Academic Press, London, pp 297–309


  • Beh E (2012) Simple correspondence analysis using adjusted residuals. J Stat Plan Inference 142:965–973


  • Benzécri JP (1973) L’Analyse des Données. In: L’Analyse des Correspondances, vol 2, Dunod, Paris

  • Benzécri JP (1992) Correspondence analysis handbook. Marcel Dekker, New York


  • Choulakian V (2003) The optimality of the centroid method. Psychometrika 68:473–475


  • Choulakian V (2005) Transposition invariant principal component analysis in \(\text{ L }_{1}\) for long tailed data. Stat Probab Lett 71:23–31


  • Choulakian V (2006a) Taxicab correspondence analysis. Psychometrika 71:333–345


  • Choulakian V (2006b) \(\text{ L }_{1}\) norm projection pursuit principal component analysis. Comput Stat Data Anal 50:1441–1451


  • Choulakian V (2008a) Taxicab correspondence analysis of contingency tables with one heavyweight column. Psychometrika 73:309–319


  • Choulakian V (2008b) Multiple taxicab correspondence analysis. Adv Data Anal Classif 2:177–206


  • Choulakian V, de Tibeiro J (2013) Graph partitioning by correspondence analysis and taxicab correspondence analysis. J Classif 30(3):397–427


  • Choulakian V, Allard J, Simonetti B (2013) Multiple taxicab correspondence analysis of a survey related to health services. J Data Sci 11(2):205–229


  • Choulakian V, Kasparian S, Miyake M, Akama H, Makoshi N, Nakagawa M (2006) A statistical analysis of synoptic gospels. In: Viprey JR (ed) Proceedings of the 8th international conference on textual data, JADT’2006, Presses Universitaires de Franche-Comté, pp 281–288

  • Choulakian V (2013) The simple sum score statistic in taxicab correspondence analysis. In: Brentari E, Carpita M (eds) eBook, Advances in latent variables, Vita e Pensiero, Milan, Italy, ISBN:9788834325568

  • Fichet B (2009) Metrics of \(\text{ L }_{p}\)-type and distributional equivalence principle. Adv Data Anal Classif 3:305–314


  • Gabriel KR, Zamir S (1979) Lower rank approximation of matrices by least squares with any choice of weights. Technometrics 21:489–498


  • Gifi A (1990) Nonlinear multivariate analysis. Wiley, New York


  • Greenacre M (1984) Theory and applications of correspondence analysis. Academic Press, London


  • Greenacre M (2010) Correspondence analysis of raw data. Ecology 91(4):958–963


  • Haberman SJ (1973) The analysis of residuals in cross-classified tables. Biometrics 75:457–467


  • Kreyszig E (1978) Introduction to functional analysis with applications. Wiley, New York


  • Lauro NC, D’Ambra L et al (1984) L’analyse non symétrique des correspondances. In: Diday E et al (eds) Data analysis and informatics. North-Holland, Amsterdam, pp 433–446

  • Le Roux B, Rouanet H (2004) Geometric data analysis. From correspondence analysis to structured data analysis. Kluwer–Springer, Dordrecht


  • Murtagh F (2005) Correspondence analysis and data coding with Java and R. Chapman & Hall/CRC, Boca Raton


  • Nishisato S (1984) Forced classification: a simple application of a quantification method. Psychometrika 49(1):25–36


  • Nishisato S (1994) Elements of dual scaling: an introduction to practical data analysis. Lawrence Erlbaum, Hillsdale


  • Takane Y, Jung S (2009) Tests of ignoring and eliminating in nonsymmetric correspondence analysis. Adv Data Anal Classif 3(3):315–340


  • Wold H (1966) Estimation of principal components and related models by iterative least squares. In: Krishnaiah PR (ed) Multivariate analysis. Academic Press, New York, pp 391–420



Acknowledgments

The authors are grateful to the editor, Prof. A. Cerioli, the associate editor, and the two reviewers for their constructive comments, which improved the presentation of the paper. V. Choulakian’s research is funded by NSERC of Canada.

Author information

Correspondence to Vartan Choulakian.

Appendix


1.1 Singular value decomposition

Let \(\mathbf{X}\) be a data set of dimension \(I\times J\), where \(I\) observations are described by \(J\) variables. The ordinary SVD can be described as successive maximization of the \(L_{2}\)-norm of the linear combination of the columns of \(\mathbf{X}\) subject to a quadratic constraint; that is, it is based on the following optimization problem

$$\begin{aligned} \max \left| \left| \mathbf{Xu}\right| \right| _{2}\quad \text {subject to}\quad \left| \left| \mathbf{u}\right| \right| _{2}=1; \end{aligned}$$
(36)

or equivalently, it can also be described as maximization of the \(L_{2}\)-norm of the linear combination of the rows of \(\mathbf{X}\)

$$\begin{aligned} \max \left| \left| \mathbf{X}^{\prime }\mathbf{v}\right| \right| _{2}\quad \text {subject to}\quad \left| \left| \mathbf{v}\right| \right| _{2}=1. \end{aligned}$$
(37)

Equation (36) is the dual of (37), and they can be reexpressed as matrix norms

$$\begin{aligned} \lambda _{1}&= \max _{\mathbf{u}\in \mathbb {R}^{J}}\frac{\left| \left| \mathbf{Xu}\right| \right| _{2}}{\left| \left| \mathbf{u}\right| \right| _{2}}, \nonumber \\&= \max _{\mathbf{v}\in \mathbb {R}^{I}}\frac{\left| \left| \mathbf{X}^{\prime }\mathbf{v}\right| \right| _{2}}{\left| \left| \mathbf{v}\right| \right| _{2}},\nonumber \\&= \max _{\mathbf{u}\in \mathbb {R}^{J},\,\mathbf{v}\in \mathbb {R}^{I}}\frac{\mathbf{v}^{\prime }\mathbf{Xu}}{\left| \left| \mathbf{u}\right| \right| _{2}\left| \left| \mathbf{v}\right| \right| _{2}}. \end{aligned}$$
(38)

The solution to (38), \(\lambda _{1},\ \)is the square root of the greatest eigenvalue of the matrix \(\mathbf{X}^{\prime }\mathbf{X}\) or \(\mathbf{XX}^{\prime }.\) The first principal axes, \(\mathbf{u}_{1}\) and \(\mathbf{v}_{1},\) can be defined as

$$\begin{aligned} \mathbf{u}_{1}=\arg \max _{\mathbf{u}}\left| \left| \mathbf{Xu}\right| \right| _{2}\,\text { such that }\,\left| \left| \mathbf{u} _{1}\right| \right| _{2}=1, \end{aligned}$$
(39)

where \(\mathbf{u}_{1}\) is the eigenvector of the matrix \(\mathbf{X}^{\prime }\mathbf{X }\) associated with the greatest eigenvalue \(\lambda _{1}^{2};\) and

$$\begin{aligned} \mathbf{v}_{1}=\arg \max _{\mathbf{v}}\left| \left| \mathbf{X}^{\prime }\mathbf{v} \right| \right| _{2}\,\text { such that }\,\left| \left| \mathbf{v} _{1}\right| \right| _{2}=1. \end{aligned}$$
(40)

Let \(\mathbf{a}_{1}\) and \(\mathbf{b}_{1}\) be defined as

$$\begin{aligned} \mathbf{a}_{1}=\mathbf{Xu}_{1}\, \text { and }\, \mathbf{b}_{1}=\mathbf{X}^{\prime }\mathbf{v}_{1}; \end{aligned}$$
(41)

then

$$\begin{aligned} \left| \left| \mathbf{a}_{1}\right| \right| _{2}=\mathbf{v} _{1}^{\prime }\mathbf{a}_{1}=\left| \left| \mathbf{b}_{1}\right| \right| _{2}=\mathbf{u}_{1}^{\prime }\mathbf{b}_{1}=\lambda _{1}. \end{aligned}$$
(42)

Equations (41) and (42) are named transition formulas, because \(\mathbf{v}_{1}\) and \(\mathbf{a}_{1},\) and, \(\mathbf{u}_{1}\) and \(\mathbf{b}_{1},\) are related by

$$\begin{aligned} \mathbf{u}_{1}=\mathbf{b}_{1}/\lambda _{1} \quad \text { and} \quad \mathbf{v}_{1}=\mathbf{a} _{1}/\lambda _{1}. \end{aligned}$$
(43)

To obtain \(\mathbf{a}_{2}\) and \(\mathbf{b}_{2},\) and axes \(\mathbf{u}_{2}\) and \(\mathbf{v }_{2}\), we repeat the above procedure on the residual dataset

$$\begin{aligned} \mathbf{X}_{2}\mathbf{=X}_{1}\mathbf{-a}_{1}\mathbf{b}_{1}^{^{\prime }}/\lambda _{1}, \end{aligned}$$
(44)

where \(\mathbf{X}_{1}=\mathbf{X}\). We note that \(rank(\mathbf{X}_{2})=rank(\mathbf{X}_{1})-1\), because by (41) and (42)

$$\begin{aligned} \mathbf{X}_{2}\mathbf{u}_{1}=\mathbf{0}\, \text { and}\, \mathbf{X}_{2}^{\prime }\mathbf{v}_{1}= \mathbf{0}. \end{aligned}$$
(45)

Classical SVD can be described as the sequential repetition of the above procedure \(k=rank(\mathbf{X})\) times, until the residual matrix becomes \(\mathbf{0}\); thus, using \(\alpha =1,\ldots ,k\) as indices, the matrix \(\mathbf{X}\) can be written as

$$\begin{aligned} \mathbf{X}=\sum _{\alpha =1}^{k}\mathbf{a}_{\alpha }\mathbf{b}_{\alpha }^{\prime }/\lambda _{\alpha }, \end{aligned}$$
(46)

which, by (43), can be rewritten in a much more familiar form

$$\begin{aligned} \mathbf{X}=\sum _{\alpha =1}^{k}\lambda _{\alpha }\mathbf{v}_{\alpha }\mathbf{u} _{\alpha }^{\prime }. \end{aligned}$$
(47)

Further, we have

$$\begin{aligned} \lambda _{\alpha }=\left| \left| \mathbf{a}_{\alpha }\right| \right| _{2}=\left| \left| \mathbf{b}_{\alpha }\right| \right| _{2}\quad \text {and the }\lambda _{\alpha }\text {'s are decreasing for }\alpha =1,\ldots ,k; \end{aligned}$$
(48)

and

$$\begin{aligned} Tr(\mathbf{X}^{\prime }\mathbf{X})&= Tr(\mathbf{XX}^{\prime })=\sum _{\alpha =1}^{k}\lambda _{\alpha }^{2}, \\&= \sum _{\alpha =1}^{k}\left| \left| \mathbf{a}_{\alpha }\right| \right| _{2}^{2}=\sum _{\alpha =1}^{k}\left| \left| \mathbf{b} _{\alpha }\right| \right| _{2}^{2}. \nonumber \end{aligned}$$
(49)
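
For readers who wish to experiment, the following minimal numpy sketch (illustrative only; the function name and test matrix are ours, not part of the paper) mirrors the deflation view of the SVD described above: at each step the leading singular triplet of the current residual gives \(\mathbf{a}_{\alpha }\), \(\mathbf{b}_{\alpha }\) and \(\lambda _{\alpha }\), the residual is deflated as in (44), and the reconstruction (46) and property (48) are checked numerically.

```python
import numpy as np

def svd_by_deflation(X):
    """Vectors a_alpha, b_alpha and values lambda_alpha of the decomposition (46)."""
    X_alpha = X.astype(float).copy()
    a_list, b_list, lambdas = [], [], []
    for _ in range(np.linalg.matrix_rank(X)):
        U, s, Vt = np.linalg.svd(X_alpha)          # leading singular triplet of the residual
        v, lam, u = U[:, 0], s[0], Vt[0, :]
        a, b = X_alpha @ u, X_alpha.T @ v          # transition formulas (41)
        a_list.append(a); b_list.append(b); lambdas.append(lam)
        X_alpha = X_alpha - np.outer(a, b) / lam   # deflation (44)
    return a_list, b_list, lambdas

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 4))                    # small illustrative matrix
a_list, b_list, lambdas = svd_by_deflation(X)
X_hat = sum(np.outer(a, b) / lam for a, b, lam in zip(a_list, b_list, lambdas))
assert np.allclose(X, X_hat)                                        # reconstruction (46)
assert np.allclose(lambdas, [np.linalg.norm(a) for a in a_list])    # property (48)
```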

1.2 Taxicab singular value decomposition

TSVD consists of maximizing the \(L_{1}\) norm of the linear combination of the columns of \(\mathbf{X}\) subject to an \(L_{\infty }\) norm constraint; more precisely, it is based on the following optimization problem

$$\begin{aligned} \max \left| \left| \mathbf{Xu}\right| \right| _{1}\quad \text {subject to}\quad \left| \left| \mathbf{u}\right| \right| _{\infty }=1; \end{aligned}$$
(50)

or equivalently, it can also be described as maximization of the \(L_{1}\) norm of the linear combination of the rows of the matrix \(\mathbf{X}\)

$$\begin{aligned} \max \left| \left| \mathbf{X}^{\prime }\mathbf{v}\right| \right| _{1}\quad \text {subject to}\quad \left| \left| \mathbf{v}\right| \right| _{\infty }=1. \end{aligned}$$
(51)

Equation (50) is the dual of (51), and they can be reexpressed as matrix norms

$$\begin{aligned} \lambda _{1}&= \max _{\mathbf{u}\in \mathbb {R}^{J}}\frac{\left| \left| \mathbf{Xu}\right| \right| _{1}}{\left| \left| \mathbf{u}\right| \right| _{\infty }}, \nonumber \\&= \max _{\mathbf{v}\in \mathbb {R}^{I}}\frac{\left| \left| \mathbf{X}^{\prime }\mathbf{v}\right| \right| _{1}}{\left| \left| \mathbf{v}\right| \right| _{\infty }}, \nonumber \\&= \max _{\mathbf{u}\in \mathbb {R}^{J},\,\mathbf{v}\in \mathbb {R}^{I}}\frac{\mathbf{v}^{\prime }\mathbf{Xu}}{\left| \left| \mathbf{u}\right| \right| _{\infty }\left| \left| \mathbf{v}\right| \right| _{\infty }}, \end{aligned}$$
(52)

which is a well-known and much-discussed matrix norm related to the Grothendieck problem; see, for instance, Alon and Naor (2006). The solution to (52), \(\lambda _{1}\), is obtained by solving the combinatorial optimization problem

$$\begin{aligned} \max \mathbf{||Xu||}_{1}\quad \text {subject to }\,\mathbf{u}\in \left\{ -1,+1\right\} ^{J}. \end{aligned}$$
(53)

Equation (53) characterizes the robustness of the method, in the sense that the weights assigned to the columns (and, by duality, to the rows) are uniform, taking only the values \(-1\) and \(+1\); a brute-force sketch of (53) is given after (58) below. The vectors \(\mathbf{u}_{1}\) and \(\mathbf{v}_{1}\) are defined as

$$\begin{aligned} \mathbf{u}_{1}=\arg \max _{\mathbf{u}}\left| \left| \mathbf{Xu}\right| \right| _{1}\, \text { such that }\,\left| \left| \mathbf{u} _{1}\right| \right| _{\infty }=1, \end{aligned}$$
(54)

and

$$\begin{aligned} \mathbf{v}_{1}=\arg \max _{\mathbf{v}}\left| \left| \mathbf{X}^{\prime }\mathbf{v} \right| \right| _{1}\, \text { such that }\,\left| \left| \mathbf{v} _{1}\right| \right| _{\infty }=1. \end{aligned}$$
(55)

Let \(\mathbf{a}_{1}\) and \(\mathbf{b}_{1}\) be

$$\begin{aligned} \mathbf{a}_{1}=\mathbf{Xu}_{1}\, \text { and }\, \mathbf{b}_{1}=\mathbf{X}^{\prime }\mathbf{v}_{1} \text {;} \end{aligned}$$
(56)

then

$$\begin{aligned} \left| \left| \mathbf{a}_{1}\right| \right| _{1}=\mathbf{v} _{1}^{\prime }\mathbf{a}_{1}=\left| \left| \mathbf{b}_{1}\right| \right| _{1}=\mathbf{u}_{1}^{\prime }\mathbf{b}_{1}=\lambda _{1}. \end{aligned}$$
(57)

Equations (56) and (57) are named transition formulas, because \(\mathbf{v}_{1}\) and \(\mathbf{a}_{1},\) and, \(\mathbf{u}_{1}\) and \(\mathbf{b}_{1},\) are related by

$$\begin{aligned} \mathbf{u}_{1}=sgn(\mathbf{b}_{1}) \quad \text {and } \quad \mathbf{v}_{1}=sgn(\mathbf{a}_{1}), \end{aligned}$$
(58)

where \(sgn(\mathbf{g}_{1})=(sgn(g_{1}(1)),\ldots ,sgn(g_{1}(J)))^{\prime }\), and \(sgn(g_{1}(j))=1\) if \(g_{1}(j)>0\), \(sgn(g_{1}(j))=-1\) otherwise. Note that (58) is completely different from (43).
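
As a concrete illustration of (53)–(58), the following brute-force sketch (our own hypothetical routine, not the authors’ implementation) enumerates \(\mathbf{u}\in \left\{ -1,+1\right\} ^{J}\), which is practical only when \(J\) is small, and returns \(\lambda _{1}\), \(\mathbf{u}_{1}\), \(\mathbf{a}_{1}=\mathbf{Xu}_{1}\) and \(\mathbf{v}_{1}=sgn(\mathbf{a}_{1})\).

```python
import itertools
import numpy as np

def taxicab_first_axis_enumeration(X):
    """First taxicab axis by complete enumeration of (53); feasible for small J."""
    J = X.shape[1]
    best_lam, best_u = -np.inf, None
    for signs in itertools.product([-1.0, 1.0], repeat=J):
        u = np.array(signs)
        lam = np.abs(X @ u).sum()              # ||Xu||_1
        if lam > best_lam:
            best_lam, best_u = lam, u
    a1 = X @ best_u
    v1 = np.where(a1 > 0, 1.0, -1.0)           # sgn(a_1), transition formula (58)
    return best_lam, best_u, a1, v1
```

Note that the objective in (53) is invariant under the sign change \(\mathbf{u}\rightarrow -\mathbf{u}\), so the first axis is determined up to a reflection, exactly as in the classical SVD.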

To obtain \(\mathbf{a}_{2}\), \(\mathbf{b}_{2},\) and axes \(\mathbf{u}_{2}\) and \(\mathbf{v} _{2}\), we repeat the above procedure on the residual dataset

$$\begin{aligned} \mathbf{X}_{2}&= \mathbf{X}_{1}\mathbf{-X}_{1}\mathbf{u}_{1}\mathbf{v}_{1}^{\prime } \mathbf{X}_{1}/\lambda _{1}, \nonumber \\&= \mathbf{X}_{1}\mathbf{-a}_{1}\mathbf{b}_{1}^{\prime }/\lambda _{1} \end{aligned}$$
(59)

where \(\mathbf{X}_{1}=\mathbf{X}\). We note that \(rank(\mathbf{X}_{2})=rank(\mathbf{X}_{1})-1\), because by (56), (57) and (58)

$$\begin{aligned} \mathbf{X}_{2}\mathbf{u}_{1}=\mathbf{0}\, \text { and }\, \mathbf{X}_{2}^{\prime }\mathbf{v}_{1}= \mathbf{0;} \end{aligned}$$
(60)

which implies that

$$\begin{aligned} \mathbf{u}_{1}^{\prime }\mathbf{b}_{\alpha }=0 \, \text {and}\,\mathbf{v}_{1}^{\prime } \mathbf{a}_{\alpha }=0\,\text { for }\, \alpha =2,\ldots ,k. \end{aligned}$$
(61)

TSVD is described as the sequential repetition of the above procedure \(k=rank(\mathbf{X})\) times, until the residual matrix becomes \(\mathbf{0}\); thus the matrix \(\mathbf{X}\) can be written as

$$\begin{aligned} \mathbf{X}=\sum _{\alpha =1}^{k}\mathbf{a}_{\alpha }\mathbf{b}_{\alpha }^{\prime }/\lambda _{\alpha }. \end{aligned}$$
(62)

It is important to note that (62) has the same form as (4) and (46), but it cannot be rewritten as (47).

Further, similar to (57), we have

$$\begin{aligned} \lambda _{\alpha }=\left| \left| \mathbf{a}_{\alpha }\right| \right| _{1}=\left| \left| \mathbf{b}_{\alpha }\right| \right| _{1} \quad \text {for} \quad \alpha =1,\ldots ,k. \end{aligned}$$
(63)

However, the dispersion measures \(\lambda _{\alpha }\) in (63) do not satisfy (49), because the Pythagorean theorem does not hold in \(L_{1}\).
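
Continuing the illustrative sketch above, the full taxicab decomposition (62) can be obtained by applying a first-axis solver to successive residuals and deflating as in (59); the helper taxicab_first_axis_enumeration is the hypothetical routine from the previous sketch, and the final assertion checks the reconstruction (62).

```python
import numpy as np

def taxicab_svd(X):
    """Terms (lambda_alpha, a_alpha, b_alpha) of the taxicab decomposition (62)."""
    X_alpha = X.astype(float).copy()
    terms = []
    for _ in range(np.linalg.matrix_rank(X)):
        # taxicab_first_axis_enumeration is the illustrative routine defined earlier
        lam, u, a, v = taxicab_first_axis_enumeration(X_alpha)
        b = X_alpha.T @ v                          # transition formula (56)
        terms.append((lam, a, b))
        X_alpha = X_alpha - np.outer(a, b) / lam   # deflation (59)
    return terms

rng = np.random.default_rng(1)
X = rng.standard_normal((5, 4))
terms = taxicab_svd(X)
assert np.allclose(X, sum(np.outer(a, b) / lam for lam, a, b in terms))   # (62)
```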

In TSVD, the optimization problem (50), (51) or (52) can be solved by two algorithms. The first is complete enumeration of (53); with the present state of desktop computing power this is feasible when, say, \(\min (I,J)\simeq 25\). The second iterates the transition formulas (56), (57) and (58), in the manner of Wold's (1966) NIPALS (nonlinear iterative partial least squares) algorithm, also named criss-cross regression by Gabriel and Zamir (1979); it is easy to show that this too is an ascent algorithm. The criss-cross algorithm is nonlinear and can be summarized as follows, where \(\mathbf{b}\) is a starting value:

Step 1: \(\mathbf{u=}sgn\mathbf{(b)}\), \(\mathbf{a=Xu}\) and \(\lambda (\mathbf{u)} =\left| \left| \mathbf{Xu}\right| \right| _{1};\)

Step 2: \(\mathbf{v=}sgn\mathbf{(a),}\) \(\mathbf{b=X}^{\prime }\mathbf{v}\) and \(\lambda ( \mathbf{v)}=\left| \left| \mathbf{X}^{\prime }\mathbf{v}\right| \right| _{1};\)

Step 3: If \(\lambda (\mathbf{v)-}\lambda (\mathbf{u)>}0\mathbf{,}\) go to Step 1; otherwise, stop.

This is an ascent algorithm; that is, it increases the value of the objective function \(\lambda \) at each iteration. The convergence of the algorithm is superlinear (very fast, at most two iterations); however, it may converge to a local maximum, so we restart the algorithm \(I\) times, using each row of \(\mathbf{X}\) as a starting value. The iterative algorithm is statistically consistent in the sense that, as the sample size increases, there will be observations in the direction of the principal axes, so the algorithm will find the optimal solution.
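
The criss-cross iteration itself is only a few lines; the sketch below (our own reading of the three steps above, using the sign convention stated after (58); it is not the authors' code) restarts from each of the \(I\) rows of \(\mathbf{X}\) and keeps the best value found.

```python
import numpy as np

def sgn(x):
    """Sign function of (58): +1 for positive entries, -1 otherwise."""
    return np.where(x > 0, 1.0, -1.0)

def taxicab_first_axis_crisscross(X):
    """First taxicab axis by the criss-cross ascent algorithm with I restarts."""
    best_lam, best_u, best_v = -np.inf, None, None
    for i in range(X.shape[0]):                 # restart from each row of X
        b = X[i, :].astype(float)
        for _ in range(100):                    # convergence is typically immediate
            u = sgn(b); a = X @ u; lam_u = np.abs(a).sum()    # Step 1
            v = sgn(a); b = X.T @ v; lam_v = np.abs(b).sum()  # Step 2
            if lam_v <= lam_u:                                # Step 3: no further ascent
                break
        if lam_u > best_lam:
            best_lam, best_u, best_v = lam_u, u, v
    return best_lam, best_u, best_v
```

For small \(J\) the result can be checked against the complete enumeration of (53).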



Cite this article

Choulakian, V., Simonetti, B. & Pham Gia, T. Some new aspects of taxicab correspondence analysis. Stat Methods Appl 23, 401–416 (2014). https://doi.org/10.1007/s10260-014-0259-6

