Abstract
Correspondence analysis (CA) and nonsymmetric correspondence analysis are based on the generalized singular value decomposition, and, in general, they are not equivalent. Taxicab correspondence analysis (TCA) is an \(L_{1}\) variant of CA, and it is based on the generalized taxicab singular value decomposition (GTSVD). Our aim is to study the taxicab variant of nonsymmetric correspondence analysis. We find that for diagonal metric matrices the GTSVDs of a given data set are equivalent, from which we deduce the equivalence of TCA and taxicab nonsymmetric correspondence analysis. We also attempt to show that TCA stays as close as possible to the original correspondence matrix without calculating a dissimilarity (or similarity) measure between rows or columns. Further, we discuss some new geometric and distance aspects of TCA.
References
Agresti A (2002) Categorical data analysis, 2nd edn. Wiley, New Jersey
Alon N, Naor A (2006) Approximating the cut-norm via Grothendieck’s inequality. SIAM J Comput 35:787–803
Balbi S (1998) Graphical displays in nonsymmetrical correspondence analysis. In: Blasius J, Greenacre M (eds) Visualization of categorical data. Academic Press, London, pp 297–309
Beh E (2012) Simple correspondence analysis using adjusted residuals. J Stat Plan Inference 142:965–973
Benzécri JP (1973) L’Analyse des Données, vol 2: L’Analyse des Correspondances. Dunod, Paris
Benzécri JP (1992) Correspondence analysis handbook. Marcel Dekker, New York
Choulakian V (2003) The optimality of the centroid method. Psychometrika 68:473–475
Choulakian V (2005) Transposition invariant principal component analysis in \(L_{1}\) for long tailed data. Stat Probab Lett 71:23–31
Choulakian V (2006a) Taxicab correspondence analysis. Psychometrika 71:333–345
Choulakian V (2006b) \(L_{1}\) norm projection pursuit principal component analysis. Comput Stat Data Anal 50:1441–1451
Choulakian V (2008a) Taxicab correspondence analysis of contingency tables with one heavyweight column. Psychometrika 73:309–319
Choulakian V (2008b) Multiple taxicab correspondence analysis. Adv Data Anal Classif 2:177–206
Choulakian V, de Tibeiro J (2013) Graph partitioning by correspondence analysis and taxicab correspondence analysis. J Classif 30(3):397–427
Choulakian V, Allard J, Simonetti B (2013) Multiple taxicab correspondence analysis of a survey related to health services. J Data Sci 11(2):205–229
Choulakian V, Kasparian S, Miyake M, Akama H, Makoshi N, Nakagawa M (2006) A statistical analysis of synoptic gospels. In: Viprey JR (ed) Proceedings of the 8th international conference on textual data (JADT 2006). Presses Universitaires de Franche-Comté, pp 281–288
Choulakian V (2013) The simple sum score statistic in taxicab correspondence analysis. In: Brentari E, Carpita M (eds) Advances in latent variables (eBook). Vita e Pensiero, Milan. ISBN 9788834325568
Fichet B (2009) Metrics of \(L_{p}\)-type and distributional equivalence principle. Adv Data Anal Classif 3:305–314
Gabriel KR, Zamir S (1979) Lower rank approximation of matrices by least squares with any choice of weights. Technometrics 21:489–498
Gifi A (1990) Nonlinear multivariate analysis. Wiley, New York
Greenacre M (1984) Theory and applications of correspondence analysis. Academic Press, London
Greenacre M (2010) Correspondence analysis of raw data. Ecology 91(4):958–963
Haberman SJ (1973) The analysis of residuals in cross-classified tables. Biometrics 29:205–220
Kreyszig E (1978) Introduction to functional analysis with applications. Wiley, New York
Lauro NC, D’Ambra L (1984) L’analyse non symétrique des correspondances. In: Diday E et al (eds) Data analysis and informatics. North-Holland, Amsterdam, pp 433–446
Le Roux B, Rouanet H (2004) Geometric data analysis. From correspondence analysis to structured data analysis. Kluwer–Springer, Dordrecht
Murtagh F (2005) Correspondence analysis and data coding with Java and R. Chapman & Hall/CRC, Boca Raton
Nishisato S (1984) Forced classification: a simple application of a quantification method. Psychometrika 49(1):25–36
Nishisato S (1994) Elements of dual scaling: an introduction to practical data analysis. Lawrence Erlbaum, Hillsdale
Takane Y, Jung S (2009) Tests of ignoring and eliminating in nonsymmetric correspondence analysis. Adv Data Anal Classif 3(3):315–340
Wold H (1966) Estimation of principal components and related models by iterative least squares. In: Krishnaiah PR (ed) Multivariate analysis. Academic Press, New York, pp 391–420
Acknowledgments
The authors are grateful to the editor, Prof. A. Cerioli, the associate editor, and the two reviewers for their constructive comments, which improved the presentation of the paper. V. Choulakian’s research is financed by NSERC of Canada.
Appendix
1.1 Singular value decomposition
Let \(\mathbf{X}\) be a data set of dimension \(I\times J\), where \(I\) observations are described by \(J\) variables. The ordinary SVD can be described as successive maximization of the \(L_{2}\)-norm of a linear combination of the columns of \(\mathbf{X}\) subject to a quadratic constraint; that is, it is based on the following optimization problem
\[
\max_{\mathbf{u}}\left\| \mathbf{Xu}\right\| _{2}\quad \text{subject to}\quad \left\| \mathbf{u}\right\| _{2}=1, \qquad (36)
\]
or, equivalently, it can also be described as maximization of the \(L_{2}\)-norm of a linear combination of the rows of \(\mathbf{X}\)
\[
\max_{\mathbf{v}}\left\| \mathbf{X}^{\prime }\mathbf{v}\right\| _{2}\quad \text{subject to}\quad \left\| \mathbf{v}\right\| _{2}=1. \qquad (37)
\]
Equation (36) is the dual of (37), and they can be reexpressed as matrix norms
\[
\lambda _{1}=\max_{\mathbf{u}}\frac{\left\| \mathbf{Xu}\right\| _{2}}{\left\| \mathbf{u}\right\| _{2}}=\max_{\mathbf{v}}\frac{\left\| \mathbf{X}^{\prime }\mathbf{v}\right\| _{2}}{\left\| \mathbf{v}\right\| _{2}}=\max_{\mathbf{u},\mathbf{v}}\frac{\mathbf{v}^{\prime }\mathbf{Xu}}{\left\| \mathbf{u}\right\| _{2}\left\| \mathbf{v}\right\| _{2}}. \qquad (38)
\]
The solution to (38), \(\lambda _{1}\), is the square root of the greatest eigenvalue of the matrix \(\mathbf{X}^{\prime }\mathbf{X}\) or \(\mathbf{XX}^{\prime }\). The first principal axes, \(\mathbf{u}_{1}\) and \(\mathbf{v}_{1}\), can be defined as
\[
\mathbf{u}_{1}=\arg \max_{\left\| \mathbf{u}\right\| _{2}=1}\left\| \mathbf{Xu}\right\| _{2}, \qquad (39)
\]
where \(\mathbf{u}_{1}\) is the eigenvector of the matrix \(\mathbf{X}^{\prime }\mathbf{X}\) associated with the greatest eigenvalue \(\lambda _{1}^{2}\); and
\[
\mathbf{v}_{1}=\arg \max_{\left\| \mathbf{v}\right\| _{2}=1}\left\| \mathbf{X}^{\prime }\mathbf{v}\right\| _{2}, \qquad (40)
\]
where \(\mathbf{v}_{1}\) is the eigenvector of the matrix \(\mathbf{XX}^{\prime }\) associated with the same eigenvalue.
Let \(\mathbf{a}_{1}\) and \(\mathbf{b}_{1}\) be defined as
\[
\mathbf{a}_{1}=\mathbf{Xu}_{1} \qquad (41)
\]
and
\[
\mathbf{b}_{1}=\mathbf{X}^{\prime }\mathbf{v}_{1}; \qquad (42)
\]
then \(\lambda _{1}=\left\| \mathbf{a}_{1}\right\| _{2}=\left\| \mathbf{b}_{1}\right\| _{2}.\)
Equations (41) and (42) are named transition formulas, because \(\mathbf{v}_{1}\) and \(\mathbf{a}_{1}\), and \(\mathbf{u}_{1}\) and \(\mathbf{b}_{1}\), are related by
\[
\mathbf{v}_{1}=\mathbf{a}_{1}/\lambda _{1}\quad \text{and}\quad \mathbf{u}_{1}=\mathbf{b}_{1}/\lambda _{1}. \qquad (43)
\]
To obtain \(\mathbf{a}_{2}\) and \(\mathbf{b}_{2}\), and the axes \(\mathbf{u}_{2}\) and \(\mathbf{v}_{2}\), we repeat the above procedure on the residual data set
\[
\mathbf{X}_{2}=\mathbf{X}_{1}-\mathbf{a}_{1}\mathbf{b}_{1}^{\prime }/\lambda _{1}=\mathbf{X}_{1}-\lambda _{1}\mathbf{v}_{1}\mathbf{u}_{1}^{\prime }, \qquad (44)
\]
where \(\mathbf{X}_{1}=\mathbf{X}\). We note that \(rank(\mathbf{X}_{2})=rank(\mathbf{X}_{1})-1\), because by (41) and (42)
\[
\mathbf{X}_{2}\mathbf{u}_{1}=\mathbf{0}\quad \text{and}\quad \mathbf{v}_{1}^{\prime }\mathbf{X}_{2}=\mathbf{0}^{\prime }. \qquad (45)
\]
Classical SVD can be described as the sequential repetition of the above procedure \(k=rank(\mathbf{X})\) times, until the residual matrix becomes \(\mathbf{0}\); thus, using \(\alpha =1,\ldots ,k\) as indices, the matrix \(\mathbf{X}\) can be written as
\[
\mathbf{X}=\sum_{\alpha =1}^{k}\mathbf{a}_{\alpha }\mathbf{b}_{\alpha }^{\prime }/\lambda _{\alpha }, \qquad (46)
\]
which, by (43), can be rewritten in the much more familiar form
\[
\mathbf{X}=\sum_{\alpha =1}^{k}\lambda _{\alpha }\mathbf{v}_{\alpha }\mathbf{u}_{\alpha }^{\prime }. \qquad (47)
\]
Further, we have
\[
\mathbf{a}_{\alpha }=\mathbf{Xu}_{\alpha }\quad \text{and}\quad \mathbf{b}_{\alpha }=\mathbf{X}^{\prime }\mathbf{v}_{\alpha }\quad \text{for}\ \alpha =1,\ldots ,k, \qquad (48)
\]
and
\[
\left\| \mathbf{X}\right\| _{F}^{2}=\sum_{\alpha =1}^{k}\lambda _{\alpha }^{2}. \qquad (49)
\]
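To make the deflation construction (44)–(47) concrete, here is a minimal numerical sketch in Python/NumPy; it is an assumed illustration, not code from the paper, and note that NumPy's left and right singular vectors correspond to this appendix's \(\mathbf{v}_{\alpha }\) and \(\mathbf{u}_{\alpha }\), respectively.

```python
import numpy as np

def svd_by_deflation(X):
    """Extract the triples (lambda_alpha, v_alpha, u_alpha) one at a time,
    deflating the residual matrix as in (44)."""
    X_alpha = X.astype(float)
    triples = []
    for _ in range(np.linalg.matrix_rank(X)):
        # Leading singular triple of the current residual matrix
        V, s, Ut = np.linalg.svd(X_alpha)
        lam, v, u = s[0], V[:, 0], Ut[0, :]
        triples.append((lam, v, u))
        # Residual step (44): X_{alpha+1} = X_alpha - lambda_alpha v_alpha u_alpha'
        X_alpha = X_alpha - lam * np.outer(v, u)
    return triples

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 4))
triples = svd_by_deflation(X)
# Reconstruction (47): X = sum_alpha lambda_alpha v_alpha u_alpha'
assert np.allclose(X, sum(lam * np.outer(v, u) for lam, v, u in triples))
# Pythagorean identity (49): ||X||_F^2 = sum_alpha lambda_alpha^2
assert np.isclose(np.sum(X**2), sum(lam**2 for lam, _, _ in triples))
```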
1.2 Taxicab singular value decomposition
TSVD consists of maximizing the \(L_{1}\)-norm of a linear combination of the columns of \(\mathbf{X}\) subject to an \(L_{\infty }\)-norm constraint; more precisely, it is based on the following optimization problem
\[
\max_{\mathbf{u}}\left\| \mathbf{Xu}\right\| _{1}\quad \text{subject to}\quad \left\| \mathbf{u}\right\| _{\infty }=1, \qquad (50)
\]
or, equivalently, it can also be described as maximization of the \(L_{1}\)-norm of a linear combination of the rows of the matrix \(\mathbf{X}\)
\[
\max_{\mathbf{v}}\left\| \mathbf{X}^{\prime }\mathbf{v}\right\| _{1}\quad \text{subject to}\quad \left\| \mathbf{v}\right\| _{\infty }=1. \qquad (51)
\]
Equation (50) is the dual of (51), and they can be reexpressed as matrix norms
\[
\lambda _{1}=\max_{\mathbf{u}}\frac{\left\| \mathbf{Xu}\right\| _{1}}{\left\| \mathbf{u}\right\| _{\infty }}=\max_{\mathbf{v}}\frac{\left\| \mathbf{X}^{\prime }\mathbf{v}\right\| _{1}}{\left\| \mathbf{v}\right\| _{\infty }}=\max_{\mathbf{u},\mathbf{v}}\frac{\mathbf{v}^{\prime }\mathbf{Xu}}{\left\| \mathbf{u}\right\| _{\infty }\left\| \mathbf{v}\right\| _{\infty }}, \qquad (52)
\]
which is a well-known and much-discussed matrix norm related to the Grothendieck problem; see, for instance, Alon and Naor (2006). The computation of \(\lambda _{1}\) in (52) is a combinatorial optimization problem given by
\[
\lambda _{1}=\max_{\mathbf{u}\in \{-1,+1\}^{J}}\left\| \mathbf{Xu}\right\| _{1}=\max_{\mathbf{v}\in \{-1,+1\}^{I}}\left\| \mathbf{X}^{\prime }\mathbf{v}\right\| _{1}. \qquad (53)
\]
Equation (53) characterizes the robustness of the method, in the sense that the weights assigned to the columns (and, by duality, to the rows) are uniformly distributed on \(\{-1,+1\}\). The vectors \(\mathbf{u}_{1}\) and \(\mathbf{v}_{1}\) are defined as
\[
\mathbf{u}_{1}=\arg \max_{\mathbf{u}\in \{-1,+1\}^{J}}\left\| \mathbf{Xu}\right\| _{1} \qquad (54)
\]
and
\[
\mathbf{v}_{1}=\arg \max_{\mathbf{v}\in \{-1,+1\}^{I}}\left\| \mathbf{X}^{\prime }\mathbf{v}\right\| _{1}. \qquad (55)
\]
Let \(\mathbf{a}_{1}\) and \(\mathbf{b}_{1}\) be
\[
\mathbf{a}_{1}=\mathbf{Xu}_{1} \qquad (56)
\]
and
\[
\mathbf{b}_{1}=\mathbf{X}^{\prime }\mathbf{v}_{1}; \qquad (57)
\]
then \(\lambda _{1}=\left\| \mathbf{a}_{1}\right\| _{1}=\left\| \mathbf{b}_{1}\right\| _{1}.\)
Equations (56) and (57) are named transition formulas, because \(\mathbf{v}_{1}\) and \(\mathbf{a}_{1}\), and \(\mathbf{u}_{1}\) and \(\mathbf{b}_{1}\), are related by
\[
\mathbf{v}_{1}=sgn(\mathbf{a}_{1})\quad \text{and}\quad \mathbf{u}_{1}=sgn(\mathbf{b}_{1}), \qquad (58)
\]
where \(sgn(\mathbf{g}_{1})=(sgn(g_{1}(1)),\ldots ,sgn(g_{1}(J)))^{\prime }\) is applied coordinatewise, with \(sgn(g_{1}(j))=1\) if \(g_{1}(j)>0\) and \(sgn(g_{1}(j))=-1\) otherwise. Note that (58) is completely different from (43).
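For small tables, the combinatorial problem (53) can be solved by complete enumeration. The following Python sketch computes the first taxicab triple exactly and then applies the transition formulas (56)–(58); the function name tsvd_first_factor_exact is illustrative, and the loop is feasible only for small \(J\).

```python
import itertools
import numpy as np

def tsvd_first_factor_exact(X):
    """Solve (53) by complete enumeration of u in {-1,+1}^J,
    then apply the transition formulas (56)-(58)."""
    J = X.shape[1]
    lam1, u1 = -np.inf, None
    for signs in itertools.product((-1.0, 1.0), repeat=J):
        u = np.array(signs)
        lam = np.abs(X @ u).sum()            # ||Xu||_1
        if lam > lam1:
            lam1, u1 = lam, u
    a1 = X @ u1                              # (56): a_1 = X u_1
    v1 = np.where(a1 > 0, 1.0, -1.0)         # (58): v_1 = sgn(a_1)
    b1 = X.T @ v1                            # (57): b_1 = X' v_1
    return lam1, u1, v1, a1, b1              # lam1 = ||a_1||_1 = ||b_1||_1
```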
To obtain \(\mathbf{a}_{2}\), \(\mathbf{b}_{2}\), and the axes \(\mathbf{u}_{2}\) and \(\mathbf{v}_{2}\), we repeat the above procedure on the residual data set
\[
\mathbf{X}_{2}=\mathbf{X}_{1}-\mathbf{a}_{1}\mathbf{b}_{1}^{\prime }/\lambda _{1}, \qquad (59)
\]
where \(\mathbf{X}_{1}=\mathbf{X}\). We note that \(rank(\mathbf{X}_{2})=rank(\mathbf{X}_{1})-1\), because by (56), (57) and (58)
\[
\mathbf{b}_{1}^{\prime }\mathbf{u}_{1}=\left\| \mathbf{b}_{1}\right\| _{1}=\lambda _{1}=\left\| \mathbf{a}_{1}\right\| _{1}=\mathbf{v}_{1}^{\prime }\mathbf{a}_{1}, \qquad (60)
\]
which implies that
\[
\mathbf{X}_{2}\mathbf{u}_{1}=\mathbf{0}\quad \text{and}\quad \mathbf{v}_{1}^{\prime }\mathbf{X}_{2}=\mathbf{0}^{\prime }. \qquad (61)
\]
TSVD is described as the sequential repetition of the above procedure \(k=rank(\mathbf{X})\) times, until the residual matrix becomes \(\mathbf{0}\); thus the matrix \(\mathbf{X}\) can be written as
\[
\mathbf{X}=\sum_{\alpha =1}^{k}\mathbf{a}_{\alpha }\mathbf{b}_{\alpha }^{\prime }/\lambda _{\alpha }. \qquad (62)
\]
It is important to note that (62) has the same form as (4) and (46), but it cannot be rewritten as (47).
Further, similar to (57), we have
\[
\mathbf{a}_{\alpha }=\mathbf{X}_{\alpha }\mathbf{u}_{\alpha },\quad \mathbf{b}_{\alpha }=\mathbf{X}_{\alpha }^{\prime }\mathbf{v}_{\alpha }\quad \text{and}\quad \lambda _{\alpha }=\left\| \mathbf{a}_{\alpha }\right\| _{1}=\left\| \mathbf{b}_{\alpha }\right\| _{1}\quad \text{for}\ \alpha =1,\ldots ,k. \qquad (63)
\]
But the dispersion measures \(\lambda _{\alpha }\) in (63) will not satisfy (49), because the Pythagorean theorem is not satisfied in \(L_{1}\).
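Reusing the exact first-factor routine sketched after (58), the deflation (59)–(62) can be checked numerically; again, this is an assumed illustration rather than the authors' code.

```python
import numpy as np

def tsvd_by_deflation(X):
    """Full TSVD (62): deflate with (59) until the residual vanishes."""
    X_alpha = X.astype(float)
    factors = []
    for _ in range(np.linalg.matrix_rank(X)):
        lam, u, v, a, b = tsvd_first_factor_exact(X_alpha)
        factors.append((lam, a, b))
        X_alpha = X_alpha - np.outer(a, b) / lam   # (59)
    return factors

rng = np.random.default_rng(1)
X = rng.standard_normal((5, 4))
factors = tsvd_by_deflation(X)
# Reconstruction (62): X = sum_alpha a_alpha b_alpha' / lambda_alpha
assert np.allclose(X, sum(np.outer(a, b) / lam for lam, a, b in factors))
```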
In TSVD, the optimization problem (50), (51) or (52) can be solved by two algorithms. The first one is based on the complete enumeration (53); with the present state of desktop computing power, this can be applied when, say, \(\min (I,J)\simeq 25\). The second one is based on iterating the transition formulas (56), (57) and (58), similar to Wold's (1966) NIPALS (nonlinear iterative partial least squares) algorithm, also named criss-cross regression by Gabriel and Zamir (1979). It is easy to show that this is an ascent algorithm. The criss-cross algorithm is nonlinear and can be summarized in the following way, where \(\mathbf{b}\) is a starting value:
Step 1: \(\mathbf{u}=sgn(\mathbf{b})\), \(\mathbf{a}=\mathbf{Xu}\) and \(\lambda (\mathbf{u})=\left\| \mathbf{Xu}\right\| _{1}\);
Step 2: \(\mathbf{v}=sgn(\mathbf{a})\), \(\mathbf{b}=\mathbf{X}^{\prime }\mathbf{v}\) and \(\lambda (\mathbf{v})=\left\| \mathbf{X}^{\prime }\mathbf{v}\right\| _{1}\);
Step 3: If \(\lambda (\mathbf{v})-\lambda (\mathbf{u})>0\), go to Step 1; otherwise, stop.
This is an ascent algorithm; that is, it increases the value of the objective function \(\lambda \) at each iteration. The convergence of the algorithm is superlinear (very fast, at most two iterations); however, it may converge to a local maximum, so we restart the algorithm \(I\) times, using each row of \(\mathbf{X}\) as a starting value. The iterative algorithm is statistically consistent in the sense that, as the sample size increases, there will be some observations in the direction of the principal axes, so the algorithm will find the optimal solution.
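A minimal sketch of Steps 1–3 with the row-restart strategy just described is given below; the sign convention follows the definition after (58) (\(sgn(t)=-1\) for \(t\le 0\)), and all function names are illustrative.

```python
import numpy as np

def sgn(x):
    # sgn as defined after (58): +1 if positive, -1 otherwise
    return np.where(x > 0, 1.0, -1.0)

def crisscross(X, b):
    """One run of Steps 1-3 from a starting J-vector b."""
    while True:
        u = sgn(b)                       # Step 1: u = sgn(b)
        a = X @ u                        #         a = X u
        lam_u = np.abs(a).sum()          #         lambda(u) = ||X u||_1
        v = sgn(a)                       # Step 2: v = sgn(a)
        b = X.T @ v                      #         b = X' v
        lam_v = np.abs(b).sum()          #         lambda(v) = ||X' v||_1
        if lam_v - lam_u <= 0:           # Step 3: stop when lambda no longer grows
            return lam_u, u, v

def tsvd_first_factor(X):
    """Restart from each of the I rows of X; keep the best local maximum."""
    runs = (crisscross(X, X[i, :]) for i in range(X.shape[0]))
    return max(runs, key=lambda run: run[0])
```

For small \(\min (I,J)\), the value returned here can be checked against the complete enumeration sketched after (58); the two agree whenever one of the \(I\) restarts reaches the global maximum.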