
Consistency of test-based method for selection of variables in high-dimensional two-group discriminant analysis

  • Original Paper · Japanese Journal of Statistics and Data Science

Abstract

This paper is concerned with the selection of variables in two-group discriminant analysis with a common covariance matrix. We propose a test-based method (TM) that draws on the significance of each variable. Sufficient conditions for the test-based method to be consistent are provided when the dimension and the sample size are large. For the case that the dimension is larger than the sample size, a ridge-type method is proposed. Our results and the tendencies therein are explored numerically through a Monte Carlo simulation. It is pointed out that our selection method can be applied to high-dimensional data.
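The selection rule itself is simple to state: variable i is retained when its test statistic \(\mathrm{T}_{d,i}\), built from the sample Mahalanobis distances with and without variable i (see (12) and (16) in the appendix), exceeds zero. The following is a minimal numerical sketch under the assumption of two normal groups with a common covariance matrix; the function names and the penalty choice \(d = 2\log n\) are illustrative, not prescribed by the paper.

```python
import numpy as np

def mahalanobis_d2(x1, x2, cols):
    """Squared sample Mahalanobis distance between the two group means,
    using only the variables in `cols` and the pooled covariance matrix."""
    a, b = x1[:, cols], x2[:, cols]
    n1, n2 = len(a), len(b)
    diff = a.mean(axis=0) - b.mean(axis=0)
    s = ((n1 - 1) * np.cov(a, rowvar=False)
         + (n2 - 1) * np.cov(b, rowvar=False)) / (n1 + n2 - 2)
    s = np.atleast_2d(s)
    return float(diff @ np.linalg.solve(s, diff))

def tm_select(x1, x2, d):
    """Test-based method: keep variable i iff T_{d,i} > 0, where T_{d,i}
    compares the full model with the model dropping variable i, following
    the log-likelihood-ratio form in (12)/(16) of the appendix."""
    n1, n2 = len(x1), len(x2)
    n, p = n1 + n2, x1.shape[1]
    g2 = n1 * n2 / n
    d2_full = mahalanobis_d2(x1, x2, list(range(p)))
    selected = []
    for i in range(p):
        rest = [j for j in range(p) if j != i]
        d2_rest = mahalanobis_d2(x1, x2, rest)
        t = n * np.log1p(g2 * (d2_full - d2_rest) / (n - 2 + g2 * d2_rest)) - d
        if t > 0:
            selected.append(i)
    return selected

# Illustration: only the first two variables carry a mean difference.
rng = np.random.default_rng(0)
p, n1, n2 = 5, 100, 100
shift = np.zeros(p)
shift[:2] = 2.0
x1 = rng.standard_normal((n1, p)) + shift
x2 = rng.standard_normal((n2, p))
sel = tm_select(x1, x2, d=2 * np.log(n1 + n2))
print(sel)  # the two signal variables should dominate the selection
```

With a strong mean separation and a moderate penalty d, the signal variables produce very large statistics while each noise variable behaves roughly like \(\chi_1^2 - d\), so false selection is rare.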


References

  • Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In B. N. Petrov & F. Csáki (Eds.), 2nd International Symposium on Information Theory (pp. 267–281). Budapest: Akadémiai Kiadó.

  • Clemmensen, L., Hastie, T., Witten, D. M., & Ersbøll, B. (2011). Sparse discriminant analysis. Technometrics, 53, 406–413.

  • Fujikoshi, Y. (1985). Selection of variables in two-group discriminant analysis by error rate and Akaike’s information criteria. Journal of Multivariate Analysis, 17, 27–37.


  • Fujikoshi, Y. (2000). Error bounds for asymptotic approximations of the linear discriminant function when the sample size and dimensionality are large. Journal of Multivariate Analysis, 73, 1–17.


  • Fujikoshi, Y., & Sakurai, T. (2016). High-dimensional consistency of rank estimation criteria in multivariate linear model. Journal of Multivariate Analysis, 149, 199–212.


  • Fujikoshi, Y., Ulyanov, V. V., & Shimizu, R. (2010). Multivariate statistics: high-dimensional and large-sample approximations. Hoboken, NJ: Wiley.

  • Fujikoshi, Y., Sakurai, T., & Yanagihara, H. (2014). Consistency of high-dimensional AIC-type and \(C_p\)-type criteria in multivariate linear regression. Journal of Multivariate Analysis, 144, 184–200.

  • Hao, N., Dong, B., & Fan, J. (2015). Sparsifying the Fisher linear discriminant by rotation. Journal of the Royal Statistical Society: Series B, 77, 827–851.

  • Hyodo, M., & Kubokawa, T. (2014). A variable selection criterion for linear discriminant rule and its optimality in high dimensional and large sample data. Journal of Multivariate Analysis, 123, 364–379.


  • Ito, T., & Kubokawa, T. (2015). Linear ridge estimator of high-dimensional precision matrix using random matrix theory. Discussion Paper Series, CIRJE-F-995.

  • Kubokawa, T., & Srivastava, M. S. (2012). Selection of variables in multivariate regression models for large dimensions. Communication in Statistics-Theory and Methods, 41, 2465–2489.


  • McLachlan, G. J. (1976). A criterion for selecting variables for the linear discriminant function. Biometrics, 32, 529–534.


  • Nishii, R., Bai, Z. D., & Krishnaiah, P. R. (1988). Strong consistency of the information criterion for model selection in multivariate analysis. Hiroshima Mathematical Journal, 18, 451–462.

  • Rao, C. R. (1973). Linear statistical inference and its applications (2nd ed.). New York: Wiley.


  • Sakurai, T., Nakada, T., & Fujikoshi, Y. (2013). High-dimensional AICs for selection of variables in discriminant analysis. Sankhya, Series A, 75, 1–25.


  • Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6, 461–464.

  • Tiku, M. (1985). Noncentral chi-square distribution. In S. Kotz & N. L. Johnson (Eds.), Encyclopedia of Statistical Sciences, vol. 6 (pp. 276–280). New York: Wiley.

  • Van Wieringen, W. N., & Peeters, C. F. (2016). Ridge estimation of inverse covariance matrices from high-dimensional data. Computational Statistics & Data Analysis, 103, 284–303.


  • Witten, D. M., & Tibshirani, R. (2011). Penalized classification using Fisher’s linear discriminant. Journal of the Royal Statistical Society: Series B, 73, 753–772.

  • Yamada, T., Sakurai, T., & Fujikoshi, Y. (2017). High-dimensional asymptotic results for EPMCs of W- and Z-rules. Hiroshima Statistical Research Group, 17–12.

  • Yanagihara, H., Wakaki, H., & Fujikoshi, Y. (2015). A consistency property of the AIC for multivariate linear models when the dimension and the sample size are large. Electronic Journal of Statistics, 9, 869–897.


  • Zhao, L. C., Krishnaiah, P. R., & Bai, Z. D. (1986). On determination of the number of signals in presence of white noise. Journal of Multivariate Analysis, 20, 1–25.



Acknowledgements

We thank two referees for careful reading of our manuscript and many helpful comments which improved the presentation of this paper. The first author’s research is partially supported by the Ministry of Education, Science, Sports, and Culture, a Grant-in-Aid for Scientific Research (C), 16K00047, 2016–2018.

Author information


Corresponding author

Correspondence to Yasunori Fujikoshi.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Proofs of Theorems 1, 2 and 3


1.1 Preliminary lemmas

First, we study distributional results related to the test statistics \(\mathrm{T}_{d,i}\) in (5). For notational simplicity, consider the decomposition \({\varvec{y}}=({\varvec{y}}_1', {\varvec{y}}_2')', \ {\varvec{y}}_1; \ p_1\times 1, \ {\varvec{y}}_2; \ p_2 \times 1\). Similarly, decompose \(\varvec{\beta }=(\varvec{\beta }_1', \varvec{\beta }_2')'\), and

$$\begin{aligned} {\mathsf {S}}= \left( \begin{array}{cc} {\mathsf {S}}_{11} &{} {\mathsf {S}}_{12} \\ {\mathsf {S}}_{21} &{} {\mathsf {S}}_{22} \end{array} \right) , \quad {\mathsf {S}}_{12}; \ p_1 \times p_2. \end{aligned}$$

Let \(\lambda\) be the likelihood ratio criterion for testing the hypothesis \(\varvec{\beta }_2={\varvec{0}}\). Then

$$\begin{aligned} -2 \log \lambda = n \log \left\{ 1 + \frac{g^2 (D^2 - D_1^2)}{n-2 + g^2 D_1^2} \right\} , \end{aligned}$$
(12)

where \(g=\left\{ (n_1n_2)/n\right\} ^{1/2}\). The following lemma (see, e.g., Fujikoshi et al. 2010) is used.

Lemma 1

Let \(D_1\) and D be the sample Mahalanobis distances based on \({\varvec{y}}_1\) and \({\varvec{y}}\), respectively. Let \(D_{2\cdot 1}^2=D^2-D_1^2\). Similarly, the corresponding population quantities are expressed as \(\varDelta _1\), \(\varDelta\) and \(\varDelta _{2\cdot 1}^2\). Then, it holds that

$$\begin{aligned}&\mathrm{(1)} \ D_1^2=(n-2)g^{-2}R, \quad R=\chi _{p_1}^2(g^2\varDelta _1^2)\left\{ \chi _{n-p_1-1}^2\right\} ^{-1}. \\&\mathrm{(2)} \ D_{2\cdot 1}^2 = (n-2) g^{-2} \chi _{p_2}^2 \left( g^2 \varDelta _{2\cdot 1}^2 \cdot \frac{1}{1+R} \right) \left\{ \chi _{n-p-1}^2\right\} ^{-1} (1+R).\\&\mathrm{(3)} \ \frac{g^2 (D^2 - D_1^2)}{n-2 + g^2 D_1^2} = \chi _{p_2}^2 ( g^2 \varDelta _{2\cdot 1}^2 (1+R)^{-1})\{\chi _{n-p-1}^2\}^{-1} \end{aligned}$$

Here, \(\chi _{p_1}^2(\cdot )\), \(\chi _{n-p_1-1}^2\), \(\chi _{p_2}^2(\cdot )\), and \(\chi _{n-p-1}^2\) are independent Chi-square variates.
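As a sanity check on part (1), the representation can be compared with a direct simulation of \(D_1^2\): since the chi-square variates are independent and \(\mathrm{E}\{1/\chi _{n-p_1-1}^2\}=1/(n-p_1-3)\), part (1) implies \(\mathrm{E}(D_1^2)=(n-2)g^{-2}(p_1+g^2\varDelta _1^2)/(n-p_1-3)\). The following Monte Carlo sketch assumes \(\varSigma ={\mathsf {I}}\); the sample sizes and the mean-difference vector are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
p1, n1, n2 = 3, 30, 30
n = n1 + n2
g2 = n1 * n2 / n
mu = np.array([1.0, 0.5, 0.0])   # mean difference between the groups
delta1_sq = float(mu @ mu)       # population Mahalanobis distance^2 (Sigma = I)

# Monte Carlo over independent two-group samples: draw D_1^2 repeatedly
reps = 10_000
d1_sq = np.empty(reps)
for r in range(reps):
    a = rng.standard_normal((n1, p1)) + mu
    b = rng.standard_normal((n2, p1))
    diff = a.mean(axis=0) - b.mean(axis=0)
    s = ((n1 - 1) * np.cov(a, rowvar=False)
         + (n2 - 1) * np.cov(b, rowvar=False)) / (n - 2)
    d1_sq[r] = diff @ np.linalg.solve(s, diff)

# Lemma 1(1): D_1^2 = (n-2) g^{-2} R, so
# E(D_1^2) = (n-2) g^{-2} (p_1 + g^2 Delta_1^2) / (n - p_1 - 3)
expected = (n - 2) / g2 * (p1 + g2 * delta1_sq) / (n - p1 - 3)
print(d1_sq.mean(), expected)    # the two values should be close
```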

Related to the conditional distribution of the right-hand side of (3) with \(p_2=1\) and \(m=n-p-1\) in Lemma 1, consider the random variable defined by

$$\begin{aligned} V=\frac{\chi _1^2(\lambda ^2)}{\chi _m^2}-\frac{1+\lambda ^2}{m-2}, \end{aligned}$$
(13)

where \(\chi _1^2(\lambda ^2)\) and \(\chi _m^2\) are independent. We can express V as

$$\begin{aligned} V=U_1U_2+(m-2)^{-1}U_1+(1+\lambda ^2)U_2, \end{aligned}$$
(14)

in terms of the centralized variables \(U_1\) and \(U_2\) defined by

$$\begin{aligned} U_1=\chi _1^2(\lambda ^2) -(1+\lambda ^2), \quad U_2=\frac{1}{\chi _m^2}-\frac{1}{m-2}. \end{aligned}$$
(15)

It is well known (see, e.g., Tiku 1985) that

$$\begin{aligned}&\mathrm{E}(U_1)=0,\\&\mathrm{E}(U_1^2)=2(1+2\lambda ^2), \\&\mathrm{E}(U_1^3)=8(1+3\lambda ^2), \\&\mathrm{E}(U_1^4)=48(1+4\lambda ^2)+12(1+2\lambda ^2)^2. \end{aligned}$$

Furthermore,

$$\begin{aligned} \mathrm{E}\left( U_2^k\right) =&\sum _{i=0}^k {}_k C_i \mathrm{E}\left\{ \left( \frac{1}{\chi _m^2}\right) ^i\right\} \left( -\frac{1}{m-2}\right) ^{k-i}\\ =&\sum _{i=1}^k {}_kC_i\frac{1}{(m-2) \cdots (m-2i)}\left( -\frac{1}{m-2}\right) ^{k-i} +\left( -\frac{1}{m-2}\right) ^k. \end{aligned}$$

These give the first four moments of V. In particular, we use the following results.
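The low-order central moments of \(U_1\) are easy to confirm numerically, writing \(\chi _1^2(\lambda ^2)=(Z+\lambda )^2\) with \(Z\sim N(0,1)\), so that \(U_1=\chi _1^2(\lambda ^2)-(1+\lambda ^2)\) has mean zero; the value \(\lambda ^2=2.25\) below is an arbitrary illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
lam2 = 2.25                        # noncentrality lambda^2 (illustrative)
n_draws = 500_000

# chi_1^2(lambda^2) = (Z + lambda)^2 with Z ~ N(0, 1)
x = (rng.standard_normal(n_draws) + np.sqrt(lam2)) ** 2
u1 = x - (1.0 + lam2)              # centered: E(U_1) = 0

m2 = (u1 ** 2).mean()              # should be near 2(1 + 2 lambda^2)
m3 = (u1 ** 3).mean()              # should be near 8(1 + 3 lambda^2)
print(m2, 2 * (1 + 2 * lam2))
print(m3, 8 * (1 + 3 * lam2))
```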

Lemma 2

Let V be the random variable defined by (13). Suppose that \(\lambda ^2=\mathrm{O}(m)\). Then

$$\begin{aligned}&\mathrm{E}(V)=0,\quad \mathrm{E}(V^2)=\frac{2\left\{ (m-1)(1+2\lambda ^2)+\lambda ^4\right\} }{(m-2)^2(m-4)}=\mathrm{O}(m^{-1}), \\&\mathrm{E}(V^3)=\mathrm{O}(m^{-2}), \quad \mathrm{E}(V^4)=\mathrm{O}(m^{-2}). \end{aligned}$$
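The orders in Lemma 2 can be checked by simulation without relying on the exact constant: with \(\lambda ^2\) proportional to m, doubling m should roughly halve \(\mathrm{E}(V^2)\), and \(\mathrm{E}(V)\) should stay near zero. A sketch follows; the choices \(m\in \{100,200\}\) and \(\lambda ^2=m/5\) are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)

def v_moments(m, lam2, n_draws=400_000):
    """Monte Carlo E(V^2) and E(V) for
    V = chi_1^2(lam2)/chi_m^2 - (1 + lam2)/(m - 2), as in (13)."""
    num = (rng.standard_normal(n_draws) + np.sqrt(lam2)) ** 2
    den = rng.chisquare(m, n_draws)
    v = num / den - (1 + lam2) / (m - 2)
    return (v ** 2).mean(), v.mean()

# lambda^2 = O(m): take lambda^2 = m/5 and double m
ev2_a, ev_a = v_moments(100, 20.0)
ev2_b, ev_b = v_moments(200, 40.0)
print(ev2_a, ev2_b, ev2_a / ev2_b)   # ratio near 2 => E(V^2) = O(1/m)
```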

1.2 Proof of Theorem 1

First, we show “\(\mathrm{[F1]} \rightarrow 0\)”. Let \(i \in j_*\). Then, \((-i) \notin {{{\mathcal {F}}}}_+\), and hence

$$\begin{aligned} \varDelta _{(-i)}^2 < \varDelta ^2, \quad \varDelta _{ \{i\} \cdot (-i)}^2 > 0. \end{aligned}$$

Using (12) and Lemma 1(3)

$$\begin{aligned} \mathrm{T}_{d, i}= n \log \left\{ 1 + \frac{\chi _1^2 ( g^2 \varDelta _{\{i\} \cdot (-i)}^2 (1+R_i)^{-1})}{\chi _{n-p-1}^2} \right\} - d, \end{aligned}$$

where \(R_i = \chi _{p-1}^2 (g^2 \varDelta _{(-i)}^2)\left\{ \chi _{n-p}^2\right\} ^{-1}\). Here, since \(j_*\) is finite, by showing

$$\begin{aligned} \mathrm{T}_{d, i} \overset{p}{\rightarrow } t_i > 0 \quad \text {or} \quad \mathrm{T}_{d, i} \overset{p}{\rightarrow } \infty , \end{aligned}$$

we obtain \(P (\mathrm{T}_{d, i} \le 0) \rightarrow 0\), and hence, “\(\mathrm{[F1]} \rightarrow 0\)”. It is easily seen that

$$\begin{aligned} R_i \sim \frac{p+g^2 \varDelta _{(-i)}^2}{n-p}, \end{aligned}$$

where \(g=\{(n_1n_2)/n\}^{1/2}\) and “ \(\sim\)” means asymptotically equivalent, and hence

$$\begin{aligned} (1+R_i)^{-1} \sim \frac{n-p}{n+g^2\varDelta _{(-i)}^2}. \end{aligned}$$

Therefore, we obtain

$$\begin{aligned} \frac{1}{n} \mathrm{T}_{d, i} \rightarrow \lim \log \left( 1 + \frac{g^2 \varDelta _{\{i\} \cdot (-i)}^2}{n + g^2 \varDelta _{(-i)}^2} \right) > 0, \end{aligned}$$

which implies our assertion.

Next, we show “\(\mathrm{[F2]} \rightarrow 0\)”. For any \(i \notin j_*\), \(\varDelta ^2=\varDelta _{(-i)}^2\). Therefore, using Lemma 1(3), we have

$$\begin{aligned} \mathrm{T}_{d, i}= n \log \left( 1 + \frac{\chi _1^2}{\chi _{n-p-1}^2} \right) - d, \end{aligned}$$
(16)

whose distribution does not depend on i. Here, \(\chi _1^2\) and \(\chi _{n-p-1}^2\) are independent Chi-square variates with 1 and \(n-p-1\) degrees of freedom. This implies that

$$\begin{aligned} \mathrm{T}_{d,i}> 0 \Leftrightarrow \frac{\chi _1^2}{\chi _{n-p-1}^2} > e^{d/n} - 1. \end{aligned}$$

Noting that \(\mathrm{E}[ \chi _1^2/ \chi _{n-p-1}^2 ] = (n-p-3)^{-1}\), let

$$\begin{aligned} U = \frac{\chi _1^2}{\chi _{n-p-1}^2} - \frac{1}{n-p-3}. \end{aligned}$$

Then, since \(e^{d/n} - 1 - \frac{1}{n-p-3}>h\), we have

$$\begin{aligned} P ( \mathrm{T}_{d,i}> 0 )&= P \left( U> e^{d/n} - 1 - \frac{1}{n-p-3} \right) \\&\le P \left( U > h \right) . \end{aligned}$$

Furthermore, using the Markov inequality, we have

$$\begin{aligned} P( \mathrm{T}_{d,i}> 0 )&\le P(|U| > h)\\&\le h^{-2\ell } \mathrm{E}(U^{2\ell }), \quad \ell = 1, 2, \ldots \end{aligned}$$

Furthermore, it is easily seen that

$$\begin{aligned} \mathrm{E}( U^{2\ell } ) = \mathrm{O}(n^{-2\ell }), \end{aligned}$$

using, e.g., Theorem 16.2.2 in Fujikoshi et al. (2010). When \(h = O(n^{-a})\)

$$\begin{aligned} h^{-2\ell } \mathrm{E}( U^{2\ell } ) = \mathrm{O}(n^{-2(1-a)\ell }). \end{aligned}$$

Choosing \(\ell\) such that \(\ell > (1-a)^{-1}\), we have “\(\mathrm{[F2]} \rightarrow 0\)”.
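The argument above predicts that, for a noise variable, \(P(\mathrm{T}_{d,i}>0)\) vanishes once d grows with n while \(d/n\rightarrow 0\). A quick simulation of (16) illustrates this decay; the rate \(d=n^{1/2}\) is used here only for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
p, reps = 5, 200_000

def prob_select_noise(n, d):
    """Monte Carlo P(T_{d,i} > 0) for i outside j_*, using (16):
    T_{d,i} = n log(1 + chi_1^2 / chi_{n-p-1}^2) - d."""
    t = n * np.log1p(rng.chisquare(1, reps) / rng.chisquare(n - p - 1, reps)) - d
    return (t > 0).mean()

# With d ~ n^{1/2} (so d -> infinity but d/n -> 0), the false-selection
# probability for a single noise variable shrinks rapidly in n.
probs = [prob_select_noise(n, np.sqrt(n)) for n in (100, 400, 1600)]
print(probs)  # a decreasing sequence of small probabilities
```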

1.3 Proof of Theorem 2

First, note that in the proof of “\(\mathrm{[F2]} \rightarrow 0\)” in Theorem 1, Assumption A3 is not used. This implies the assertion “\(\mathrm{[F2]} \rightarrow 0\)” in Theorem 2.

Now, we show “\(\mathrm{[F1]} \rightarrow 0\)” when \(p_*=\mathrm{O}(p)\) and \(\varDelta ^2=\mathrm{O}(p)\). In this case, \(p_*\) tends to \(\infty\). As in the proof of Theorem 1, we can express \(\mathrm{T}_{d, i}\) for \(i \in j_*\) as

$$\begin{aligned} \mathrm{T}_{d, i}= n \log \left\{ 1 + \frac{\chi _1^2 ({\widehat{\lambda }}_i^2)}{\chi _{n-p-1}^2} \right\} - d, \end{aligned}$$

where \({\widehat{\lambda }}_i^2=g^2\varDelta _{\{i\} \cdot (-i)}^2 (1+R_i)^{-1}\) and \(R_i = \chi _{p-1}^2 (g^2 \varDelta _{(-i)}^2)\left\{ \chi _{n-p}^2\right\} ^{-1}\). Note that \(\chi _1^2\) and \(\chi _{n-p-1}^2\) are independent of \(R_i\), and hence of \({\widehat{\lambda }}_i^2\). Then, we have

$$\begin{aligned} P(T_{d, i} \le 0)=P({\widehat{V}} \le {\widehat{h}}), \end{aligned}$$
(17)

where

$$\begin{aligned} {\widehat{V}}&= \frac{\chi _1^2 ({\widehat{\lambda }}_i^2)}{\chi _{n-p-1}^2}- \frac{1+{\widehat{\lambda }}_i^2}{n-p-3}, \\ {\widehat{h}}&=e^{d/n}-1-(1+{\widehat{\lambda }}_i^2)/(n-p-3). \end{aligned}$$

Considering the conditional distribution of the right-hand side in (17), we have

$$\begin{aligned} P({\widehat{V}} \le {\widehat{h}})= \mathrm{E}_{{\widehat{\lambda }}^2_i}\left\{ Q({\widehat{\lambda }}^2_i)\right\} , \end{aligned}$$
(18)

where

$$\begin{aligned} Q(\lambda ^2_i)&=P({\widehat{V}} \le {\widehat{h}} \ | \ {\widehat{\lambda }}_i^2=\lambda _i^2) \\&=P({\widetilde{V}} \le {\widetilde{h}}). \end{aligned}$$

Here

$$\begin{aligned} {\widetilde{V}}&= \frac{\chi _1^2 (\lambda _i^2)}{\chi _{n-p-1}^2}- \frac{1+\lambda _i^2}{n-p-3}, \\ {\widetilde{h}}&=e^{d/n}-1-(1+\lambda _i^2)/(n-p-3). \end{aligned}$$

Using Assumption A6, it can be seen that

$$\begin{aligned} {\widehat{\lambda }}_i^2 \sim (1-c)c^{-1}\theta _i^2p \equiv \lambda _{i0}^2,\quad \mathrm{and} \quad {\widehat{\lambda }}_i^2 = \mathrm{O}(p^b). \end{aligned}$$

Now, we consider the probability \(P({\widetilde{V}} \le {\widetilde{h}})\) when \(\lambda _i^2=\lambda _{i0}^2\). From the assumption \(r < b\), for large n, \({\widetilde{h}} < 0\). Therefore, for large n, we have

$$\begin{aligned} P({\widetilde{V}} \le {\widetilde{h}})&\le P(|{\widetilde{V}}| \ge |{\widetilde{h}}|) \\&\le |{\widetilde{h}}|^{-4} \mathrm{E}({\widetilde{V}}^4). \end{aligned}$$

From Lemma 2, \(\mathrm{E}({\widetilde{V}}^4)=\mathrm{O}(n^{-2})\). Noting that \({\widetilde{h}}=\mathrm{O}(n^{-(1-b)})\), we have

$$\begin{aligned} |{\widetilde{h}}|^{-4} \mathrm{E}({\widetilde{V}}^4)=\mathrm{O}(n^{4(1-b)-2}), \end{aligned}$$

whose order is \(\mathrm{O}(n^{-(1+3\delta )})\) if we choose b as \(b > (3/4)(1+\delta )\). Therefore, we have \(P(\mathrm{T}_{d,i} \le 0)=\mathrm{O}(n^{-(1+3\delta )})\), which implies “\(\mathrm{[F1]} \rightarrow 0\)”.

1.4 Proof of Theorem 3

The assertion “\(\mathrm{[F1]} \rightarrow 0\)” follows from the proof of “\(\mathrm{[F1]} \rightarrow 0\)” in Theorem 1. For a proof of “\(\mathrm{[F2]} \rightarrow 0\)”, it is enough to show that

$$\begin{aligned} \mathrm{for} \ i \notin j_*, \quad \mathrm{T}_{d,i} \rightarrow \ -\infty . \end{aligned}$$

since p is fixed. From (16), the limiting distribution of \(\mathrm{T}_{d,i}+d\) is \(\chi _1^2\), and \(d \rightarrow \infty\); hence \(\mathrm{T}_{d,i} \overset{p}{\rightarrow } -\infty\), which implies “\(\mathrm{[F2]} \rightarrow 0\)”.


Cite this article

Fujikoshi, Y., Sakurai, T. Consistency of test-based method for selection of variables in high-dimensional two-group discriminant analysis. Jpn J Stat Data Sci 2, 155–171 (2019). https://doi.org/10.1007/s42081-019-00032-4
