Inference on High-Dimensional Mean Vectors with Fewer Observations Than the Dimension

Methodology and Computing in Applied Probability

Abstract

We focus on inference about high-dimensional mean vectors when the sample size is much smaller than the dimension. Such data arise in many areas of modern science, including genetic microarrays, medical imaging, text recognition, finance, and chemometrics. First, we give a given-radius confidence region for mean vectors. This inference can be used for variable selection in high-dimensional data. Next, we give a given-width confidence interval for the squared norm of mean vectors. This inference can be used in a classification procedure for high-dimensional data. To assure a prespecified coverage probability, we propose a two-stage estimation methodology and determine the required sample size for each inference. Finally, we demonstrate how the new methodologies perform on a microarray data set.
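
The abstract does not spell out the two-stage methodology, so the following is only an illustration of the general two-stage idea, not the authors' high-dimensional procedure. It sketches the classical two-stage rule of Stein for a fixed-width confidence interval for a single normal mean: a pilot sample of size m gives a variance estimate S², and the total sample size is N = max(m, ⌊t² S² / d²⌋ + 1), where d is the prespecified half-width. The function names and the choice to pass the Student-t critical value `t_crit` as an argument are assumptions made for this sketch.

```python
import math
import statistics


def stein_total_sample_size(pilot, half_width, t_crit):
    """Total sample size N for the classical univariate two-stage rule.

    pilot      : first-stage (pilot) observations, at least two values
    half_width : prespecified half-width d of the interval mean +/- d
    t_crit     : upper alpha/2 point of Student's t with len(pilot) - 1
                 degrees of freedom (assumption: supplied by the caller,
                 e.g. from a table or a statistics library)
    """
    m = len(pilot)
    if m < 2:
        raise ValueError("the pilot sample needs at least two observations")
    if half_width <= 0:
        raise ValueError("half_width must be positive")
    s2 = statistics.variance(pilot)  # unbiased first-stage variance estimate
    # N = max(m, floor(t^2 * S^2 / d^2) + 1)
    return max(m, math.floor(t_crit ** 2 * s2 / half_width ** 2) + 1)


def fixed_width_interval(all_observations, half_width):
    """Interval of prespecified width centred at the mean of all N observations."""
    centre = statistics.mean(all_observations)
    return centre - half_width, centre + half_width
```

For example, with the pilot sample `[1.0, 2.0, 3.0, 4.0, 5.0]`, half-width 1.0, and the 95% critical value 2.776 for 4 degrees of freedom, `stein_total_sample_size` returns 20, so 15 further observations would be collected before forming the interval.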

Author information

Corresponding author

Correspondence to Kazuyoshi Yata.

About this article

Cite this article

Yata, K., Aoshima, M. Inference on High-Dimensional Mean Vectors with Fewer Observations Than the Dimension. Methodol Comput Appl Probab 14, 459–476 (2012). https://doi.org/10.1007/s11009-011-9233-z

  • DOI: https://doi.org/10.1007/s11009-011-9233-z
