Abstract
We focus on inference about high-dimensional mean vectors when the sample size is much smaller than the dimension. Such data situations occur in many areas of modern science, such as genetic microarrays, medical imaging, text recognition, finance, and chemometrics. First, we give a given-radius confidence region for mean vectors. This inference can be used for variable selection in high-dimensional data. Next, we give a given-width confidence interval for the squared norm of mean vectors. This inference can be used in a classification procedure for high-dimensional data. To assure a prespecified coverage probability, we propose a two-stage estimation methodology and determine the required sample size for each inference. Finally, we demonstrate how the new methodologies perform by using a microarray data set.
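The abstract does not spell out the procedure itself. As a hedged illustration of how a Stein-type two-stage rule can fix the sample size for a given-radius region {μ : ‖x̄_N − μ‖ ≤ δ}, the sketch below assumes the normal approximation (N‖x̄_N − μ‖² − tr Σ)/√(2 tr Σ²) ≈ N(0, 1) for large d. Under that assumption the coverage is at least 1 − α once N ≥ C = (tr Σ + z_α √(2 tr Σ²))/δ². Stage 1 estimates tr Σ and tr Σ² from a pilot sample of size m. Stage 2 takes N = max(m, ⌈Ĉ⌉) in total. The rule C, the Gaussian-based unbiased estimator of tr Σ², and all function names (`trace_estimates`, `required_sample_size`, `two_stage_total_size`, `in_region`) are illustrative assumptions, not necessarily the paper's exact choices.

```python
import math
from statistics import NormalDist

import numpy as np


def trace_estimates(x):
    """Estimate tr(Sigma) and tr(Sigma^2) from an m x d pilot sample (m << d).

    Uses the m x m Gram matrix of the centred rows instead of the d x d
    sample covariance S, so the cost is O(m^2 d) rather than O(d^2).
    Assumption: the tr(Sigma^2) estimator is unbiased only for Gaussian data.
    Requires m >= 3.
    """
    m = x.shape[0]
    n = m - 1                              # degrees of freedom of S
    xc = x - x.mean(axis=0)                # centre every coordinate
    gram = xc @ xc.T                       # m x m; tr(gram^k) = n^k tr(S^k)
    tr_s = np.trace(gram) / n              # tr(S), unbiased for tr(Sigma)
    tr_s2 = np.sum(gram * gram) / n**2     # tr(S^2) via the Frobenius norm
    tr_sigma2 = n**2 / ((n - 1) * (n + 2)) * (tr_s2 - tr_s**2 / n)
    return float(tr_s), float(max(tr_sigma2, 0.0))  # clamp rare negatives


def required_sample_size(tr_sigma, tr_sigma2, delta, alpha=0.05):
    """Hypothetical rule C = (tr(Sigma) + z_alpha * sqrt(2 tr(Sigma^2))) / delta^2."""
    z = NormalDist().inv_cdf(1 - alpha)    # upper alpha point of N(0, 1)
    return (tr_sigma + z * math.sqrt(2 * tr_sigma2)) / delta**2


def two_stage_total_size(pilot, delta, alpha=0.05):
    """Stage 1: plug pilot estimates into C; the total size N is never below m."""
    m = pilot.shape[0]
    tr_s, tr_s2 = trace_estimates(pilot)
    return max(m, math.ceil(required_sample_size(tr_s, tr_s2, delta, alpha)))


def in_region(mu, sample, delta):
    """Check whether mu lies in the ball of radius delta around the sample mean."""
    return float(np.linalg.norm(sample.mean(axis=0) - mu)) <= delta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, m, delta = 2000, 40, 5.0
    pilot = rng.standard_normal((m, d))        # simulated: mu = 0, Sigma = I
    N = two_stage_total_size(pilot, delta)
    extra = rng.standard_normal((N - m, d))    # stage 2: N - m further draws
    print(N, in_region(np.zeros(d), np.vstack([pilot, extra]), delta))
```

Working with the m × m Gram matrix rather than the d × d covariance keeps the sketch cheap when d runs into the thousands, as with microarray data.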
Cite this article
Yata, K., Aoshima, M. Inference on High-Dimensional Mean Vectors with Fewer Observations Than the Dimension. Methodol Comput Appl Probab 14, 459–476 (2012). https://doi.org/10.1007/s11009-011-9233-z