Sparse Principal Component Analysis with Missing Observations

Lounici, Karim

doi:10.1007/978-3-0348-0490-5_20

Part of the book series: Progress in Probability (PRPR, volume 66)

Abstract

In this paper, we study the problem of sparse Principal Component Analysis (PCA) in the high-dimensional setting with missing observations. Our goal is to estimate the first principal component when we only have access to partial observations. Existing estimation techniques are usually derived for fully observed data sets and require prior knowledge of the sparsity of the first principal component in order to achieve good statistical guarantees. Our contribution is essentially theoretical in nature. First, we establish the first information-theoretic lower bound for the sparse PCA problem with missing observations. Second, we study the properties of a BIC-type estimator that requires neither prior knowledge of the sparsity of the unknown first principal component nor any imputation of the missing observations, and that adapts to this unknown sparsity. Third, if the covariance matrix of interest admits a sparse first principal component and is, in addition, approximately low-rank, then we can derive a completely data-driven choice of the regularization parameter, and the resulting BIC estimator also enjoys optimal statistical performance (up to a logarithmic factor).
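
The abstract does not spell out the estimator, so the following is only an illustrative sketch of the general idea of estimating a sparse first principal component without imputation. It rests on assumptions that are not taken from the abstract: each entry is observed independently with a known probability `delta`, the data are centered, missing entries are stored as zeros, and the penalty is a plain count of nonzero coordinates weighted by a hand-picked `lam`. The function names `debiased_covariance` and `l0_penalized_pc` are hypothetical, and the exhaustive search over supports is only feasible for very small dimensions.

```python
import itertools
import numpy as np


def debiased_covariance(Y, delta):
    """Covariance estimate from zero-filled data (assumed missingness model).

    Y     : (n, p) array, centered data with unobserved entries set to 0.
    delta : assumed known probability that an entry is observed.
    The zero-filled empirical covariance shrinks off-diagonal entries by
    delta**2 and diagonal entries by delta; this undoes both effects.
    """
    n = Y.shape[0]
    S = Y.T @ Y / n
    return (1 / delta - 1 / delta**2) * np.diag(np.diag(S)) + S / delta**2


def l0_penalized_pc(sigma, lam, max_support):
    """Brute-force maximizer of theta' sigma theta - lam * |support|.

    Searches every support of size <= max_support (illustration only:
    the cost grows combinatorially with the dimension).
    """
    p = sigma.shape[0]
    best_val, best_theta = -np.inf, None
    for k in range(1, max_support + 1):
        for support in itertools.combinations(range(p), k):
            idx = np.array(support)
            # Leading eigenpair of the principal submatrix on this support.
            vals, vecs = np.linalg.eigh(sigma[np.ix_(idx, idx)])
            val = vals[-1] - lam * k
            if val > best_val:
                theta = np.zeros(p)
                theta[idx] = vecs[:, -1]
                best_val, best_theta = val, theta
    return best_theta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p, delta = 2000, 6, 0.7
    theta = np.zeros(p)
    theta[:2] = 1 / np.sqrt(2)  # sparse direction used to simulate data
    # Rows have covariance I + 4 * theta theta'.
    X = rng.standard_normal((n, p)) + 2 * rng.standard_normal((n, 1)) * theta
    mask = rng.random((n, p)) < delta  # True where an entry is observed
    Y = X * mask
    est = l0_penalized_pc(debiased_covariance(Y, delta), lam=0.5, max_support=3)
    print("estimated support:", np.nonzero(est)[0])
```

In this sketch the regularization parameter `lam` is fixed by hand; the data-driven choice mentioned in the abstract is not reproduced here.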

Mathematics Subject Classification (2010). Primary 62H12.

Author information

Authors and Affiliations

School of Mathematics, Georgia Institute of Technology, Atlanta, GA, 30332-0160, USA
Karim Lounici

Corresponding author

Correspondence to Karim Lounici.

Editor information

Editors and Affiliations

Georgia Institute of Technology, Atlanta, 30332, USA
Christian Houdré
University of Delaware, Newark, 19716, USA
David M. Mason
University of Tennessee, Knoxville, 37934, USA
Jan Rosiński
University of Washington, Seattle, 98195, USA
Jon A. Wellner

About this paper

Cite this paper

Lounici, K. (2013). Sparse Principal Component Analysis with Missing Observations. In: Houdré, C., Mason, D., Rosiński, J., Wellner, J. (eds) High Dimensional Probability VI. Progress in Probability, vol 66. Birkhäuser, Basel. https://doi.org/10.1007/978-3-0348-0490-5_20

DOI: https://doi.org/10.1007/978-3-0348-0490-5_20
Published: 01 April 2013
Publisher Name: Birkhäuser, Basel
Print ISBN: 978-3-0348-0489-9
Online ISBN: 978-3-0348-0490-5
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
