Abstract
We propose a method for estimating a covariance matrix that can be represented as the sum of a low-rank matrix and a diagonal matrix. The method compresses the high-dimensional data, computes the sample covariance in the compressed space, and lifts it back to the ambient space via a decompression operation. A salient feature of our approach, relative to the existing literature on combining sparsity and low-rank structure in covariance matrix estimation, is that we do not require the low-rank component to be sparse. We develop a principled framework for estimating the compressed dimension using Stein's Unbiased Risk Estimation (SURE) theory. Simulation results demonstrate the efficacy and scalability of the proposed approach.
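The compress-then-decompress pipeline described above can be sketched in a few lines. This is an illustrative stand-in, not the paper's exact estimator: the Gaussian compression map `Phi`, the pseudoinverse-based decompression, and the clipped diagonal correction are all assumptions made for this sketch.

```python
import numpy as np

def compressed_covariance(X, k, rng=None):
    """Compress-then-decompress covariance sketch (illustrative only).

    X : (n, p) data matrix, rows are observations; k : compressed dimension.
    The Gaussian map Phi, the pseudoinverse lift, and the clipped diagonal
    correction are assumptions of this sketch, not the paper's estimator.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    n, p = X.shape
    Phi = rng.standard_normal((k, p)) / np.sqrt(k)   # random compression map
    Y = X @ Phi.T                                    # compressed data, shape (n, k)
    S_c = Y.T @ Y / n                                # sample covariance in compressed space
    lift = np.linalg.pinv(Phi)                       # decompression operator
    low_rank = lift @ S_c @ lift.T                   # rank-at-most-k lift to ambient space
    S = X.T @ X / n                                  # ambient sample covariance
    d = np.clip(np.diag(S - low_rank), 0.0, None)    # nonnegative diagonal residual
    return low_rank + np.diag(d)
```

The output is symmetric by construction and, matching the target structure, decomposes as a rank-at-most-`k` matrix plus a diagonal matrix.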
References
Anderson, T. (1984). Multivariate Statistical Analysis. Wiley, New York.
Bahmani, S. and Romberg, J. (2015). Sketching for simultaneously sparse and low-rank covariance matrices. In 2015 IEEE 6th International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP). IEEE, pp. 357–360.
Bai, J., Shi, S., et al. (2011). Estimating high dimensional covariance matrices and its applications. Ann. Econ. Financ. 12, 199–215.
Basu, S., Michailidis, G., et al. (2015). Regularized estimation in sparse high-dimensional time series models. Ann. Stat. 43, 1535–1567.
Bhattacharya, A. and Dunson, D.B. (2011). Sparse Bayesian infinite factor models. Biometrika 98, 291.
Bickel, P.J. and Levina, E. (2008a). Covariance regularization by thresholding. Ann. Stat., 2577–2604.
Bickel, P.J. and Levina, E. (2008b). Regularized estimation of large covariance matrices. Ann. Stat., 199–227.
Bioucas-Dias, J.M., Cohen, D. and Eldar, Y.C. (2014). COVALSA: Covariance estimation from compressive measurements using alternating minimization. In Proceedings of the 22nd European Signal Processing Conference (EUSIPCO). IEEE, pp. 999–1003.
Bunea, F. and Xiao, L. (2015). On the sample covariance matrix estimator of reduced effective rank population matrices, with applications to FPCA. Bernoulli 21, 1200–1230.
Butte, A.J., Tamayo, P., Slonim, D., Golub, T.R. and Kohane, I.S. (2000). Discovering functional relationships between RNA expression and chemotherapeutic susceptibility using relevance networks. Proc. Natl. Acad. Sci. 97, 12182–12186.
Cai, T. and Liu, W. (2011). Adaptive thresholding for sparse covariance matrix estimation. J. Am. Stat. Assoc. 106, 672–684.
Cai, T.T., Ma, Z., Wu, Y., et al. (2013). Sparse PCA: Optimal rates and adaptive estimation. Ann. Stat. 41, 3074–3110.
Cai, T., Ma, Z. and Wu, Y. (2015). Optimal estimation and rank detection for sparse spiked covariance matrices. Probab. Theory Related Fields 161, 781–815.
Cai, T.T., Ren, Z., Zhou, H.H., et al. (2016). Estimating structured high-dimensional covariance and precision matrices: Optimal rates and adaptive estimation. Electron. J. Stat. 10, 1–59.
Chen, X., Xu, M., Wu, W.B., et al. (2013). Covariance and precision matrix estimation for high-dimensional time series. Ann. Stat. 41, 2994–3021.
Chen, Y., Chi, Y. and Goldsmith, A.J. (2015). Exact and stable covariance estimation from quadratic sampling via convex programming. IEEE Trans. Inf. Theory 61, 4034–4059.
Dasarathy, G., Shah, P., Bhaskar, B.N. and Nowak, R.D. (2015). Sketching sparse matrices, covariances, and graphs via tensor products. IEEE Trans. Inf. Theory 61, 1373–1388.
d’Aspremont, A., Banerjee, O. and El Ghaoui, L. (2008). First-order methods for sparse covariance selection. SIAM J. Matrix Anal. Appl. 30, 56–66.
Davenport, M.A. and Romberg, J. (2016). An overview of low-rank matrix recovery from incomplete observations. IEEE J. Selected Topics Signal Process. 10, 608–622.
Efron, B. (1986). How biased is the apparent error rate of a prediction rule? J. Am. Stat. Assoc. 81, 461–470.
Efron, B. (2004). The estimation of prediction error. J. Am. Stat. Assoc. 99, 467.
Engle, R. and Watson, M. (1981). A one-factor multivariate time series model of metropolitan wage rates. J. Am. Stat. Assoc. 76, 774–781.
Fan, J., Liao, Y. and Mincheva, M. (2013). Large covariance estimation by thresholding principal orthogonal complements. J. R. Statist. Soc. Series B (Statist. Methodol.) 75, 603–680.
Fan, J., Liao, Y. and Liu, H. (2016). An overview of the estimation of large covariance and precision matrices. Econ. J. 19, C1–C32.
Fan, J., Fan, Y. and Lv, J. (2008). High dimensional covariance matrix estimation using a factor model. J. Econ. 147, 186–197.
Furrer, R. and Bengtsson, T. (2007). Estimation of high-dimensional prior and posterior covariance matrices in Kalman filter variants. J. Multivar. Anal. 98, 227–255.
Furrer, R., Genton, M.G. and Nychka, D. (2006). Covariance tapering for interpolation of large spatial datasets. J. Comput. Graph. Stat. 15, 502–523.
Goldfarb, D. and Iyengar, G. (2003). Robust portfolio selection problems. Math. Oper. Res. 28, 1–38.
Guhaniyogi, R. and Dunson, D.B. (2013). Bayesian compressed regression. arXiv:1303.0642.
Hamill, T.M., Whitaker, J.S. and Snyder, C. (2001). Distance-dependent filtering of background error covariance estimates in an ensemble Kalman filter. Mon. Weather Rev. 129, 2776–2790.
Hoff, P.D. (2009). A hierarchical eigenmodel for pooled covariance estimation. J. R. Statist. Soc. Series B (Statist. Methodol.) 71, 971–992.
Houtekamer, P.L. and Mitchell, H.L. (2001). A sequential ensemble Kalman filter for atmospheric data assimilation. Mon. Weather Rev. 129, 123–137.
Huang, J.Z., Liu, N., Pourahmadi, M. and Liu, L. (2006). Covariance matrix selection and estimation via penalised normal likelihood. Biometrika, 85–98.
Ideker, T. and Sharan, R. (2008). Protein networks in disease. Genome Res. 18, 644–652.
Jimenez-Sanchez, G., Childs, B. and Valle, D. (2001). Human disease genes. Nature 409, 853–855.
Johnstone, I.M. (2001). On the distribution of the largest eigenvalue in principal components analysis. Ann. Stat., 295–327.
Johnstone, I.M. and Lu, A.Y. (2012). On consistency and sparsity for principal components analysis in high dimensions. J. Am. Stat. Assoc.
Karoui, N.E. (2008a). Operator norm consistent estimation of large-dimensional sparse covariance matrices. Ann. Stat., 2717–2756.
Karoui, N.E. (2008b). Spectrum estimation for large dimensional covariance matrices using random matrix theory. Ann. Stat., 2757–2790.
Kaufman, C.G., Schervish, M.J. and Nychka, D.W. (2008). Covariance tapering for likelihood-based estimation in large spatial data sets. J. Am. Stat. Assoc. 103, 1545–1555.
Lam, C. and Fan, J. (2009). Sparsistency and rates of convergence in large covariance matrix estimation. Ann. Stat. 37, 4254.
Ledoit, O., Santa-Clara, P. and Wolf, M. (2003). Flexible multivariate GARCH modeling with an application to international stock markets. Rev. Econ. Statist. 85, 735–747.
Ledoit, O. and Wolf, M. (2004). A well-conditioned estimator for large-dimensional covariance matrices. J. Multivar. Anal. 88, 365–411.
Li, D. and Zou, H. (2014). Asymptotic properties of sure information criteria for large covariance matrices. arXiv:1406.6514.
Li, D. and Zou, H. (2016). Sure information criteria for large covariance matrix estimation and their asymptotic properties. IEEE Trans. Inf. Theory 62, 2153–2169.
Ma, Z., et al. (2013). Sparse principal component analysis and iterative thresholding. Ann. Stat. 41, 772–801.
Marzetta, T.L., Tucci, G.H. and Simon, S.H. (2011). A random matrix-theoretic approach to handling singular covariance estimates. IEEE Trans. Inf. Theory 57, 6256–6271.
McMurry, T.L. and Politis, D.N. (2010). Banded and tapered estimates for autocovariance matrices and the linear process bootstrap. J. Time Ser. Anal. 31, 471–482.
Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. Ann. Stat., 1436–1462.
Pati, D., Bhattacharya, A., Pillai, N.S. and Dunson, D.B. (2014). Posterior contraction in sparse Bayesian factor models for massive covariance matrices. Ann. Stat. 42, 1102–1130.
Pourahmadi, M. (2011). Covariance estimation: The GLM and regularization perspectives. Stat. Sci., 369–387.
Qiao, H. and Pal, P. (2015). Generalized nested sampling for compressing low rank Toeplitz matrices. IEEE Signal Process. Lett. 22, 1844–1848.
Ravikumar, P., Wainwright, M.J., Raskutti, G., Yu, B., et al. (2011). High-dimensional covariance estimation by minimizing ℓ1-penalized log-determinant divergence. Electron. J. Stat. 5, 935–980.
Rothman, A.J., Levina, E. and Zhu, J. (2009). Generalized thresholding of large covariance matrices. J. Am. Stat. Assoc. 104, 177–186.
Schäfer, J., Strimmer, K., et al. (2005). A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Statist. Appl. Genetics Molecul. Biol. 4, 32.
Shen, H. and Huang, J.Z. (2008). Sparse principal component analysis via regularized low rank matrix approximation. J. Multivar. Anal. 99, 1015–1034.
Smith, M. and Kohn, R. (2002). Parsimonious covariance matrix estimation for longitudinal data. J. Am. Stat. Assoc. 97, 1141–1153.
Stein, C., et al. (1956). Inadmissibility of the usual estimator for the mean of a multivariate normal distribution. In Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, pp. 197–206.
Stein, C.M. (1981). Estimation of the mean of a multivariate normal distribution. Ann. Stat., 1135–1151.
Werner, K., Jansson, M. and Stoica, P. (2008). On estimation of covariance matrices with Kronecker product structure. IEEE Trans. Signal Process. 56, 478–491.
Witten, D.M., Tibshirani, R. and Hastie, T. (2009). A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics, kxp008.
Wu, W.B. and Pourahmadi, M. (2003). Nonparametric estimation of large covariance matrices of longitudinal data. Biometrika 90, 831–844.
Wu, W.B. and Pourahmadi, M. (2009). Banding sample autocovariance matrices of stationary processes. Stat. Sin., 1755–1768.
Xiao, H., Wu, W.B., et al. (2012). Covariance matrix estimation for stationary time series. Ann. Stat. 40, 466–493.
Xiao, L. and Bunea, F. (2014). On the theoretic and practical merits of the banding estimator for large covariance matrices. arXiv:1402.0844.
Yi, F. and Zou, H. (2013). Sure-tuned tapering estimation of large covariance matrices. Comput. Statist. Data Anal. 58, 339–351.
Yuan, M. and Lin, Y. (2007). Model selection and estimation in the Gaussian graphical model. Biometrika, 19–35.
Zou, H., Hastie, T. and Tibshirani, R. (2006). Sparse principal component analysis. J. Comput. Graph. Stat. 15, 265–286.
Appendices
Appendix A: Proof of Lemma 3.1
We obtain the unbiased estimators given in Eqs. 3.2, 3.3, and 3.4. Suppose \(\{X_{i}\}_{i = 1}^{n}\) is a random sample from N(μ,Σ), where, without loss of generality, we take μ = 0. We have,
and,
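The two displayed equations referenced just above did not survive extraction. Writing \(\widetilde{\sigma}_{ij} = n^{-1}\sum_{t=1}^{n} X_{ti}X_{tj}\), the standard second-moment identities for zero-mean Gaussian data (following from Isserlis' theorem) that (A.1) and (A.2) plausibly correspond to are:

```latex
\mathbb{E}\big[\widetilde{\sigma}_{ij}^{2}\big]
  = \sigma_{ij}^{2} + \frac{\sigma_{ij}^{2} + \sigma_{ii}\sigma_{jj}}{n},
  \tag{A.1}
\qquad
\mathbb{E}\big[\widetilde{\sigma}_{ii}\widetilde{\sigma}_{jj}\big]
  = \sigma_{ii}\sigma_{jj} + \frac{2\sigma_{ij}^{2}}{n}.
  \tag{A.2}
```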
Equations A.1 and A.2 are obtained from Yi and Zou (2013). Solving (A.1) and (A.2) simultaneously, we obtain the following unbiased estimators of \({\sigma _{ij}^{2}}\) and σiiσjj:
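The display that followed here, i.e. (A.3)–(A.4), is missing from this extraction. Solving the two-equation linear system referenced above under the same zero-mean Gaussian assumptions yields, as a reconstruction, the unbiased estimators:

```latex
\widehat{\sigma_{ij}^{2}}
  = \frac{n\big(n\widetilde{\sigma}_{ij}^{2}
      - \widetilde{\sigma}_{ii}\widetilde{\sigma}_{jj}\big)}{(n-1)(n+2)},
  \tag{A.3}
\qquad
\widehat{\sigma_{ii}\sigma_{jj}}
  = \frac{n\big((n+1)\widetilde{\sigma}_{ii}\widetilde{\sigma}_{jj}
      - 2\widetilde{\sigma}_{ij}^{2}\big)}{(n-1)(n+2)}.
  \tag{A.4}
```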
An unbiased estimator of \({\text {Var}}({\widetilde {\sigma }}_{ij}^{s})\) is given by
Substituting (A.3) and (A.4) in (A.5) gives (3.2).
Equation 3.3 follows immediately from Eq. 3.2. To obtain (3.4), note that
Substituting the unbiased estimators of \({\sigma }_{ij}^{2}\) and σiiσjj in Eq. A.6, obtained from Eqs. A.3 and A.4, gives us Eq. 3.4. This completes the proof.
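The unbiased estimators above can be checked numerically. The closed forms in the snippet below are the standard zero-mean Gaussian results, assumed here to match (A.3)–(A.4) rather than quoted from the paper; their Monte Carlo means should recover \(\sigma_{12}^{2}\) and \(\sigma_{11}\sigma_{22}\).

```python
import numpy as np

# Monte Carlo sanity check of the unbiased estimators of sigma_ij^2 and
# sigma_ii * sigma_jj (closed forms are the standard zero-mean Gaussian
# results, assumed here, not quoted from the paper).
rng = np.random.default_rng(1)
n, trials = 50, 4000
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.5]])
L = np.linalg.cholesky(Sigma)

est_sq, est_cross = [], []
for _ in range(trials):
    X = rng.standard_normal((n, 2)) @ L.T    # sample from N(0, Sigma)
    S = X.T @ X / n                          # sigma_tilde with mu = 0 known
    s12, s11, s22 = S[0, 1], S[0, 0], S[1, 1]
    c = n / ((n - 1) * (n + 2))
    est_sq.append(c * (n * s12**2 - s11 * s22))               # targets sigma_12^2
    est_cross.append(c * ((n + 1) * s11 * s22 - 2 * s12**2))  # targets sigma_11*sigma_22

print(np.mean(est_sq), np.mean(est_cross))   # ~0.64 and ~3.0
```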
Appendix B: Proof of Theorem 3.2
We analyze each term in Eq. 3.1 one at a time. Consider \(\|\widehat {{\Sigma }}_{\text {CD}}(k) - {\widehat {{\Sigma }}}\|_{F}^{2}\), a natural unbiased estimator of \({\mathbb {E}}\|\widehat {{\Sigma }}_{\text {CD}}(k) - {\widehat {{\Sigma }}}\|_{F}^{2}\). Then
where (B.1) is obtained by noting that \({\sum \limits _{i\ne j}^{p}}{\widetilde {\sigma }}_{ij}^{2} = {\sum \limits _{i = 1}^{p}}{\sum \limits _{\underset {j \ne i}{j = 1}}^{p}}{\widetilde {\sigma }}_{ij}^{2} = \|{\widetilde {{\Sigma }}}\|_{F}^{2} \;- \;\text {Tr}({\widetilde {{\Sigma }}}\circ {\widetilde {{\Sigma }}})\). Next, consider the optimism term, the third term on the right-hand side of Eq. 3.1. Then
Using Lemma 3.1, we have
where (B.3) is obtained by writing \({\displaystyle \sum \limits _{i = 1}^{p}}{\widetilde {\sigma }}_{ii}{\displaystyle \sum \limits _{\underset {j \ne i}{j = 1}}^{p}}{\widetilde {\sigma }}_{jj} = {{\text {Tr}}({\widetilde {{\Sigma }}})}^{2} - {\text {Tr}}({\widetilde {{\Sigma }}} \circ {\widetilde {{\Sigma }}})\) and \({\displaystyle \sum \limits _{i = 1}^{p}}{\widetilde {\sigma }}_{ii}^{2} = {\text {Tr}}({\widetilde {{\Sigma }}} \circ {\widetilde {{\Sigma }}})\).
The proof is completed by combining (B.1) and (B.3) to obtain an unbiased estimator of the Frobenius risk of \(\widehat {{\Sigma }}_{\text {CD}}(k)\).
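In practice, the unbiased risk estimate of Theorem 3.2 is used to select the compressed dimension by minimizing over a grid of candidate values of k. A minimal selection loop might look like the following, where `risk_estimate` is a hypothetical caller-supplied stand-in for the estimator of Eq. 3.1 (its exact form depends on the moment estimators of Appendix A and is not reproduced here):

```python
import numpy as np

def select_k(X, k_grid, risk_estimate):
    """Return the k in k_grid minimizing the estimated Frobenius risk.

    risk_estimate(X, k) is a caller-supplied callable standing in for the
    SURE-type unbiased risk estimate of Eq. 3.1.
    """
    risks = np.array([risk_estimate(X, k) for k in k_grid])
    return k_grid[int(np.argmin(risks))], risks
```

For example, with a toy risk curve minimized at k = 3, `select_k(None, list(range(1, 8)), lambda X, k: (k - 3) ** 2)` selects k = 3.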
Sabnis, G., Pati, D. & Bhattacharya, A. Compressed Covariance Estimation with Automated Dimension Learning. Sankhya A 81, 466–481 (2019). https://doi.org/10.1007/s13171-018-0134-x