Compressed Covariance Estimation with Automated Dimension Learning

Abstract

We propose a method for estimating a covariance matrix that can be represented as the sum of a low-rank matrix and a diagonal matrix. The proposed method compresses high-dimensional data, computes the sample covariance in the compressed space, and lifts it back to the ambient space via a decompression operation. A salient feature of our approach, relative to existing work combining sparsity and low-rank structure in covariance matrix estimation, is that we do not require the low-rank component to be sparse. We develop a principled framework for choosing the compressed dimension using Stein's Unbiased Risk Estimation (SURE) theory. Simulation results demonstrate the efficacy and scalability of the proposed approach.
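To make the pipeline concrete, here is a minimal numpy sketch of the compress-then-decompress idea, assuming a Gaussian random projection and a Moore-Penrose pseudoinverse lift; the paper's actual decompression operator, scaling, and diagonal estimation may differ, so treat this as a schematic rather than the authors' estimator.

```python
# Schematic sketch of compress -> sample covariance -> decompress.
# Assumptions (ours, not from the paper): Gaussian projection Phi,
# pseudoinverse lift, and a clipped-residual diagonal correction.
import numpy as np

def compressed_covariance(X, k, seed=None):
    """X: (n, p) data matrix; k: compressed dimension (k < p)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    Phi = rng.normal(scale=1.0 / np.sqrt(k), size=(k, p))  # random compression map
    Z = X @ Phi.T                           # compressed data, shape (n, k)
    S_z = np.cov(Z, rowvar=False)           # sample covariance in compressed space
    Phi_pinv = np.linalg.pinv(Phi)          # lift back to the ambient space
    low_rank = Phi_pinv @ S_z @ Phi_pinv.T  # rank <= k component, shape (p, p)
    S = np.cov(X, rowvar=False)
    # Diagonal component: nonnegative residual variances left by the low-rank part.
    diag = np.diag(np.clip(np.diag(S - low_rank), 0.0, None))
    return low_rank + diag
```

For instance, `compressed_covariance(X, k=10)` returns a p x p estimate whose low-rank component has rank at most 10; the paper's SURE framework then supplies a principled choice of k.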


References

  • Anderson, T. (1984). Multivariate Statistical Analysis. Wiley, New York.

  • Bahmani, S. and Romberg, J. (2015). Sketching for simultaneously sparse and low-rank covariance matrices. In 2015 IEEE 6th International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP). IEEE, pp. 357–360.

  • Bai, J. and Shi, S. (2011). Estimating high dimensional covariance matrices and its applications. Ann. Econ. Financ. 12, 199–215.

  • Basu, S. and Michailidis, G. (2015). Regularized estimation in sparse high-dimensional time series models. Ann. Stat. 43, 1535–1567.

  • Bhattacharya, A. and Dunson, D.B. (2011). Sparse Bayesian infinite factor models. Biometrika 98, 291.

  • Bickel, P.J. and Levina, E. (2008a). Covariance regularization by thresholding. Ann. Stat., 2577–2604.

  • Bickel, P.J. and Levina, E. (2008b). Regularized estimation of large covariance matrices. Ann. Stat., 199–227.

  • Bioucas-Dias, J.M., Cohen, D. and Eldar, Y.C. (2014). COVALSA: covariance estimation from compressive measurements using alternating minimization. In 2014 Proceedings of the 22nd European Signal Processing Conference (EUSIPCO). IEEE, pp. 999–1003.

  • Bunea, F. and Xiao, L. (2015). On the sample covariance matrix estimator of reduced effective rank population matrices, with applications to FPCA. Bernoulli 21, 1200–1230.

  • Butte, A.J., Tamayo, P., Slonim, D., Golub, T.R. and Kohane, I.S. (2000). Discovering functional relationships between RNA expression and chemotherapeutic susceptibility using relevance networks. Proc. Natl. Acad. Sci. 97, 12182–12186.

  • Cai, T. and Liu, W. (2011). Adaptive thresholding for sparse covariance matrix estimation. J. Am. Stat. Assoc. 106, 672–684.

  • Cai, T.T., Ma, Z. and Wu, Y. (2013). Sparse PCA: optimal rates and adaptive estimation. Ann. Stat. 41, 3074–3110.

  • Cai, T., Ma, Z. and Wu, Y. (2015). Optimal estimation and rank detection for sparse spiked covariance matrices. Probab. Theory Related Fields 161, 781–815.

  • Cai, T.T., Ren, Z. and Zhou, H.H. (2016). Estimating structured high-dimensional covariance and precision matrices: optimal rates and adaptive estimation. Electron. J. Statist. 10, 1–59.

  • Chen, X., Xu, M. and Wu, W.B. (2013). Covariance and precision matrix estimation for high-dimensional time series. Ann. Stat. 41, 2994–3021.

  • Chen, Y., Chi, Y. and Goldsmith, A.J. (2015). Exact and stable covariance estimation from quadratic sampling via convex programming. IEEE Trans. Inf. Theory 61, 4034–4059.

  • Dasarathy, G., Shah, P., Bhaskar, B.N. and Nowak, R.D. (2015). Sketching sparse matrices, covariances, and graphs via tensor products. IEEE Trans. Inf. Theory 61, 1373–1388.

  • d’Aspremont, A., Banerjee, O. and El Ghaoui, L. (2008). First-order methods for sparse covariance selection. SIAM J. Matrix Anal. Appl. 30, 56–66.

  • Davenport, M.A. and Romberg, J. (2016). An overview of low-rank matrix recovery from incomplete observations. IEEE J. Selected Topics Signal Process. 10, 608–622.

  • Efron, B. (1986). How biased is the apparent error rate of a prediction rule? J. Am. Stat. Assoc. 81, 461–470.

  • Efron, B. (2004). The estimation of prediction error. J. Am. Stat. Assoc. 99, 467.

  • Engle, R. and Watson, M. (1981). A one-factor multivariate time series model of metropolitan wage rates. J. Am. Stat. Assoc. 76, 774–781.

  • Fan, J., Liao, Y. and Mincheva, M. (2013). Large covariance estimation by thresholding principal orthogonal complements. J. R. Statist. Soc. Series B (Statist. Methodol.) 75, 603–680.

  • Fan, J., Liao, Y. and Liu, H. (2016). An overview of the estimation of large covariance and precision matrices. Econ. J. 19, C1–C32.

  • Fan, J., Fan, Y. and Lv, J. (2008). High dimensional covariance matrix estimation using a factor model. J. Econ. 147, 186–197.

  • Furrer, R. and Bengtsson, T. (2007). Estimation of high-dimensional prior and posterior covariance matrices in Kalman filter variants. J. Multivar. Anal. 98, 227–255.

  • Furrer, R., Genton, M.G. and Nychka, D. (2006). Covariance tapering for interpolation of large spatial datasets. J. Comput. Graph. Stat. 15, 502–523.

  • Goldfarb, D. and Iyengar, G. (2003). Robust portfolio selection problems. Math. Oper. Res. 28, 1–38.

  • Guhaniyogi, R. and Dunson, D.B. (2013). Bayesian compressed regression. arXiv:1303.0642.

  • Hamill, T.M., Whitaker, J.S. and Snyder, C. (2001). Distance-dependent filtering of background error covariance estimates in an ensemble Kalman filter. Mon. Weather Rev. 129, 2776–2790.

  • Hoff, P.D. (2009). A hierarchical eigenmodel for pooled covariance estimation. J. R. Statist. Soc. Series B (Statist. Methodol.) 71, 971–992.

  • Houtekamer, P.L. and Mitchell, H.L. (2001). A sequential ensemble Kalman filter for atmospheric data assimilation. Mon. Weather Rev. 129, 123–137.

  • Huang, J.Z., Liu, N., Pourahmadi, M. and Liu, L. (2006). Covariance matrix selection and estimation via penalised normal likelihood. Biometrika, 85–98.

  • Ideker, T. and Sharan, R. (2008). Protein networks in disease. Genome Res. 18, 644–652.

  • Jimenez-Sanchez, G., Childs, B. and Valle, D. (2001). Human disease genes. Nature 409, 853–855.

  • Johnstone, I.M. (2001). On the distribution of the largest eigenvalue in principal components analysis. Ann. Stat., 295–327.

  • Johnstone, I.M. and Lu, A.Y. (2012). On consistency and sparsity for principal components analysis in high dimensions. J. Am. Stat. Assoc.

  • Karoui, N.E. (2008a). Operator norm consistent estimation of large-dimensional sparse covariance matrices. Ann. Stat., 2717–2756.

  • Karoui, N.E. (2008b). Spectrum estimation for large dimensional covariance matrices using random matrix theory. Ann. Stat., 2757–2790.

  • Kaufman, C.G., Schervish, M.J. and Nychka, D.W. (2008). Covariance tapering for likelihood-based estimation in large spatial data sets. J. Am. Stat. Assoc. 103, 1545–1555.

  • Lam, C. and Fan, J. (2009). Sparsistency and rates of convergence in large covariance matrix estimation. Ann. Stat. 37, 4254.

  • Ledoit, O., Santa-Clara, P. and Wolf, M. (2003). Flexible multivariate GARCH modeling with an application to international stock markets. Rev. Econ. Statist. 85, 735–747.

  • Ledoit, O. and Wolf, M. (2004). A well-conditioned estimator for large-dimensional covariance matrices. J. Multivar. Anal. 88, 365–411.

  • Li, D. and Zou, H. (2014). Asymptotic properties of SURE information criteria for large covariance matrices. arXiv:1406.6514.

  • Li, D. and Zou, H. (2016). SURE information criteria for large covariance matrix estimation and their asymptotic properties. IEEE Trans. Inf. Theory 62, 2153–2169.

  • Ma, Z. (2013). Sparse principal component analysis and iterative thresholding. Ann. Stat. 41, 772–801.

  • Marzetta, T.L., Tucci, G.H. and Simon, S.H. (2011). A random matrix-theoretic approach to handling singular covariance estimates. IEEE Trans. Inf. Theory 57, 6256–6271.

  • McMurry, T.L. and Politis, D.N. (2010). Banded and tapered estimates for autocovariance matrices and the linear process bootstrap. J. Time Ser. Anal. 31, 471–482.

  • Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. Ann. Stat., 1436–1462.

  • Pati, D., Bhattacharya, A., Pillai, N.S. and Dunson, D.B. (2014). Posterior contraction in sparse Bayesian factor models for massive covariance matrices. Ann. Stat. 42, 1102–1130.

  • Pourahmadi, M. (2011). Covariance estimation: the GLM and regularization perspectives. Stat. Sci., 369–387.

  • Qiao, H. and Pal, P. (2015). Generalized nested sampling for compressing low rank Toeplitz matrices. IEEE Signal Process. Lett. 22, 1844–1848.

  • Ravikumar, P., Wainwright, M.J., Raskutti, G. and Yu, B. (2011). High-dimensional covariance estimation by minimizing ℓ1-penalized log-determinant divergence. Electron. J. Statist. 5, 935–980.

  • Rothman, A.J., Levina, E. and Zhu, J. (2009). Generalized thresholding of large covariance matrices. J. Am. Stat. Assoc. 104, 177–186.

  • Schäfer, J. and Strimmer, K. (2005). A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Statist. Appl. Genetics Molecul. Biol. 4, 32.

  • Shen, H. and Huang, J.Z. (2008). Sparse principal component analysis via regularized low rank matrix approximation. J. Multivar. Anal. 99, 1015–1034.

  • Smith, M. and Kohn, R. (2002). Parsimonious covariance matrix estimation for longitudinal data. J. Am. Stat. Assoc. 97, 1141–1153.

  • Stein, C. (1956). Inadmissibility of the usual estimator for the mean of a multivariate normal distribution. In Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 197–206.

  • Stein, C.M. (1981). Estimation of the mean of a multivariate normal distribution. Ann. Stat., 1135–1151.

  • Werner, K., Jansson, M. and Stoica, P. (2008). On estimation of covariance matrices with Kronecker product structure. IEEE Trans. Signal Process. 56, 478–491.

  • Witten, D.M., Tibshirani, R. and Hastie, T. (2009). A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics, kxp008.

  • Wu, W.B. and Pourahmadi, M. (2003). Nonparametric estimation of large covariance matrices of longitudinal data. Biometrika 90, 831–844.

  • Wu, W.B. and Pourahmadi, M. (2009). Banding sample autocovariance matrices of stationary processes. Stat. Sin., 1755–1768.

  • Xiao, H. and Wu, W.B. (2012). Covariance matrix estimation for stationary time series. Ann. Stat. 40, 466–493.

  • Xiao, L. and Bunea, F. (2014). On the theoretic and practical merits of the banding estimator for large covariance matrices. arXiv:1402.0844.

  • Yi, F. and Zou, H. (2013). SURE-tuned tapering estimation of large covariance matrices. Comput. Statist. Data Anal. 58, 339–351.

  • Yuan, M. and Lin, Y. (2007). Model selection and estimation in the Gaussian graphical model. Biometrika, 19–35.

  • Zou, H., Hastie, T. and Tibshirani, R. (2006). Sparse principal component analysis. J. Comput. Graph. Stat. 15, 265–286.


Author information

Corresponding author

Correspondence to Gautam Sabnis.

Appendices

Appendix A: Proof of Lemma 3.1

We derive the unbiased estimators given in Eqs. 3.2, 3.3, and 3.4. Suppose \(\{X_{i}\}_{i = 1}^{n}\) is a random sample from N(μ,Σ), where, without loss of generality, we take μ = 0. We have

$$ {\mathbb{E}}(({\widetilde{\sigma}}_{ij}^{s})^{2}) = \frac{n}{n-1}{\sigma_{ij}^{2}} + \frac{\sigma_{ii}\sigma_{jj}}{n-1} $$
(A.1)

and

$$ {\mathbb{E}}({\widetilde{\sigma}}_{ii}^{s}{\widetilde{\sigma}}_{jj}^{s}) = \frac{n + 1}{n-1}{\sigma_{ii}\sigma_{jj}} + \frac{2(n + 2)}{n(n-1)}{\sigma_{ij}^{2}}. $$
(A.2)

Equations A.1 and A.2 are obtained from Yi and Zou (2013). Solving (A.1) and (A.2) simultaneously yields the following unbiased estimators of \({\sigma _{ij}^{2}}\) and σiiσjj:

$$ {\mathbb{E}}\left[\frac{n(n^{2} - 1)}{n^{3} + n^{2} - 2n -4}({\widetilde{\sigma}}_{ij}^{s})^{2} - \frac{n(n - 1)}{n^{3} + n^{2} - 2n -4}{\widetilde{\sigma}}_{ii}^{s}{\widetilde{\sigma}}_{jj}^{s}\right] = {\sigma_{ij}^{2}}. $$
(A.3)
$$ {\mathbb{E}}\left[\frac{2(n-1)(n + 2)}{2n + 4 - n^{3} - n^{2}}({\widetilde{\sigma}}_{ij}^{s})^{2} - \frac{n^{2}(n - 1)}{2n + 4 - n^{3} - n^{2}}{\widetilde{\sigma}}_{ii}^{s}{\widetilde{\sigma}}_{jj}^{s}\right] = {\sigma_{ii}}{\sigma_{jj}}. $$
(A.4)

An unbiased estimator of \({\text {Var}}({\widetilde {\sigma }}_{ij}^{s})\) is given by

$$ \widehat{\text{Var}}({\widetilde{\sigma}}_{ij}^{s}) = \frac{{\widehat{\sigma}}_{ij}^{2} + {\widehat{\sigma_{ii}\sigma_{jj}}}}{n - 1}. $$
(A.5)

Substituting (A.3) and (A.4) in (A.5) gives (3.2).
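The algebra above can be machine-checked. The following sympy script (ours, not part of the paper) solves the moment equations (A.1) and (A.2) for \({\sigma_{ij}^{2}}\) and σiiσjj, confirms the coefficients stated in (A.3) and (A.4), and verifies that (A.5) is the implied variance when \({\widetilde{\sigma}}_{ij}^{s}\) is unbiased for σij.

```python
# Symbolic check of (A.3)-(A.5) as consequences of (A.1)-(A.2).
import sympy as sp

n, X, Y = sp.symbols('n X Y', positive=True)  # X = E[(s~_ij)^2], Y = E[s~_ii s~_jj]
s2, dd = sp.symbols('s2 dd')                  # s2 = sigma_ij^2, dd = sigma_ii*sigma_jj

eq1 = sp.Eq(X, n/(n - 1)*s2 + dd/(n - 1))                      # (A.1)
eq2 = sp.Eq(Y, (n + 1)/(n - 1)*dd + 2*(n + 2)/(n*(n - 1))*s2)  # (A.2)
sol = sp.solve([eq1, eq2], [s2, dd])

D = n**3 + n**2 - 2*n - 4
# Coefficients of X and Y must match those displayed in (A.3) and (A.4).
assert sp.simplify(sol[s2] - (n*(n**2 - 1)*X - n*(n - 1)*Y)/D) == 0
assert sp.simplify(sol[dd] - (2*(n - 1)*(n + 2)*X - n**2*(n - 1)*Y)/(-D)) == 0

# (A.5): Var(s~_ij) = E[(s~_ij)^2] - sigma_ij^2, assuming E[s~_ij] = sigma_ij.
var = sp.simplify(n/(n - 1)*s2 + dd/(n - 1) - s2)
assert sp.simplify(var - (s2 + dd)/(n - 1)) == 0
```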

Equation 3.3 follows immediately from Eq. 3.2. To obtain (3.4), note that subtracting σiiσjj from (A.2), using the unbiasedness of the diagonal entries \({\widetilde{\sigma}}_{ii}^{s}\), gives

$$ {\text{Cov}}({\widetilde{\sigma}}_{jj}^{s}, {\widetilde{\sigma}}_{ii}^{s}) = \frac{2(n + 2)}{n(n - 1)}{\sigma_{ij}^{2}} + \frac{2}{n - 1}{\sigma_{ii}}{\sigma_{jj}}. $$
(A.6)

Substituting the unbiased estimators of \({\sigma }_{ij}^{2}\) and σiiσjj in Eq. A.6, obtained from Eqs. A.3 and A.4, gives us Eq. 3.4. This completes the proof.
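The relation between (A.2) and (A.6) noted above can be verified symbolically as well (a check of ours, not from the paper):

```python
# Assuming unbiased diagonal entries (E[s~_ii] = sigma_ii, consistent with
# (A.1) and (A.5)), the stated covariance (A.6) equals (A.2) minus
# sigma_ii*sigma_jj.
import sympy as sp

n, s2, dd = sp.symbols('n s2 dd')  # s2 = sigma_ij^2, dd = sigma_ii*sigma_jj
EYY = (n + 1)/(n - 1)*dd + 2*(n + 2)/(n*(n - 1))*s2  # E[s~_ii s~_jj], (A.2)
cov = 2*(n + 2)/(n*(n - 1))*s2 + 2/(n - 1)*dd        # stated Cov, (A.6)
assert sp.simplify(EYY - dd - cov) == 0
```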

Appendix B: Proof of Theorem 3.2

We analyze each term in Eq. 3.1 one at a time. Consider \(\|\widehat {{\Sigma }}_{\text {CD}}(k) - {\widehat {{\Sigma }}}\|_{F}^{2}\), a natural unbiased estimator of \({\mathbb {E}}\|\widehat {{\Sigma }}_{\text {CD}}(k) - {\widehat {{\Sigma }}}\|_{F}^{2}\). Then

$$\begin{array}{@{}rcl@{}} \|\widehat{{\Sigma}}_{\text{CD}}(k) - {\widehat{{\Sigma}}}\|_{F}^{2} & =& \displaystyle\sum\limits_{i,j}^{p}{\left[(\eta - 1){\widehat{\sigma}}_{ij} + {\gamma}I(i = j)\right]}^{2} \\ & = &{\displaystyle\sum\limits_{i \ne j}^{p}}{(\eta - 1)^{2}{\widehat{\sigma}}_{ij}^{2}} + \displaystyle\sum\limits_{i=j}^{p}\left[(\eta - 1){\widehat{\sigma}}_{ii} + \gamma\right]^{2} \\ & =& (\eta - 1)^{2}\|{\widehat{{\Sigma}}} \circ {\widehat{{\Sigma}}} \|_{F} + p{\gamma^{2}} + 2{\gamma}(\eta - 1){\text{Tr}}({\widehat{{\Sigma}}}), \end{array} $$
(B.1)

where (B.1) is obtained by noting that \({\displaystyle\sum\limits_{i\ne j}^{p}}{\widehat{\sigma}}_{ij}^{2} = \|{\widehat{{\Sigma}}}\circ{\widehat{{\Sigma}}}\|_{F} - \text{Tr}({\widehat{{\Sigma}}}\circ{\widehat{{\Sigma}}})\). Next, consider the optimism term, the third term on the right-hand side of Eq. 3.1. Then

$$\begin{array}{@{}rcl@{}} {\text{optimism}} & =& {\displaystyle\sum\limits_{i \ne j}^{p}} {\eta}{\text{Var}}({\widehat{\sigma}}_{ij}) + {\displaystyle\sum\limits_{i = 1}^{p}}\left\{\eta{\text{Var}}({\widehat{\sigma}}_{ii}) + \gamma\;{\text{cov}}\left( {\displaystyle\sum\limits_{\underset{l \ne i}{l = 1}}^{p}}{\widehat{\sigma}}_{ll} + {\widehat{\sigma}}_{ii}, {\widehat{\sigma}}_{ii}\right)\right\} \\ & =& {\displaystyle\sum\limits_{i \ne j}^{p}} {\eta}{\text{Var}}({\widehat{\sigma}}_{ij}) + {\displaystyle\sum\limits_{i = 1}^{p}}\left\{(\eta + \gamma){\text{Var}}({\widehat{\sigma}}_{ii}) + \gamma{\displaystyle\sum\limits_{\underset{l \ne i}{l = 1}}^{p}}{\text{cov}}\left( {\widehat{\sigma}}_{ll}, {\widehat{\sigma}}_{ii}\right)\right\}. \end{array} $$
(B.2)

Using Lemma 3.1, we have

$$\begin{array}{@{}rcl@{}} {\widehat{\text{optimism}}} & =& {\eta}a_{n}{\displaystyle\sum\limits_{i\ne j}^{p}}{\widetilde{\sigma}}_{ij}^{2} + {\eta}b_{n}{\displaystyle\sum\limits_{i = 1}^{p}}{\widetilde{\sigma}}_{ii}{\displaystyle\sum\limits_{j \ne i}^{p}}{\widetilde{\sigma}}_{jj} + c_{n}(\gamma + \eta){\displaystyle\sum\limits_{i = 1}^{p}}{\widetilde{\sigma}}_{ii}^{2} + d_{n}{\gamma}{\displaystyle\sum\limits_{i \ne l}^{p}}{\widetilde{\sigma}}_{il}^{2} + e_{n}{\displaystyle\sum\limits_{i \ne l}^{p}}{\widetilde{\sigma}}_{ii}{\widetilde{\sigma}}_{ll} \\ & =& 2\left\{(a_{n}\eta + d_{n}\gamma)\left( \|{\widetilde{{\Sigma}}}\circ{\widetilde{{\Sigma}}}\|_{F} \;- \;\text{Tr}({\widetilde{{\Sigma}}}\circ{\widetilde{{\Sigma}}})\right) + (b_{n}\eta + c_{n}\gamma) \right. \\ && \left.\left( {{\text{Tr}}({\widetilde{{\Sigma}}})}^{2} - {\text{Tr}}({\widetilde{{\Sigma}}} \circ {\widetilde{{\Sigma}}})\right) + c_{n}(\eta + \gamma){\text{Tr}}\left( {\widetilde{{\Sigma}}} \circ {\widetilde{{\Sigma}}}\right)\right\}, \end{array} $$
(B.3)

where (B.3) is obtained by writing \({\displaystyle \sum \limits _{i = 1}^{p}}{\widetilde {\sigma }}_{ii}{\displaystyle \sum \limits _{\underset {j \ne i}{j = 1}}^{p}}{\widetilde {\sigma }}_{jj} = {{\text {Tr}}({\widetilde {{\Sigma }}})}^{2} - {\text {Tr}}({\widetilde {{\Sigma }}} \circ {\widetilde {{\Sigma }}})\) and \({\displaystyle \sum \limits _{i = 1}^{p}}{\widetilde {\sigma }}_{ii}^{2} = {\text {Tr}}({\widetilde {{\Sigma }}} \circ {\widetilde {{\Sigma }}})\).

The proof is completed by combining (B.1) and (B.3) to obtain an unbiased estimator of the Frobenius risk of \({\widehat{{\Sigma}}}_{\text{CD}}(k)\).
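Identity (B.1) is purely algebraic and easy to spot-check numerically. Here is a small numpy script (ours), reading the paper's \(\|A \circ A\|_{F}\) as the entrywise sum of squares \(\sum_{i,j} a_{ij}^{2}\), consistent with the identities quoted after (B.1) and (B.3):

```python
# Numeric spot-check of the expansion (B.1) on an arbitrary symmetric matrix.
import numpy as np

rng = np.random.default_rng(0)
p, eta, gamma = 6, 0.7, 0.3
A = rng.normal(size=(p, p))
Sigma_hat = (A + A.T) / 2  # arbitrary symmetric "estimate"

# Left-hand side: entrywise expansion of ||Sigma_CD(k) - Sigma_hat||_F^2.
lhs = sum(((eta - 1) * Sigma_hat[i, j] + gamma * (i == j)) ** 2
          for i in range(p) for j in range(p))
# Right-hand side: the closed form in (B.1).
rhs = ((eta - 1) ** 2 * (Sigma_hat ** 2).sum() + p * gamma ** 2
       + 2 * gamma * (eta - 1) * np.trace(Sigma_hat))
assert np.isclose(lhs, rhs)
```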


About this article


Cite this article

Sabnis, G., Pati, D. & Bhattacharya, A. Compressed Covariance Estimation with Automated Dimension Learning. Sankhya A 81, 466–481 (2019). https://doi.org/10.1007/s13171-018-0134-x

