NOVELIST estimator of large correlation and covariance matrices and their inverses
Abstract
We propose a “NOVEL Integration of the Sample and Thresholded covariance” (NOVELIST) estimator to estimate large covariance (correlation) and precision matrices. The NOVELIST estimator performs shrinkage of the sample covariance (correlation) towards its thresholded version. The sample covariance (correlation) component is non-sparse and can be low rank in high dimensions. The thresholded sample covariance (correlation) component is sparse, and its addition ensures the stable invertibility of NOVELIST. The benefits of the NOVELIST estimator include simplicity, ease of implementation, computational efficiency and the fact that its application avoids eigenanalysis. We obtain an explicit convergence rate in the operator norm over a large class of covariance (correlation) matrices when the dimension p and the sample size n satisfy \(\log p/n\rightarrow 0\), and an improved version of this rate when \(p/n \rightarrow 0\). In empirical comparisons with several popular estimators, the NOVELIST estimator performs well in estimating covariance and precision matrices over a wide range of models and sparsity classes. Real-data applications are presented.
Keywords
Covariance regularisation, High-dimensional covariance, Long memory, Non-sparse modelling, Singular sample covariance, High dimensionality
Mathematics Subject Classification
62G05, 62H12
1 Introduction
Estimating the covariance matrix and its inverse, also known as the concentration or precision matrix, has always been an important part of multivariate analysis and arises prominently, for example, in financial risk management (Markowitz 1952; Longerstaey et al. 1996), linear discriminant analysis (Fisher 1936; Guo et al. 2007), principal component analysis (Pearson 1901; Croux and Haesbroeck 2000) and network science (Jeong et al. 2001; Gardner et al. 2003). Naturally, this is also true of the correlation matrix, and the following discussion applies to it, too. The sample covariance matrix is a straightforward and often used estimator of the covariance matrix. However, when the dimension p of the data is larger than the sample size n, the sample covariance matrix is singular. Even if p is smaller than but of the same order of magnitude as n, the number of parameters to estimate is \(p(p+1)/2\), which can significantly exceed n. In this case, the sample covariance matrix is not reliable, and alternative estimation methods are needed.
We would categorise the most commonly used alternative covariance estimators into two broad classes. Estimators in the first class rely on various structural assumptions on the underlying true covariance. One prominent example is ordered covariance matrices, often appearing in time-series analysis, spatial statistics and spatio-temporal modelling; these assume that there is a metric on the variable indices. Bickel and Levina (2008a) use banding to achieve consistent estimation in this context. Furrer and Bengtsson (2007) and Cai et al. (2010) regularise estimated ordered covariance matrices by tapering. Cai et al. (2010) derive the optimal estimation rates for the covariance matrix under the operator and Frobenius norms, a result which implies sub-optimality of the convergence rate of the banding estimator of Bickel and Levina (2008a) in the operator norm. The estimator of Cai et al. (2010) only achieves the optimal rate if the bandwidth parameter is chosen optimally; however, the optimal bandwidth depends crucially on the underlying unknown covariance matrix, and therefore, this estimator’s optimality is only oracular. The banding technique is also applied to the estimated Cholesky factorisation of the covariance matrix (Bickel and Levina 2008a; Wu and Pourahmadi 2003).
Another important example of a structural assumption on the true covariance or precision matrices is sparsity; it is often made, e.g. in the statistical analysis of genetic regulatory networks (Gardner et al. 2003; Jeong et al. 2001). El Karoui (2008) and Bickel and Levina (2008b) regularise the estimated sparse covariance matrix by universal thresholding. Adaptive thresholding, in which the threshold is a random function of the data (Cai and Liu 2011; Fryzlewicz 2013), leads to more natural thresholding rules and hence, potentially, more precise estimation. The Lasso penalty is another popular way to regularise the covariance and precision matrices (Zou 2006; Rothman et al. 2008; Friedman et al. 2008). Focusing on model selection rather than parameter estimation, Meinshausen and Bühlmann (2006) propose the neighbourhood selection method. One other commonly occurring structural assumption in covariance estimation is the factor model, often used, e.g. in financial applications. Fan et al. (2008) impose sparsity on the covariance matrix via a factor model. Fan et al. (2013) propose the POET estimator, which assumes that the covariance matrix is the sum of a part derived from a factor model, and a sparse part.
Estimators in the second broad class do not assume a specific structure of the covariance or precision matrices, but shrink the eigenvalues of the sample covariance matrix towards an assumed shrinkage target (Ledoit and Wolf 2012). A considerable number of shrinkage estimators have been proposed along these lines. Ledoit and Wolf (2004) derive an optimal linear shrinkage formula, which imposes the same shrinkage intensity on all sample eigenvalues but leaves the sample eigenvectors unchanged. Nonlinear shrinkage is considered in Ledoit and Péché (2011) and Ledoit and Wolf (2012, 2015). Lam (2016) introduces a Nonparametric Eigenvalue-Regularised Covariance Matrix Estimator (NERCOME) through subsampling of the data, which is asymptotically equivalent to the nonlinear shrinkage method of Ledoit and Wolf (2012). Shrinkage can also be applied to the sample covariance matrix directly. Ledoit and Wolf (2003) propose a weighted average estimator of the covariance matrix with a single-index factor target. Schäfer and Strimmer (2005) review six different shrinkage targets. Naturally related to the shrinkage approach is Bayesian estimation of the covariance and precision matrices. Evans (1965), Chen (1979) and Dickey et al. (1985) use possibly the most natural covariance matrix prior, the inverted Wishart distribution. Other notable references include Leonard and John (2012) and Alvarez et al. (2014).
The POET method of Fan et al. (2013) proposes to estimate the covariance matrix as the sum of a non-sparse, low-rank matrix coming from the factor model part, and a certain sparse matrix, added on to ensure invertibility of the resulting covariance estimator. In this paper, we are motivated by the general idea of building a covariance estimator as the sum of a non-sparse and a sparse part. Following this route, one can hope that the resulting estimator performs well in estimating both non-sparse and sparse covariance matrices if the amount of sparsity is chosen well. At the same time, the addition of the sparse part can guarantee stable invertibility of the estimated covariance, a prerequisite for the successful estimation of the precision matrix. On the other hand, we wish to move away from the heavy modelling assumptions used by the POET estimator; indeed, our empirical results presented later suggest that POET can underperform if the factor model assumption does not hold.
Motivated by this observation, this paper proposes a simple, practically assumption-free estimator of the covariance and correlation matrices, termed NOVELIST (NOVEL Integration of the Sample and Thresholded covariance/correlation estimators). NOVELIST arises as the linear combination of two parts: the sample covariance (correlation) estimator, which is always non-sparse and has low rank if \(p > n\), and its thresholded version, which is sparse. The inclusion of the sparse thresholded part means that NOVELIST can always be made stably invertible. NOVELIST can be viewed as a shrinkage estimator where the sample covariance (correlation) matrix is shrunk towards a flexible, nonparametric, sparse target. By selecting the appropriate amount of contribution of either of the two components, NOVELIST can adapt to a wide range of underlying covariance structures, including sparse but also non-sparse ones. In the paper, we show consistency of the NOVELIST estimator in the operator norm uniformly under a class of covariance matrices introduced by Bickel and Levina (2008b), as long as \( \log p/n\rightarrow 0\), and offer an improved version of this result if \(p/n \rightarrow 0\). The benefits of the NOVELIST estimator include simplicity, ease of implementation, computational efficiency and the fact that its application avoids eigenanalysis, which is unfamiliar to some practitioners. In our simulation studies, NOVELIST performs well in estimating both covariance and precision matrices for a wide range of underlying covariance structures, benefitting from the flexibility in the selection of its shrinkage intensity and thresholding level.
The rest of the paper is organised as follows. In Sect. 2, we introduce the NOVELIST estimator and its properties. Section 3 discusses the case where the two components of the NOVELIST estimator are combined in a non-convex way. Section 4 describes the procedure for selecting its parameters. Section 5 describes empirical improvements to NOVELIST. Section 6 compares the practical performance of NOVELIST with the state of the art. Section 7 presents real-data performance in portfolio optimisation problems and concludes the paper; proofs appear in the Appendix. The R package “novelist” is available on CRAN.
2 Method, motivation and properties
2.1 Notation and method
NOVELIST is a shrinkage estimator, in which the shrinkage target is assumed to be sparse. The degree of shrinkage is controlled by the \(\delta \) parameter and the amount of sparsity in the target by the \(\lambda \) parameter. Numerical results shown in Fig. 1 suggest that the eigenvalues of the NOVELIST estimator arise as a certain nonlinear transformation of the eigenvalues of the sample correlation (covariance) matrix, although the application of NOVELIST avoids explicit eigenanalysis.
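Since the defining formula of NOVELIST is not displayed in this section, the following minimal Python/NumPy sketch assumes the convex-combination form \(\hat R^N=(1-\delta )\hat R+\delta \,T(\hat R,\lambda )\), which matches the statements in Sect. 2.3 that NOVELIST degenerates to the thresholding estimator as \(\delta \rightarrow 1\) and to the sample correlation as \(\delta \rightarrow 0\). Soft thresholding is applied to the off-diagonal entries only, as in Sect. 4. The helper names `soft_threshold_offdiag` and `novelist_corr` are hypothetical and not taken from the R package “novelist”. The example compares the smallest eigenvalue of a singular sample correlation matrix (\(p>n\)) with that of its NOVELIST counterpart.

```python
import numpy as np


def soft_threshold_offdiag(R, lam):
    """Soft-threshold the off-diagonal entries of R at level lam; keep the diagonal."""
    T = np.sign(R) * np.maximum(np.abs(R) - lam, 0.0)
    np.fill_diagonal(T, np.diag(R))
    return T


def novelist_corr(R, lam, delta):
    """Assumed NOVELIST form: (1 - delta) * R + delta * T(R, lam)."""
    return (1.0 - delta) * R + delta * soft_threshold_offdiag(R, lam)


# Illustration with p > n, so the sample correlation matrix is singular.
rng = np.random.default_rng(0)
n, p = 20, 40
X = rng.standard_normal((n, p))
R = np.corrcoef(X, rowvar=False)
R_nov = novelist_corr(R, lam=0.5, delta=0.8)

# Smallest eigenvalues: (numerically) zero for R, bounded away from zero for R_nov.
print(np.linalg.eigvalsh(R).min(), np.linalg.eigvalsh(R_nov).min())
```

Forming `R_nov` involves only elementwise operations; the eigenvalues are computed here purely to illustrate the effect of the sparse component on invertibility.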
2.2 Motivation: link to ridge regression
From formula (3), NOVELIST penalises the regression coefficients in a pairwise manner, which can be interpreted as follows: for a given threshold \(\lambda \), we place a penalty on the products \(\beta _i\beta _j\) of those coefficients of \(\beta \) for which the sample correlation between \({\tilde{\varvec{X}}}_i\) and \({\tilde{\varvec{X}}}_j\), the ith and jth columns of \({\tilde{\varvec{X}}}\) (respectively), exceeds \(\lambda \). In other words, if the sample correlation is high, we penalise the product of the corresponding \(\beta \)’s, hoping that the resulting estimates \(\beta _i\) and \(\beta _j\) are not simultaneously large.
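The pairs whose products \(\beta _i\beta _j\) are penalised can be read off directly from the sample correlation matrix. The small sketch below assumes that “exceeds \(\lambda \)” refers to the absolute sample correlation, i.e. to the entries that survive soft thresholding; `penalised_pairs` is a hypothetical helper name used only for illustration.

```python
import numpy as np


def penalised_pairs(X, lam):
    """Return index pairs (i, j), i < j, whose absolute sample correlation exceeds lam."""
    R = np.corrcoef(X, rowvar=False)
    i, j = np.triu_indices_from(R, k=1)  # strictly upper triangle
    keep = np.abs(R[i, j]) > lam
    return list(zip(i[keep].tolist(), j[keep].tolist()))


rng = np.random.default_rng(1)
z = rng.standard_normal(200)
# Columns 0 and 1 are strongly correlated; column 2 is independent noise.
X = np.column_stack([z, z + 0.1 * rng.standard_normal(200), rng.standard_normal(200)])
print(penalised_pairs(X, lam=0.5))
```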
2.3 Asymptotic properties of NOVELIST
2.3.1 Consistency of the NOVELIST estimators
Next, we establish consistency of the NOVELIST estimator in the operator norm, \(\Vert A\Vert _2^2=\lambda _{\max }(AA^{\hbox {T}})\), where \(\lambda _{\max }(\cdot )\) denotes the largest eigenvalue.
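The definition of the operator norm can be checked numerically. This short Python/NumPy sketch (the helper name `operator_norm` is ours, for illustration) computes \(\Vert A\Vert _2\) from the largest eigenvalue of \(AA^{\hbox {T}}\) and compares it with `numpy.linalg.norm(A, 2)`, which returns the largest singular value.

```python
import numpy as np


def operator_norm(A):
    """Operator (spectral) norm via ||A||_2^2 = lambda_max(A A^T)."""
    # A A^T is symmetric positive semi-definite, so eigvalsh applies.
    lam_max = np.linalg.eigvalsh(A @ A.T).max()
    return float(np.sqrt(max(lam_max, 0.0)))  # guard against tiny negative round-off


rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3))
# Agrees with NumPy's built-in spectral norm (largest singular value).
print(operator_norm(A), np.linalg.norm(A, 2))
```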
Proposition 1
Proposition 2
The proofs are given in the Appendix. The NOVELIST estimators of the correlation and covariance matrices and of their inverses attain the same convergence rate.
We now discuss the optimal asymptotic \({\delta }\) under the settings of Propositions 1 and 2. Proposition 1 can be thought of as a “large p” setting, while Proposition 2 applies to moderately large and small p.
2.3.2 Optimal \(\delta \) and rate of convergence in Proposition 1
Proposition 1 corresponds to “large p” scenarios, in which p can be thought of as being O(n) or larger (indeed, the case \(p = o(n)\) is covered by Proposition 2). For such a large p, the pre-condition for the consistency of the NOVELIST estimator is that \(\delta \rightarrow 1\), i.e. that the estimator asymptotically degenerates to the thresholding estimator. To see this, take \(p = n^{1+\varDelta }\) with \(\varDelta \ge 0\). If \(\delta \not \rightarrow 1\), the error in part (A) of formula (6) would be of order \(n^{1/2+\varDelta }\sqrt{\log \,{n^{1+\varDelta }}}\) and therefore would not converge to zero.
Scenario 1
\(q=0\), \(s_0(p)=o((n/\log \,p)^{1/2})\).
When \(q=0\), the uniformity class of correlation matrices controls the maximum number of nonzero entries in each row. A typical example is the moving-average (MA) autocorrelation structure in time series.
Scenario 2
\(q\ne 0\), \(s_0(p)\le C\) as \(p\rightarrow \infty \).
A typical example of this scenario is the auto-regressive (AR) autocorrelation structure.
We now show a scenario in which NOVELIST is inconsistent under the setting of Proposition 1. Consider the long-memory autocorrelation matrix \(\rho _{ij} \sim |i-j|^{-\alpha }\), \(0 < \alpha \le 1\), for which \(s_0(p) = \max _{1\le i\le p} \sum _{j=1}^p \max (1, |i-j|) ^{-\alpha q} = O(p^{1-\alpha q})\). Take \(q \ne 0\). Note that a sufficient condition for \({\tilde{\delta }}\) to tend to 1 is that \((\log p)^{1/2} n^{-1/2} p^\alpha \rightarrow \infty \). This happens more easily for larger values of \(\alpha \), i.e. for “less long”-memory processes. However, the implied rate of convergence is \(s_0(p)(\log p / n)^{(1-q)/2} = p^{1-\alpha q} (\log p / n)^{(1-q)/2}\), which diverges even if \(\alpha = 1\).
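The growth of \(s_0(p)\) and of the implied rate can be illustrated numerically. The sketch computes \(s_0(p)\) for \(\rho _{ij}\sim |i-j|^{-\alpha }\) in \(O(p)\) time via prefix sums, and then evaluates the implied rate at \(p=n\), the smallest dimension covered by the “large p” regime of Proposition 1. The values \(\alpha =1\), \(q=0.5\) and the grid of n are illustrative choices, and `s0_long_memory` is a hypothetical helper name.

```python
import numpy as np


def s0_long_memory(p, alpha, q):
    """s0(p) = max_i sum_j max(1, |i-j|)^(-alpha*q), computed via prefix sums."""
    w = np.arange(1, p, dtype=float) ** (-alpha * q)  # weights for lags 1..p-1
    S = np.concatenate(([0.0], np.cumsum(w)))        # S[k] = sum of weights for lags 1..k
    i = np.arange(p)
    row_sums = 1.0 + S[i] + S[p - 1 - i]             # lag 0 contributes max(1, 0)^(...) = 1
    return float(row_sums.max())


alpha, q = 1.0, 0.5
for p in (500, 1000, 2000):
    # Doubling p multiplies s0 by roughly 2^(1 - alpha*q), here about 1.41.
    print(p, s0_long_memory(p, alpha, q))

# Implied rate s0(p) * (log p / n)^((1 - q) / 2) with p = n keeps growing in n.
for n in (100, 1000, 4000):
    print(n, s0_long_memory(n, alpha, q) * (np.log(n) / n) ** ((1 - q) / 2))
```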
2.3.3 Optimal \(\delta \) and rate of convergence in Proposition 2
Scenario 3
p fixed (and hence \(q = 0\)).
Note that in the case of p being fixed or bounded in n, one can take \(q = 0\) (to obtain as fast a rate for the thresholding part as possible) as the implied \(s_0(p)\) will also be bounded in n. In this case, we have \({\tilde{\delta }} \rightarrow 1\) (and hence NOVELIST degenerates to the thresholding estimator with its corresponding speed of convergence), but the speed at which \({\tilde{\delta }}\) approaches 1 is extremely slow (\(O(\log ^{-1/2}n)\)).
Scenario 4
\(p \rightarrow \infty \) with n, and \(q = 0\).
In this case, the quantity \(\{ (p + \log \,n) / \log \,p \}^{1/2}\) acts as a transition phase: if \(s_0(p)\) is of a larger order, then we have \({\tilde{\delta }}\rightarrow 0\); if it is of a smaller order, then \({\tilde{\delta }} \rightarrow 1\); if it is of this order and if \({\tilde{\delta }}\) has a limit, then its limit lies in (0, 1). Therefore, NOVELIST will be closer to the sample covariance (correlation) if the truth is dense (i.e. if \(s_0(p)\) is large), and closer to the thresholding estimator if \(s_0(p)\) is small.
Scenario 5
\(p \rightarrow \infty \) with n, and \(q \ne 0\).
Here, the transition-phase quantity is \(\frac{(p + \log \,n)^{1/2}}{(\log \,p)^\frac{1-q}{2}n^{q/2}}\) and conclusions analogous to those of the preceding Scenario can be formed.
In the context of Scenario 5, we now revisit the long-memory example from before. The most “difficult” case still included in the setting of Proposition 2 is when p is “almost” the size of n; therefore, we assume \(p = n^{1-\varDelta }\), with \(\varDelta \) being a small positive constant. Neglecting the logarithmic factors, the transition-phase quantity \(\frac{(p + \log \,n)^{1/2}}{(\log \,p)^\frac{1-q}{2}n^{q/2}}\) reduces to \(n^\frac{1-\varDelta -q}{2}\). We have \(s_0(p) = O(n^{(1-\varDelta )(1-\alpha q)})\), and therefore \(s_0(p)\) is of a larger order than \(n^\frac{1-\varDelta -q}{2}\) if \(\alpha < \frac{1-\varDelta +q}{2q(1-\varDelta )}\); in this case, \({\tilde{\delta }} \rightarrow 0\), and the NOVELIST estimator degenerates to the sample covariance (correlation) estimator, which is consistent in this setting at the rate of \(n^{-\varDelta /2}\) (neglecting the log-factors). The other case, \(\alpha \ge \frac{1-\varDelta +q}{2q(1-\varDelta )}\), is impossible as we must have \(\alpha \le 1\). Therefore, the NOVELIST estimator is consistent for the long-memory model under the setting of Proposition 2, i.e. when \(p = o(n)\) (and degenerates to the sample covariance estimator). This is in contrast to the setting of Proposition 1, where, as argued before, the consistency of NOVELIST in the long-memory model cannot be shown.
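The comparison of orders used above, and the reason why the second case cannot occur, can be spelled out as follows (logarithmic factors neglected, \(0<q<1\) and \(0<\varDelta <1\)):
$$\begin{aligned} (1-\varDelta )(1-\alpha q)>\frac{1-\varDelta -q}{2}&\iff 2(1-\varDelta )-2\alpha q(1-\varDelta )>1-\varDelta -q\\ &\iff 1-\varDelta +q>2\alpha q(1-\varDelta ) \iff \alpha <\frac{1-\varDelta +q}{2q(1-\varDelta )}, \end{aligned}$$
and
$$\begin{aligned} \frac{1-\varDelta +q}{2q(1-\varDelta )}>1 \iff 1-q>\varDelta (1-2q). \end{aligned}$$
The last inequality always holds: for \(q\ge 1/2\) its right-hand side is non-positive, while for \(0<q<1/2\) we have \(\varDelta (1-2q)<1-2q<1-q\). Hence every admissible \(\alpha \le 1\) falls into the first case.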
3 \(\delta \) outside [0, 1]
Some authors (Ledoit and Wolf 2003; Schäfer and Strimmer 2005; Savic and Karlsson 2009), more or less explicitly, discuss the issue of the shrinkage intensity (for other shrinkage estimators) falling within versus outside the interval [0, 1]. Ledoit and Wolf (2003) “expect” it to lie between zero and one, Schäfer and Strimmer (2005) truncate it at zero or one, and Savic and Karlsson (2009) view negative shrinkage as a “useful signal for possible model misspecification”. We are interested in the performance of the NOVELIST estimator with \(\delta \not \in [0,1]\) and have reasons to believe that \(\delta \not \in [0,1]\) may be a good choice in certain scenarios.
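To see what \(\delta \not \in [0,1]\) does, consider a single off-diagonal entry under the assumed form \((1-\delta )\hat r_{ij}+\delta \,T_{ij}\) with soft thresholding. If \(|\hat r_{ij}|\le \lambda \), the entry becomes \((1-\delta )\hat r_{ij}\): \(\delta >1\) flips its sign and \(\delta <0\) inflates it. If \(|\hat r_{ij}|>\lambda \), it becomes \(\hat r_{ij}-\delta \lambda \,\mathrm {sign}(\hat r_{ij})\): \(\delta >1\) shrinks beyond the thresholded value and \(\delta <0\) moves the entry away from zero. The minimal sketch below (hypothetical helper names) reproduces these cases.

```python
import math


def soft(r, lam):
    """Soft-thresholding of a scalar: sign(r) * max(|r| - lam, 0)."""
    return math.copysign(max(abs(r) - lam, 0.0), r)


def novelist_entry(r, lam, delta):
    """Off-diagonal entry under the assumed form (1 - delta) * r + delta * soft(r, lam)."""
    return (1.0 - delta) * r + delta * soft(r, lam)


for delta in (-0.5, 0.5, 1.5):
    # r = 0.1 lies below the threshold 0.2; r = 0.5 lies above it.
    print(delta, novelist_entry(0.1, 0.2, delta), novelist_entry(0.5, 0.2, delta))
```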
4 Empirical choices of \((\lambda , \delta )\) and algorithm
The choices of the shrinkage intensity (for shrinkage estimators) and the thresholding level (for thresholding estimators) are intensively studied in the literature. Bickel and Levina (2008b) propose a cross-validation method for choosing the threshold value for their thresholding estimator. However, NOVELIST requires simultaneous selection of the two parameters \(\lambda \) and \(\delta \), which makes straight cross-validation computationally intensive. Ledoit and Wolf (2003) and Schäfer and Strimmer (2005) give an analytic solution to the problem of choosing the optimal shrinkage level, under the Frobenius norm, for any shrinkage estimator. Since NOVELIST can be viewed as a shrinkage estimator, we borrow strength from this result and proceed by selecting the optimal shrinkage intensity \(\delta ^*(\lambda )\) in the sense of Ledoit and Wolf (2003) for each \(\lambda \), and then perform cross-validation to select the best pair \((\lambda ', \delta ^*(\lambda '))\). This process significantly accelerates computation.
Cai and Liu (2011) and Fryzlewicz (2013) use adaptive thresholding for covariance matrices, in order to make thresholding insensitive to changes in the variance of the individual variables. This, effectively, corresponds to thresholding sample correlations rather than covariances. In the same vein, we apply NOVELIST to sample correlation matrices. We use soft thresholding as it often exhibits better and more stable empirical performance than hard thresholding, which is partly due to its being a continuous operation. Let \({\hat{\varSigma }}\) and \({\hat{R}}\) be the sample covariance and correlation matrices computed on the whole dataset, and let \(T=\{T_{ij}\}\) be the soft thresholding estimator of the correlation matrix. The algorithm proceeds as follows.
For estimating the covariance matrix,
The first equality comes from Ledoit and Wolf (2003), and the second holds because our shrinkage target T is the soft thresholding estimator with threshold \(\lambda \) (applied to the off-diagonal entries only).
- 1.
For each \(\lambda \), obtain the NOVELIST estimator of the correlation matrix \({\hat{R}}^{N^{(z)}}_A(\lambda )={\hat{R}}^{N}({\hat{R}}^{(z)}_{A}, \lambda , \delta ^*(\lambda ))\), and of the covariance matrix \({\hat{\varSigma }}^{N^{(z)}}_A(\lambda )={\hat{D}}_A{\hat{R}}^{N^{(z)}}_A(\lambda ){\hat{D}}_A\), where \({\hat{D}}_A=(\text {diag}({\hat{\varSigma }}^{(z)}_{A}))^{1/2}\).
- 2.
Compute the spectral norm error \({\text {Err}}(\lambda )^{(z)}=\Vert {\hat{\varSigma }}^{N^{(z)}}_A(\lambda )-{\hat{\varSigma }}_{B}^{(z)}\Vert ^2_2\).
- 3.
Repeat steps 1 and 2 for each z and obtain the averaged error \({\text {Err}}(\lambda )=\frac{1}{Z}\sum _{z=1}^{Z} {\text {Err}}(\lambda )^{(z)}\). Find \({\lambda }'=\mathop {\arg \min }_{\lambda }{\text {Err}}(\lambda )\), then obtain the optimal pair \(({\lambda }', {\delta }')=({\lambda }', \delta ^*({\lambda }'))\).
- 4.
Compute the cross-validated NOVELIST estimators of the correlation and covariance matrices as
$$\begin{aligned} {\hat{R}}^{N}_{cv}&={\hat{R}}^{N}({\hat{R}}, \lambda ^{'}, \delta ^{'}), \qquad (13)\\ {\hat{\varSigma }}^{N}_{cv}&={\hat{D}}{\hat{R}}^{N}_{cv}{\hat{D}}, \qquad (14) \end{aligned}$$
where \({\hat{D}}=(\text {diag}({\hat{\varSigma }}))^{1/2}\).
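Steps 1–4 can be sketched compactly in Python/NumPy. Several pieces are assumptions, because they are not reproduced in this section: the NOVELIST form \((1-\delta )\hat R+\delta \,T(\hat R,\lambda )\); the analytic \(\delta ^*(\lambda )\), for which the hypothetical `delta_star` uses a Schäfer–Strimmer-type approximation \(\sum _{i\ne j}\widehat{\mathrm {Var}}(\hat r_{ij})\mathbb {I}\{|\hat r_{ij}|\le \lambda \}/\sum _{i\ne j}(\hat r_{ij}-T_{ij})^2\), evaluated on part A inside the loop and on the full data in step 4; and a random split into parts A and B with \(n_A=\lfloor n/2\rfloor \). All function names are hypothetical, and the R package “novelist” may implement these steps differently.

```python
import numpy as np


def soft_offdiag(R, lam):
    """Soft-threshold off-diagonal entries of R at lam; keep the diagonal."""
    T = np.sign(R) * np.maximum(np.abs(R) - lam, 0.0)
    np.fill_diagonal(T, np.diag(R))
    return T


def novelist(R, lam, delta):
    """Assumed NOVELIST form (1 - delta) * R + delta * T(R, lam)."""
    return (1.0 - delta) * R + delta * soft_offdiag(R, lam)


def delta_star(X, lam):
    """HYPOTHETICAL analytic shrinkage intensity (Schafer-Strimmer-type approximation)."""
    n = X.shape[0]
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    W = np.einsum("ki,kj->kij", Xs, Xs)  # products of standardised columns
    R = W.sum(axis=0) / (n - 1)          # sample correlation matrix
    var_r = n / (n - 1) ** 3 * ((W - W.mean(axis=0)) ** 2).sum(axis=0)
    off = ~np.eye(R.shape[0], dtype=bool)
    num = (var_r * (np.abs(R) <= lam))[off].sum()
    den = ((R - soft_offdiag(R, lam)) ** 2)[off].sum()
    return float(num / den) if den > 0 else 1.0  # den = 0: target equals R


def novelist_cv(X, lambdas, Z=10, n_A=None, seed=0):
    """Choose lambda' over Z random splits into parts A and B, then refit on all data."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    n_A = n // 2 if n_A is None else n_A  # HYPOTHETICAL split size
    err = np.zeros(len(lambdas))
    for _ in range(Z):
        idx = rng.permutation(n)
        XA, XB = X[idx[:n_A]], X[idx[n_A:]]
        RA = np.corrcoef(XA, rowvar=False)
        dA = np.sqrt(np.diag(np.cov(XA, rowvar=False)))
        SB = np.cov(XB, rowvar=False)
        for k, lam in enumerate(lambdas):
            # Step 1: NOVELIST correlation on part A, rescaled to a covariance.
            SA = dA[:, None] * novelist(RA, lam, delta_star(XA, lam)) * dA[None, :]
            # Step 2: squared spectral-norm distance to the part-B sample covariance.
            err[k] += np.linalg.norm(SA - SB, 2) ** 2
    # Step 3: average over splits and take the minimiser.
    err /= Z
    lam_cv = float(lambdas[int(np.argmin(err))])
    delta_cv = delta_star(X, lam_cv)
    # Step 4: apply (lambda', delta') to the full data, as in (13) and (14).
    R_cv = novelist(np.corrcoef(X, rowvar=False), lam_cv, delta_cv)
    d = np.sqrt(np.diag(np.cov(X, rowvar=False)))
    return R_cv, d[:, None] * R_cv * d[None, :], lam_cv, delta_cv


rng = np.random.default_rng(0)
X = rng.standard_normal((60, 8))
R_cv, Sigma_cv, lam_cv, delta_cv = novelist_cv(X, lambdas=[0.1, 0.2, 0.3, 0.4, 0.5], Z=5)
print(lam_cv, delta_cv)
```

Computing \(\delta ^*(\lambda )\) in closed form for each \(\lambda \) reduces the search to a one-dimensional grid over \(\lambda \), which is the computational saving described above.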
5 Empirical improvements of NOVELIST
5.1 Fixed parameters
5.2 Principal-component-adjusted NOVELIST
6 Simulation study
In this section, we investigate the performance of the NOVELIST estimator of covariance and precision matrices based on optimal and data-driven choices of \((\lambda , \delta )\) for seven different models and in comparison with five popular competitors. Following the algorithm in Sect. 4, the NOVELIST estimator of the correlation matrix is obtained first; the corresponding estimator of the covariance matrix follows by formula (14), and the inverse of the covariance estimator is obtained by formula (16). In all simulations, the sample size is \(n=100\) and the dimension \(p \in \{10, 100, 200, 500\}\). We perform \(N=50\) repetitions.
6.1 Simulation models
We use the following models for \(\varSigma \).
(A) \(\textit{Identity}\) \(\sigma _{ij}=\mathbb {I}\{i=j\}\), for \(1\le i,j\le p\).
\(\mathbf {Y}=\{Y_1, Y_2, Y_3\}^{\hbox {T}}\) is a three-dimensional factor, generated independently from the standard normal distribution, i.e. \(\mathbf {Y}\sim \mathcal {N}(\mathbf {0},\mathcal {I}_3)\),
\(\mathbf {B}=\{\beta _{ij}\}\) is the coefficient matrix, \(\beta _{ij}\overset{i.i.d.}{\sim }U(0,1)\), \(1\le i \le p\), \(1\le j \le 3\),
\(\mathbf {E}=\{\epsilon _1, \epsilon _2,\ldots , \epsilon _p\}^{\hbox {T}}\) is p-dimensional random noise, generated independently from the standard normal distribution, \(\epsilon _i\sim \mathcal {N}(0,1)\).
Based on this model, we have \(\sigma _{ij}={\left\{ \begin{array}{ll} \sum _{k=1}^{3} \beta _{ik}^2+1 &{} \text {if} \ i= j;\\ \sum _{k=1}^{3} \beta _{ik}\beta _{jk} &{} \text {if} \ i\ne j. \end{array}\right. }\).
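The covariance formula above follows from \(\mathbf {X}=\mathbf {B}\mathbf {Y}+\mathbf {E}\) with \(\mathbf {Y}\) and \(\mathbf {E}\) independent, which gives \(\varSigma =\mathbf {B}\mathbf {B}^{\hbox {T}}+\mathcal {I}_p\). The sketch below assumes this data-generating equation (it is implied by the expression for \(\sigma _{ij}\) but not displayed here); `factor_model_cov` and `sample_factor_model` are hypothetical helper names.

```python
import numpy as np


def factor_model_cov(p, k=3, seed=0):
    """Return B and the implied covariance B B^T + I_p, with beta_ij i.i.d. U(0, 1)."""
    rng = np.random.default_rng(seed)
    B = rng.uniform(0.0, 1.0, size=(p, k))
    return B, B @ B.T + np.eye(p)


def sample_factor_model(B, n, seed=1):
    """Draw n observations (as rows) from X = B Y + E, Y ~ N(0, I_k), E ~ N(0, I_p)."""
    rng = np.random.default_rng(seed)
    p, k = B.shape
    Y = rng.standard_normal((n, k))
    E = rng.standard_normal((n, p))
    return Y @ B.T + E


B, Sigma = factor_model_cov(p=10)
X = sample_factor_model(B, n=100)
```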
The models can be broadly divided into three groups: (A)–(C) and (G) are sparse, (D) is non-sparse, and (E) and (F) are highly non-sparse. In models (B), (C), (F) and (G), the covariance matrix equals the correlation matrix. In order to depart from the case of equal variances, we also work with modified versions of these models, denoted by (B*), (C*), (F*) and (G*), in which the correlation matrix \(\{\rho _{ij}\}\) is generated as in (B), (C), (F) and (G), respectively, and the variances are unequal, generated independently as \(\sigma _{ii}\sim \chi ^2_{5}\). As a result, in the “starred” models, we have \(\sigma _{ij}=\rho _{ij}\sqrt{\sigma _{ii}\sigma _{jj}}\), \(1\le i,j\le p\).
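The construction of the starred models amounts to rescaling a correlation matrix by randomly drawn standard deviations. In this sketch the AR(1)-type correlation \(0.6^{|i-j|}\) is only an illustrative input (the parameter of model (C) is not stated here), and `starred_cov` is a hypothetical helper name.

```python
import numpy as np


def starred_cov(Rho, df=5, seed=0):
    """sigma_ij = rho_ij * sqrt(sigma_ii * sigma_jj), with sigma_ii i.i.d. chi^2_df."""
    rng = np.random.default_rng(seed)
    sd = np.sqrt(rng.chisquare(df, size=Rho.shape[0]))
    return sd[:, None] * Rho * sd[None, :]


# Illustrative AR(1)-type correlation phi^|i-j|; phi = 0.6 is a hypothetical value.
p, phi = 6, 0.6
Rho = phi ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
Sigma_star = starred_cov(Rho)
```

Rescaling back by the square roots of the diagonal recovers \(\{\rho _{ij}\}\), so the starred models share their correlation structure with the original ones.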
The performance of the competing estimators is presented in two parts. In the first part, we compare the estimators with optimal parameters identified with the knowledge of the true covariance matrix. These include (a) the soft thresholding estimator \(T_s\), which applies the soft thresholding operator to the off-diagonal entries of \({\hat{R}}\) only, as described in Sect. 2.1, (b) the banding estimator B (Section 2.1 in Bickel and Levina (2008a)), (c) the optimal NOVELIST estimator \({\hat{\varSigma }}^{N}_{opt}\) and (d) the optimal PC-adjusted NOVELIST estimator \(\hat{\varSigma }^N_{\hbox {opt.rem}}\). In the second part, we compare the data-driven estimators, including (e) the linear shrinkage estimator S [Target D in Table 2 from Schäfer and Strimmer (2005)], which estimates the correlation matrix by “shrinkage of the sample correlation towards the identity matrix” and estimates the variances by “shrinkage of the sample variances towards their median”, (f) the POET estimator P (Fan et al. 2013), (g) the cross-validated NOVELIST estimator \({\hat{\varSigma }}^{N}_{cv}\), (h) the PC-adjusted NOVELIST \(\hat{\varSigma }^N_{\hbox {rem}}\) and (i) the nonlinear shrinkage estimator NS (Ledoit and Wolf 2015). The sample covariance matrix \({\hat{\varSigma }}\) is also listed for reference. We use the R package \(\textit{corpcor}\) to compute S and the R package \(\textit{POET}\) to compute P; for the latter, we use \(k=7\) as suggested by the authors. We use soft thresholding in both NOVELIST and POET, as it tends to offer better empirical performance. We use \(Z=50\) for \({\hat{\varSigma }}^{N}_{cv}\) and extend the interval for \(\delta \) to \([-0.5, 1.5]\). The version of \({\hat{\varSigma }}^{N}_{cv}\) with fixed parameters is considered only for estimating the precision matrix under models (E), (F) and (F*) when \(p=100, 200, 500\). We use \(K=1\) for \(\hat{\varSigma }^N_{\hbox {opt.rem}}\) and \(\hat{\varSigma }^N_{\hbox {rem}}\).
NS is computed using the commercial package SNOPT for Matlab (Ledoit and Wolf 2015).
6.2 Simulation results
- 1.
The higher the degree of sparsity, the closer \(\delta ^*\) is to 1. The \(\delta ^*\) parameter tends to be close to 1 or slightly larger than 1 for the sparse group, around 0.5 for the non-sparse group and about 0 or negative for the highly non-sparse group.
- 2.
\(\delta ^*\) moves closer to 1 as p increases. This is especially true for the sparse group.
- 3.
Unsurprisingly, the choice of \(\lambda \) is less important when \(\delta \) is closer to 0.
- 4.
Occasionally, \(\delta ^* \not \in [0,1]\). In particular, for the AR(1) and seasonal models, \(\delta ^* \in (1, 1.5]\), while in the highly non-sparse group, \(\delta ^*\) can take negative values, which is a reflection of the fact that \(\hat{\varSigma }^N_{\hbox {opt}}\) attempts to reduce the effect of the strongly misspecified sparse target.
Table 1 Choices of \((\lambda ^*, \delta ^*)\) for \(\hat{\varSigma }^N_{\hbox {opt}}\) and \((\lambda ^{'}, \delta ^{'})\) for \({\hat{\varSigma }}^N_{cv}\) (50 replications)
Model | \(\lambda ^*\) | \(\delta ^*\) | \(\lambda ^{'}\) | \(\delta ^{'}\) | \(\lambda ^*\) | \(\delta ^*\) | \(\lambda ^{'}\) | \(\delta ^{'}\) |
---|---|---|---|---|---|---|---|---|
\(n=100\) | \(p=10\) | | | | \(p=100\) | | | |
(A) Identity | (0.50,1.00) | 1.00 | 0.60 | 1.00 | (0.50,1.00) | 1.00 | 0.60 | 1.00 |
(B) MA(1) | 0.15 | 1.00 | 0.25 | 0.80 | 0.20 | 1.00 | 0.20 | 0.95 |
(B*) MA(1)* | 0.15 | 0.95 | 0.30 | 0.65 | 0.15 | 1.00 | 0.30 | 0.90 |
(C) AR(1) | 0.50 | 0.00 | 0.40 | 0.15 | 0.15 | 0.50 | 0.10 | 0.70 |
(C*) AR(1)* | 0.50 | 0.05 | 0.40 | 0.00 | 0.30 | 0.60 | 0.30 | 0.85 |
(D) Non-sparse | 0.40 | 0.50 | 0.55 | 0.40 | 0.45 | 0.60 | 0.35 | 0.80 |
(E) Factor | 0.40 | 0.00 | 0.65 | 0.10 | 0.20 | \(-\) 0.15 | 0.50 | 0.05 |
(F) FGN | 0.50 | \(-\) 0.05 | 0.50 | 0.00 | 0.30 | \(-\) 0.10 | 0.55 | 0.05 |
(F*) FGN* | 0.50 | \(-\) 0.05 | 0.50 | 0.00 | 0.40 | \(-\) 0.05 | 0.65 | 0.05 |
(G) Seasonal | 0.15 | 0.75 | 0.15 | 0.70 | 0.10 | 1.30 | 0.05 | 1.50 |
(G*) Seasonal* | 0.25 | 0.75 | 0.20 | 0.65 | 0.10 | 1.30 | 0.05 | 1.50 |
\(n=100\) | \(p=200\) | | | | \(p=500\) | | | |
(A) Identity | 0.55 | 1.00 | 0.60 | 1.00 | 0.55 | 1.00 | 0.60 | 1.00 |
(B) MA(1) | 0.25 | 1.00 | 0.20 | 1.00 | 0.30 | 1.00 | 0.25 | 1.00 |
(B*) MA(1)* | 0.25 | 1.00 | 0.25 | 0.95 | 0.25 | 1.00 | 0.20 | 1.00 |
(C) AR(1) | 0.05 | 1.00 | 0.05 | 1.00 | 0.10 | 1.10 | 0.05 | 0.80 |
(C*) AR(1)* | 0.05 | 1.10 | 0.05 | 1.30 | 0.10 | 0.95 | 0.10 | 1.10 |
(D) Non-sparse | 0.30 | 0.65 | 0.55 | 0.40 | 0.40 | 0.75 | 0.40 | 0.90 |
(E) Factor | 0.10 | \(-\) 0.10 | 0.60 | 0.05 | 0.20 | \(-\) 0.10 | 0.50 | 0.05 |
(F) FGN | 0.30 | 0.05 | 0.65 | 0.10 | 0.35 | 0.10 | 0.40 | 0.10 |
(F*) FGN* | 0.25 | 0.05 | 0.50 | 0.05 | 0.15 | \(-\) 0.10 | 0.35 | 0.10 |
(G) Seasonal | 0.10 | 1.10 | 0.05 | 1.50 | 0.10 | 1.30 | 0.10 | 1.20 |
(G*) Seasonal* | 0.10 | 1.10 | 0.05 | 1.50 | 0.10 | 1.30 | 0.10 | 1.20 |
Performance of cross-validated choices of \((\lambda , \delta )\) Table 1 shows that the cross-validated choices of the parameter \((\lambda ^{'}, \delta ^{'})\) for \({\hat{\varSigma }}^N_{cv}\) are close to the optimal \(( \lambda ^{*}, \delta ^{*})\) for most models when \(p=10\), but there are bigger discrepancies between \((\lambda ^{'}, \delta ^{'})\) and \(( \lambda ^{*}, \delta ^{*})\) as p increases, especially for the highly non-sparse group. Again, Fig. 4, which only includes representative models from each sparsity category, shows that the choices of \((\lambda ^{'}, \delta ^{'})\) are consistent with \(( \lambda ^{*}, \delta ^{*})\) in most of the cases. For models (A) and (C), cross-validation works very well: the vast majority of \((\lambda ^{'}, \delta ^{'})\) lead to the error lying in the 1st decile of the possible error range, whereas for models (D) and (G) with \(p = 10\), in the 1st or 2nd decile.
However, as shown in Tables 4 and 5, the performance of cross-validation in estimating \(\varSigma ^{-1}\) with highly non-sparse covariance structures, such as in factor models and long-memory autocovariance structures, is weaker (a remedy for this was described in Sect. 5.1).
Comparison with competing estimators For the estimators with the optimal parameters (Tables 2, 3), NOVELIST performs the best for \(p=10\) for both \(\varSigma \) and \(\varSigma ^{-1}\) and beats the competitors across the non-sparse and highly non-sparse model classes when \(p=100\), 200 and 500. The banding estimator beats NOVELIST in covariance matrix estimation in the homoscedastic sparse models by a small margin in the higher-dimensional cases. For the identity matrix, banding, thresholding and the optimal NOVELIST attain the same results. Optimal PC-adjusted NOVELIST achieves better relative results for estimating \(\varSigma ^{-1}\) than for \(\varSigma \).
Table 2 Average operator norm error to \(\varSigma \) for competing estimators with optimal parameters (50 replications)
Model | \({\hat{\varSigma }}\) | \(T_s\) | B | \(\hat{\varSigma }^N_{\hbox {opt}}\) | \(\hat{\varSigma }^N_{\hbox {opt.rem}}\) | \({\hat{\varSigma }}\) | \(T_s\) | B | \(\hat{\varSigma }^N_{\hbox {opt}}\) | \(\hat{\varSigma }^N_{\hbox {opt.rem}}\) |
---|---|---|---|---|---|---|---|---|---|---|
\(n=100\) | \(p=10\) | | | | | \(p=100\) | | | | |
(A) Identity | 0.578 | – | 2.946 | – | ||||||
(B) MA(1) | 0.623 | 0.447 | 0.435 | – | 3.055 | 0.670 | 0.668 | – | ||
(B*) MA(1)* | 1.400 | 1.008 | 0.988 | – | 6.458 | 1.890 | 1.800 | – | ||
(C) AR(1) | 1.148 | 0.762 | 1.072 | – | 6.112 | 4.977 | 4.703 | – | ||
(C*) AR(1)* | 2.010 | 1.707 | 2.004 | – | 16.338 | 8.786 | – | |||
(D) Non-sparse | 3.483 | 2.954 | 3.127 | – | 25.844 | 11.302 | 11.539 | – | ||
(E) Factor | 1.811 | 1.462 | 1.742 | 1.221 | 14.350 | 13.675 | 13.993 | |||
(F) FGN | 1.110 | 0.751 | 0.970 | 0.711 | 7.824 | 6.777 | 7.478 | 7.033 | ||
(F*) FGN* | 2.239 | 1.617 | 2.108 | 1.683 | 15.666 | 13.383 | 15.147 | 13.782 | ||
(G) Seasonal | 0.850 | 0.564 | 0.797 | – | 4.290 | 2.493 | 2.460 | – | ||
(G*) Seasonal* | 1.664 | 1.228 | 1.594 | – | 6.694 | 3.028 | 2.959 | – | ||
\(n=100\) | \(p=200\) | | | | | \(p=500\) | | | | |
(A) Identity | 4.661 | – | 9.321 | – | ||||||
(B) MA(1) | 4.886 | 0.717 | 0.716 | – | 9.828 | 0.761 | 0.761 | – | ||
(B*) MA(1)* | 10.727 | 1.884 | 1.881 | – | 21.233 | 2.041 | 2.041 | – | ||
(C) AR(1) | 10.291 | 6.922 | 6.768 | – | 17.877 | 9.311 | 9.261 | – | ||
(C*) AR(1)* | 20.277 | 14.943 | – | 39.241 | 18.780 | 18.728 | – | |||
(D) Non-sparse | 26.729 | 10.990 | 11.240 | – | 50.915 | 13.917 | 13.284 | – | ||
(E) Factor | 31.183 | 28.053 | 29.819 | 82.451 | 65.234 | 73.807 | ||||
(F) FGN | 14.732 | 12.729 | 13.877 | 15.881 | 35.041 | 30.201 | 31.272 | 30.782 | ||
(F*) FGN* | 32.370 | 26.692 | 29.862 | 28.983 | 68.154 | 66.833 | 66.320 | 55.998 | ||
(G) Seasonal | 6.913 | 2.961 | – | 13.157 | 3.582 | 3.460 | – | |||
(G*) Seasonal* | 14.709 | 6.427 | 6.350 | – | 27.627 | 7.873 | 7.538 | – |
Table 3 Average operator norm error to \(\varSigma ^{-1}\) for competing estimators with optimal parameters (50 replications)
Model | \({\hat{\varSigma }}\) | \(T_s\) | B | \(\hat{\varSigma }^N_{\hbox {opt}}\) | \(\hat{\varSigma }^N_{\hbox {opt.rem}}\) | \({\hat{\varSigma }}\) | \(T_s\) | B | \(\hat{\varSigma }^N_{\hbox {opt}}\) | \(\hat{\varSigma }^N_{\hbox {opt.rem}}\) |
---|---|---|---|---|---|---|---|---|---|---|
\(n=100\) | \(p=10\) | | | | | \(p=100\) | | | | |
(A) Identity | 0.917 | – | – | – | ||||||
(B) MA(1) | 1.177 | 0.681 | 0.656 | – | – | – | ||||
(B*) MA(1)* | 0.626 | 0.489 | 0.732 | – | – | 0.846 | – | |||
(C) AR(1) | 9.078 | 7.751 | 9.078 | – | – | 14.313 | 18.064 | – | ||
(C*) AR(1)* | 4.491 | 2.736 | 4.491 | – | – | 8.915 | 7.298 | – | ||
(D) Non-sparse | 0.378 | 0.256 | 0.297 | – | – | 2.670 | 2.775 | – | ||
(E) Factor | 0.846 | 0.403 | 0.610 | 0.400 | – | 0.712 | 0.715 | 0.653 | ||
(F) FGN | 2.995 | 1.727 | 2.980 | – | 3.585 | 4.650 | 3.112 | |||
(F*) FGN* | 1.571 | 1.193 | 1.212 | – | 2.029 | 2.038 | 1.948 | |||
(G) Seasonal | 2.688 | 1.538 | 2.685 | – | – | 3.806 | 5.444 | – | ||
(G*) Seasonal* | 1.340 | 1.091 | 1.726 | – | – | 2.526 | 4.345 | – | ||
\(p=200, \, n=100\) | \(p=500, \,n=100\) | |||||||||
(A) Identity | – | – | – | – | ||||||
(B) MA(1) | – | 1.358 | 1.530 | – | – | 1.405 | 1.562 | – | ||
(B*) MA(1)* | – | 1.100 | 0.850 | – | – | 1.040 | 1.145 | – | ||
(C) AR(1) | – | 15.023 | 18.122 | – | – | 15.622 | 18.136 | – | ||
(C*) AR(1)* | – | 14.509 | 20.358 | – | – | 18.392 | 23.740 | – | ||
(D) Non-sparse | – | 2.460 | 2.016 | – | – | 5.986 | 5.896 | – | ||
(E) Factor | – | 0.711 | 0.711 | 0.677 | – | 0.744 | 0.744 | 0.730 | ||
(F) FGN | – | 3.972 | 4.658 | 3.317 | – | 4.267 | 4.737 | 3.527 | ||
(F*) FGN* | – | 2.974 | 4.096 | 2.083 | – | 4.426 | 5.674 | 2.250 | ||
(G) Seasonal | – | 4.029 | 5.469 | – | – | 4.188 | 5.477 | – | ||
(G*) Seasonal* | – | 3.328 | 4.885 | – | – | 3.726 | 5.479 | – |
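In the precision-matrix table, the \({\hat{\varSigma }}\) column contains only dashes for \(p=200\) and \(p=500\). This is consistent with the singularity noted in the Introduction. A sample covariance computed from \(n\) centred observations has rank at most \(n-1\), so it cannot be inverted when \(p \ge n\), including the case \(p = n = 100\). The short NumPy check below illustrates this on simulated Gaussian data. The seed and the helper name `sample_cov_rank` are hypothetical. `np.cov` subtracts the column means, which matches the centred case.

```python
import numpy as np


def sample_cov_rank(n, p, rng):
    """Numerical rank of the centred sample covariance of n Gaussian draws in dimension p."""
    X = rng.standard_normal((n, p))
    S = np.cov(X, rowvar=False)  # subtracts column means, divisor n - 1
    return np.linalg.matrix_rank(S)


rng = np.random.default_rng(1)
n = 100
for p in (10, 100, 200, 500):
    r = sample_cov_rank(n, p, rng)
    print(f"p={p:3d}  rank={r:3d}  invertible={r == p}")
```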
Average operator norm error in estimating \(\varSigma \) for competing estimators with data-driven parameters (50 replications)
Model | \({\hat{\varSigma }}\) | S | P | \(\hat{\varSigma }^N_{cv}\) | \(\hat{\varSigma }^N_{\hbox {rem}}\) | NS | \({\hat{\varSigma }}\) | S | P | \(\hat{\varSigma }^N_{cv}\) | \(\hat{\varSigma }^N_{\hbox {rem}}\) | NS |
---|---|---|---|---|---|---|---|---|---|---|---|---|
\(p=10,\, n=100\) | \(p=100,\, n=100\) | |||||||||||
(A) Identity | 0.578 | 0.823 | 0.263 | – | 0.116 | 2.946 | 3.657 | 0.446 | – | |||
(B) MA(1) | 0.623 | 0.732 | 0.493 | – | 0.481 | 3.055 | 3.730 | – | ||||
(B*) MA(1)* | 1.400 | 1.546 | – | 1.191 | 6.458 | 1.985 | 8.015 | – | 2.449 | |||
(C) AR(1) | 1.148 | 1.135 | 1.153 | – | 6.112 | 6.257 | – | 5.892 | ||||
(C*) AR(1)* | 2.190 | 2.291 | 2.114 | – | 2.190 | 16.338 | 8.878 | 19.468 | – | 12.095 | ||
(D) Non-sparse | 3.483 | 3.120 | 3.860 | – | 25.844 | 12.453 | 29.355 | – | ||||
(E) Factor | 1.811 | 1.793 | 1.866 | 1.741 | 1.763 | 17.681 | 16.497 | 16.438 | 15.285 | |||
(F) FGN | 1.110 | 1.020 | 1.021 | 1.024 | 0.980 | 7.824 | 7.798 | 7.799 | 7.732 | 7.554 | ||
(F*) FGN* | 2.239 | 2.218 | 2.221 | 2.222 | 2.227 | 15.666 | 15.611 | 16.561 | ||||
(G) Seasonal | 0.850 | 0.852 | – | 4.290 | 3.200 | 4.826 | – | 3.098 | ||||
(G*) Seasonal* | 1.664 | 1.647 | 1.652 | – | 6.694 | 4.268 | 7.171 | – | 6.979 | |||
\(p=200,\, n=100\) | \(p=500,\, n=100\) | |||||||||||
(A) Identity | 4.661 | 5.414 | 0.443 | – | 9.321 | 10.076 | 0.468 | – | ||||
(B) MA(1) | 4.886 | 5.615 | 0.744 | – | 0.694 | 9.828 | 10.566 | 0.819 | – | 0.683 | ||
(B*) MA(1)* | 10.727 | 2.094 | 12.458 | – | 2.729 | 21.233 | 23.034 | – | 3.004 | |||
(C) AR(1) | 10.291 | 8.123 | 11.446 | 8.217 | – | 17.877 | 12.785 | 18.496 | – | |||
(C*) AR(1)* | 20.277 | 18.172 | 23.721 | – | 18.751 | 39.241 | 26.571 | 40.903 | – | 24.581 | ||
(D) Non-sparse | 26.729 | 11.920 | 30.108 | – | 50.915 | 13.758 | 54.462 | – | ||||
(E) Factor | 31.183 | 34.237 | 33.224 | 33.194 | ||||||||
(F) FGN | 14.732 | 14.376 | 14.640 | 14.593 | 14.125 | 35.041 | 34.344 | 31.296 | 30.992 | 36.299 | ||
(F*) FGN* | 32.370 | 32.188 | 68.154 | 84.958 | 75.546 | 75.377 | 74.432 | |||||
(G) Seasonal | 6.913 | 4.126 | 7.403 | – | 4.016 | 13.157 | 4.994 | 13.722 | – | 4.949 | ||
(G*) Seasonal* | 14.709 | 9.225 | 15.855 | – | 9.064 | 27.627 | 11.030 | 28.949 | – | 11.132 |
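The data-driven column \(\hat{\varSigma }^N_{cv}\) requires choosing the threshold and the shrinkage weight from the data. This section does not describe the exact cross-validation scheme, so the following is only a sketch under stated assumptions. It assumes that NOVELIST is the convex combination \((1-\delta )\hat{R} + \delta \, T_s(\hat{R}, \lambda )\) of the sample correlation \(\hat{R}\) and its soft-thresholded version, with threshold \(\lambda \) and weight \(\delta \). It also assumes that the covariance version rescales this by the sample standard deviations. For the parameter choice, it uses a generic hold-out criterion: pick the \((\lambda , \delta )\) that minimises the squared Frobenius distance between the estimate on a random half of the data and the sample correlation on the other half. All function names are hypothetical.

```python
import numpy as np


def soft_threshold(R, lam):
    """Soft-threshold the off-diagonal entries of R at level lam; keep the diagonal."""
    T = np.sign(R) * np.maximum(np.abs(R) - lam, 0.0)
    np.fill_diagonal(T, np.diag(R))
    return T


def novelist_from_corr(R, lam, delta):
    """Assumed NOVELIST form: (1 - delta) * R + delta * T_s(R, lam)."""
    return (1.0 - delta) * R + delta * soft_threshold(R, lam)


def novelist_cov(X, lam, delta):
    """Assumed covariance version: rescale the correlation estimate by the sample SDs."""
    sd = X.std(axis=0, ddof=1)
    R = np.corrcoef(X, rowvar=False)
    return novelist_from_corr(R, lam, delta) * np.outer(sd, sd)


def cv_select(X, lams, deltas, n_splits=10, seed=0):
    """Hypothetical hold-out CV over a (lam, delta) grid using squared Frobenius loss."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    loss = np.zeros((len(lams), len(deltas)))
    for _ in range(n_splits):
        idx = rng.permutation(n)
        R_train = np.corrcoef(X[idx[: n // 2]], rowvar=False)
        R_valid = np.corrcoef(X[idx[n // 2 :]], rowvar=False)
        for i, lam in enumerate(lams):
            for j, delta in enumerate(deltas):
                diff = novelist_from_corr(R_train, lam, delta) - R_valid
                loss[i, j] += np.sum(diff ** 2)
    i, j = np.unravel_index(np.argmin(loss), loss.shape)
    return lams[i], deltas[j]


# Hypothetical usage on simulated data.
rng = np.random.default_rng(42)
X = rng.standard_normal((100, 30))
lam, delta = cv_select(X, np.linspace(0, 1, 11), np.linspace(0, 1, 11))
Sigma_cv = novelist_cov(X, lam, delta)
```

Setting \(\delta = 0\) recovers the sample correlation and \(\delta = 1\) the purely thresholded estimator, so the grid spans both extremes.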
Average operator norm error in estimating \(\varSigma ^{-1}\) for competing estimators with data-driven parameters (50 replications)
Model | \({\hat{\varSigma }}\) | S | P | \(\hat{\varSigma }^N_{cv}\) | \(\hat{\varSigma }^N_{\hbox {rem}}\) | NS | \({\hat{\varSigma }}\) | S | P | \(\hat{\varSigma }^N_{cv}\) | \(\hat{\varSigma }^N_{\hbox {rem}}\) | NS |
---|---|---|---|---|---|---|---|---|---|---|---|---|
\(p=10,\, n=100\) | \(p=100,\, n=100\) | |||||||||||
(A) Identity | 0.917 | 4.472 | 0.469 | – | 0.146 | – | 0.882 | 0.472 | – | 0.109 | ||
(B) MA(1) | 1.123 | 6.474 | – | – | 1.403 | 1.439 | – | 1.405 | ||||
(B*) MA(1)* | 0.626 | 0.526 | 4.892 | – | – | 1.358 | 0.993 | – | 1.748 | |||
(C) AR(1) | 9.078 | 7.309 | 40.142 | 8.574 | – | – | 13.410 | 15.704 | – | |||
(C*) AR(1)* | 4.941 | 5.390 | 27.593 | 4.841 | – | – | 12.508 | 13.649 | – | 13.446 | ||
(D) Non-sparse | 0.378 | 0.500 | 1.705 | – | – | – | ||||||
(E) Factor | 0.846 | 1.142 | 1.806 | 0.864 | – | – | 2.603 | 0.893 | 1.608 | – | ||
(0.854) | (0.695) | (0.526) | ||||||||||
(F) FGN | 2.995 | 16.530 | 2.097 | – | – | 4.565 | 3.060 | 4.212 | – | 3.122 | ||
(2.081) | (3.159) | |||||||||||
(F*) FGN* | 1.571 | 10.284 | 2.017 | – | – | 4.474 | 2.965 | 3.431 | – | 4.432 | ||
(2.001) | ||||||||||||
(G) Seasonal | 2.688 | 1.897 | 13.175 | 2.103 | 2.115 | – | 4.229 | 4.721 | – | |||
(G*) Seasonal* | 1.340 | 1.284 | 8.436 | – | 1.219 | – | 3.510 | 3.799 | – | 4.538 | ||
\(p=200,\, n=100\) | \(p=500,\, n=100\) | |||||||||||
(A) Identity | – | 0.930 | 0.529 | – | 0.136 | – | 0.923 | 0.601 | – | 0.139 | ||
(B) MA(1) | – | 1.449 | – | 1.463 | – | 1.540 | – | |||||
(B*) MA(1)* | – | 1.293 | 1.256 | – | 1.906 | – | 1.914 | 1.221 | – | 2.463 | ||
(C) AR(1) | – | 15.066 | 17.128 | – | – | 17.700 | – | |||||
(C*) AR(1)* | – | 17.480 | 18.286 | – | 19.037 | – | 22.833 | 23.053 | – | 23.740 | ||
(D) Non-sparse | – | 2.842 | – | 3.206 | – | 6.171 | – | |||||
(E) Factor | – | 3.701 | 0.892 | 1.450 | – | – | 5.672 | 0.962 | 4.106 | – | ||
(0.710) | (0.546) | (0.937) | (0.558) | |||||||||
(F) FGN | – | 9.397 | 3.552 | 5.670 | – | 3.434 | – | 8.621 | 3.933 | 6.652 | – | 3.752 |
(3.582) | (4.364) | |||||||||||
(F*) FGN* | – | 6.649 | 2.765 | 4.024 | – | 5.519 | – | 6.241 | 3.083 | 5.442 | – | 6.519 |
(2.589) | ||||||||||||
(G) Seasonal | – | 4.676 | 5.019 | – | 4.526 | – | 5.045 | 5.256 | – | 5.001 | ||
(G*) Seasonal* | – | 4.540 | 4.643 | – | 6.068 | – | 5.632 | 5.254 | – | 6.988 |
7 Portfolio selection
In this section, we apply the NOVELIST algorithm and the competing methods to share portfolios composed of the constituents of the FTSE 100 index. Similar competitions were previously conducted to compare the performance of different covariance matrix estimators (Ledoit and Wolf 2003; Lam 2016). We compare the performance for risk minimisation purposes. The data were provided by Bloomberg.
Standard deviation of minimum variance portfolios in percentage (daily and 5-min returns)
Estimator | STD (daily returns) | STD (5-min returns) |
---|---|---|
Sample | 1.256 | 10.675 |
Linear shrinkage | 0.851 | 7.809 |
Nonlinear shrinkage | 0.733 | 7.670 |
POET | 0.760 | 7.253 |
NOVELIST | 0.709 | 6.987 |
PC-adjusted NOVELIST | 0.715 | 8.577 |
Following the advice from Sect. 5.1, we apply fixed parameters for both NOVELIST and PC-adjusted NOVELIST. Table 6 shows the results. NOVELIST has the lowest risk for both daily and 5-min portfolios. It is followed by PC-adjusted NOVELIST and nonlinear shrinkage in the low-frequency (daily) case, and by POET and nonlinear shrinkage in the high-frequency (5-min) case. In summary, NOVELIST achieves the lowest portfolio risk among the estimators compared here.
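Table 6 reports the standard deviation of minimum variance portfolios. The sketch below shows how a covariance estimate turns into such a number. It assumes the standard global minimum-variance weights \(w = \hat{\varSigma }^{-1}\mathbf {1}/(\mathbf {1}^\top \hat{\varSigma }^{-1}\mathbf {1})\), which describe a fully invested portfolio with short sales allowed. It also assumes a single hypothetical train/test split of simulated returns rather than the FTSE 100 data. This section does not describe the paper's estimation window, rebalancing scheme or any scaling, so the sketch does not reproduce them. The helper names are illustrative.

```python
import numpy as np


def min_variance_weights(Sigma):
    """Global minimum-variance weights: Sigma^{-1} 1 normalised to sum to one."""
    ones = np.ones(Sigma.shape[0])
    x = np.linalg.solve(Sigma, ones)  # Sigma^{-1} 1 without forming the inverse
    return x / x.sum()


def realised_std_percent(weights, returns):
    """Sample standard deviation of the portfolio returns r_t' w, in percent."""
    return 100.0 * np.std(returns @ weights, ddof=1)


# Hypothetical illustration on simulated returns (not the FTSE 100 data).
rng = np.random.default_rng(0)
returns = 0.01 * rng.standard_normal((500, 20))
train, test = returns[:250], returns[250:]
w = min_variance_weights(np.cov(train, rowvar=False))
print(realised_std_percent(w, test))
```

Swapping `np.cov(train, rowvar=False)` for any other covariance estimate (shrinkage, POET, NOVELIST) is the only change needed to compare estimators in this setting. The solve step requires that estimate to be invertible.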
8 Discussion
Like many other covariance (correlation) matrix estimators that incorporate thresholding, the NOVELIST estimator is not guaranteed to be positive definite in finite samples. To remedy this, our advice is similar to that of other authors (e.g. Cai et al. 2010; Fan et al. 2013; Bickel and Levina 2008b). We propose to diagonalise the NOVELIST estimator and replace any eigenvalues that fall below a certain small positive threshold by the value of that threshold. How to choose the threshold is, of course, an important matter. We do not believe there is a generally accepted solution in the literature, partly because the value of the "best" such threshold will necessarily be problem-dependent. Denote the corrected estimator by \({\hat{\varSigma }}^N(\zeta )\) in the covariance case and by \({\hat{R}}^N(\zeta )\) in the correlation case, where \(\zeta \) is the eigenvalue threshold. One possibility would be to choose the lowest \(\zeta \) for which the matrix \({\hat{\varSigma }}^N ({\hat{\varSigma }}^N(\zeta ))^{-1}\) (and analogously in the correlation case) resembles the identity matrix in a user-specified sense.
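The eigenvalue correction described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code. `floor_eigenvalues` and `identity_deviation` are hypothetical helper names. The criterion for how closely \({\hat{\varSigma }}^N ({\hat{\varSigma }}^N(\zeta ))^{-1}\) resembles the identity is a scaled Frobenius distance, which is only one possible user-specified choice. The toy matrix mimics a thresholded correlation matrix that has lost positive definiteness.

```python
import numpy as np


def floor_eigenvalues(S, zeta):
    """Return V diag(max(mu_i, zeta)) V^T, where S = V diag(mu) V^T."""
    S = (S + S.T) / 2.0  # symmetrise against rounding error
    eigvals, eigvecs = np.linalg.eigh(S)
    floored = np.maximum(eigvals, zeta)
    return (eigvecs * floored) @ eigvecs.T  # scales column i of V by floored[i]


def identity_deviation(S, zeta):
    """Hypothetical criterion: ||S S(zeta)^{-1} - I||_F / sqrt(p)."""
    p = S.shape[0]
    M = S @ np.linalg.inv(floor_eigenvalues(S, zeta))
    return np.linalg.norm(M - np.eye(p), "fro") / np.sqrt(p)


# Symmetric matrix with unit diagonal that is not positive definite:
# its eigenvalues are 1 - 0.9*sqrt(2) < 0, 1 and 1 + 0.9*sqrt(2).
S = np.array([[1.0, 0.9, 0.0],
              [0.9, 1.0, 0.9],
              [0.0, 0.9, 1.0]])
for zeta in (0.01, 0.05, 0.1):
    S_z = floor_eigenvalues(S, zeta)
    print(zeta, np.linalg.eigvalsh(S_z).min(), identity_deviation(S, zeta))
```

Write \(\mu _i\) for the eigenvalues of the uncorrected estimator. The product \({\hat{\varSigma }}^N ({\hat{\varSigma }}^N(\zeta ))^{-1}\) then has eigenvalues \(\mu _i/\max (\mu _i, \zeta )\). Eigenvalues above \(\zeta \) therefore leave the product equal to the identity in their direction, whereas a negative \(\mu _i\) contributes a deviation of \(1 - \mu _i/\zeta \). This is why the criterion, and hence \(\zeta \), has to be set with the application in mind.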
We also note that either part of the NOVELIST estimator can be replaced by a banding-type estimator, for example, as defined by Cai et al. (2010). In this way, we would depart from the particular construction of the NOVELIST estimator towards the more general idea of using convex combinations of two (or more) covariance estimators, which is conceptually and practically appealing but lies outside the scope of the current work.
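The more general idea of combining two covariance (correlation) estimators convexly can be sketched as follows. The example replaces the thresholded component with a simple hard-banding operator. That operator is used here for illustration only and is not necessarily the exact banding-type estimator of Cai et al. (2010). The helper names `band` and `convex_combination`, the bandwidth \(k=2\) and the equal weights are all hypothetical.

```python
import numpy as np


def band(R, k):
    """Hard banding: keep entries with |i - j| <= k and set the rest to zero."""
    i, j = np.indices(R.shape)
    return np.where(np.abs(i - j) <= k, R, 0.0)


def convex_combination(estimates, weights):
    """Combine estimator outputs with non-negative weights that sum to one."""
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to one")
    return sum(w * E for w, E in zip(weights, estimates))


# Hypothetical usage: shrink the sample correlation towards its banded version.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 30))
R = np.corrcoef(X, rowvar=False)
R_mixed = convex_combination([R, band(R, k=2)], [0.5, 0.5])
```

A banded sample correlation need not be positive semidefinite, so such a combination may still need the eigenvalue correction described above.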
To summarise, the flexible control over the degree of shrinkage and thresholding offered by NOVELIST allows it to perform competitively across most models. In situations in which it is not the best, it tends not to be much worse than the best performer. We recommend NOVELIST as a simple, good all-round covariance, correlation and precision matrix estimator ready for practical use across a variety of models and data dimensionalities.
Acknowledgements
Piotr Fryzlewicz’s work has been supported by the Engineering and Physical Sciences Research Council Grant No. EP/L014246/1.
References
- Alvarez I, Niemi J, Simpson M (2014) Bayesian inference for a covariance matrix. Preprint
- Bickel P, Levina E (2008a) Regularized estimation of large covariance matrices. Ann Stat 36:199–227
- Bickel P, Levina E (2008b) Covariance regularization by thresholding. Ann Stat 36:2577–2604
- Cai T, Liu W (2011) Adaptive thresholding for sparse covariance matrix estimation. J Am Stat Assoc 106:672–684
- Cai TT, Zhang C, Zhou HH (2010) Optimal rates of convergence for covariance matrix estimation. Ann Stat 38:2118–2144
- Chen C (1979) Bayesian inference for a normal dispersion matrix and its application to stochastic multiple regression analysis. J R Stat Soc Ser B 41:235–248
- Croux C, Haesbroeck G (2000) Principal component analysis based on robust estimators of the covariance or correlation matrix: influence functions and efficiencies. Biometrika 87:603–618
- Dickey JM, Lindley DV, Press SJ (1985) Bayesian estimation of the dispersion matrix of a multivariate normal distribution. Commun Stat Theory Methods 14:1019–1034
- El Karoui N (2008) Operator norm consistent estimation of large-dimensional sparse covariance matrices. Ann Stat 36:2717–2756
- Evans IG (1965) Bayesian estimation of parameters of a multivariate normal distribution. J R Stat Soc Ser B 27:279–283
- Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96:1348–1360
- Fan J, Fan Y, Lv J (2008) High dimensional covariance matrix estimation using a factor model. J Econom 147:186–197
- Fan J, Liao Y, Mincheva M (2013) Large covariance estimation by thresholding principal orthogonal complements. J R Stat Soc Ser B 75:603–680
- Fisher RA (1936) The use of multiple measurements in taxonomic problems. Ann Eugen 7:179–188
- Friedman J, Hastie T, Tibshirani R (2008) Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9:432–441
- Fryzlewicz P (2013) High-dimensional volatility matrix estimation via wavelets and thresholding. Biometrika 100:921–938
- Furrer R, Bengtsson T (2007) Estimation of high-dimensional prior and posterior covariance matrices in Kalman filter variants. J Multivar Anal 98:227–255
- Gardner TS, di Bernardo D, Lorenz D, Collins JJ (2003) Inferring genetic networks and identifying compound mode of action via expression profiling. Science 301:102–105
- Golub GH, Van Loan CF (1989) Matrix computations, 2nd edn. Johns Hopkins University Press, Baltimore
- Guo YQ, Hastie T, Tibshirani R (2007) Regularized linear discriminant analysis and its application in microarrays. Biostatistics 8:86–100
- Jeong H, Mason SP, Barabási A-L, Oltvai ZN (2001) Lethality and centrality in protein networks. Nature 411:41–42
- Lam C (2016) Nonparametric eigenvalue-regularized precision or covariance matrix estimation. Ann Stat 44:928–953
- Lam C, Feng P (2017) Integrating regularized covariance matrix estimators. Preprint
- Ledoit O, Péché S (2011) Eigenvectors of some large sample covariance matrix ensembles. Probab Theory Relat Fields 151:233–264
- Ledoit O, Wolf M (2003) Improved estimation of the covariance matrix of stock returns with an application to portfolio selection. J Empir Finance 10:603–621
- Ledoit O, Wolf M (2004) A well-conditioned estimator for large-dimensional covariance matrices. J Multivar Anal 88:365–411
- Ledoit O, Wolf M (2012) Nonlinear shrinkage estimation of large-dimensional covariance matrices. Ann Stat 40:1024–1060
- Ledoit O, Wolf M (2015) Spectrum estimation: a unified framework for covariance matrix estimation and PCA in large dimensions. J Multivar Anal 139:360–384
- Leonard T, John SJH (2012) Bayesian inference for a covariance matrix. Ann Stat 20:1669–1696
- Longerstaey J, Zangari A, Howard S (1996) RiskMetrics™: technical document. J.P. Morgan, New York
- Markowitz H (1952) Portfolio selection. J Finance 7:77–91
- Meinshausen N, Bühlmann P (2006) High-dimensional graphs and variable selection with the lasso. Ann Stat 34:1436–1462
- Pearson K (1901) On lines and planes of closest fit to systems of points in space. Philos Mag 2:559–572
- Rothman AJ, Bickel P, Levina E, Zhu J (2008) Sparse permutation invariant covariance estimation. Electron J Stat 2:494–515
- Rothman AJ, Levina E, Zhu J (2009) Generalized thresholding of large covariance matrices. J Am Stat Assoc 104:177–186
- Savic RM, Karlsson MO (2009) Importance of shrinkage in empirical Bayes estimates for diagnostics: problems and solutions. AAPS J 11:558–569
- Schäfer J, Strimmer K (2005) A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Stat Appl Genet Mol Biol 4:Article 32
- Wu WB, Pourahmadi M (2003) Nonparametric estimation in the gaussian graphical model. Biometrika 90:831–844
- Zou H (2006) The adaptive lasso and its oracle properties. J Am Stat Assoc 101:1418–1429
Copyright information
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.