1 Introduction

Markowitz (1952, 1959) developed the theoretical foundation for modern portfolio theory, providing investors with a tool to solve the key issue of how to distribute their wealth among a set of available assets. The problem was postulated as a choice of the portfolio mean return and the variance of portfolio returns. This led to two principles: for a given level of portfolio variance, an investor maximizes the portfolio return or, equivalently, for a given portfolio return minimizes the portfolio variance. Hence, according to Markowitz, an investor under such constraints needs only to be concerned about two moments of the assets' multivariate distribution: the mean vector and the covariance matrix. In practical situations, these two quantities are unknown and must be estimated in order to perform the optimization. Many authors have shown that this optimization procedure often fails in practice (Bai et al. 2009; Best and Grauer 1991; Kempf et al. 2002; Merton 1980). Some authors go so far as to argue that the estimation error (sampling variance) dominates the procedure to the extent that the equally weighted non-random portfolio performs better than those optimized from data (Frankfurter et al. 1971; DeMiguel et al. 2009; Michaud 1989).

The estimation problem becomes particularly serious when the number of assets (say p) is close to the number of observations (n). This is mainly because the sample covariance matrix becomes stochastically unstable and may not even be invertible. It is then natural that the standard “plug-in” estimators, defined by simply replacing the unknown mean vector and covariance matrix by the standard textbook estimators, should be replaced by some improved estimator. A significant number of improvements over the plug-in estimator have been developed over the last decades (Frost and Savarino 1986; Ledoit and Wolf 2003, 2004). The vast majority of these fall into two categories, or families, of estimators. The first category is based on the fact that the sample covariance matrix is a poor approximation of the true covariance matrix and, therefore, the estimation problem is concerned with developing improved estimators of the covariance matrix. These improved estimators are then simply substituted for the unknown parameter within Markowitz’s optimal weight function. The other approach, which appears to be the more common one in recent years, relies on principles developed by Stein (1956) and James and Stein (1961). With such estimators, the standard estimator is weighted (“shrunken”) toward a non-random target quantity. This type of estimator seems to be enjoying a renaissance in portfolio estimation theory, for which it appears to be particularly well suited. A sample of recent developments includes Bodnar et al. (2018), Frahm and Memmel (2010), Golosnoy and Okhrin (2007), Kempf and Memmel (2006) and Okhrin and Schmid (2007). Each of these methods naturally has its own merits and uses.

Investors want to know the basic properties and the relative risk of favoring one estimator over another before applying any specific method. There is, however, no consensus about how the concept of “risk” should be defined. From a statistical point of view, risk refers to some measure of the difference between a quantity of interest (which could be either random or fixed) and our inference target. Risk is usually expressed through moments of differences, such as the mean squared error (MSE), but could also involve forecast bias or angles between true and estimated vectors. The optimality properties of any estimator depend on the specific risk measure, or quality criterion, used to evaluate it. Indeed, it is well known that the performance ranking of estimators, such as estimators of the inverse covariance matrix, may change or even reverse when evaluated with alternative loss functions (Haff 1979; Muirhead 1982).

While most recent papers in portfolio optimization theory have been concerned with the extremely important problem of developing efficient estimators of portfolio weights and related quantities, this paper focuses on the risks associated with these estimators. Specifically, the purpose of this paper is to compare and contrast a number of risk measures of the GMVP estimator to give investors and developers of statistical methods a fair understanding of their differences and similarities and, hence, a foundation for determining the weight estimator that is best suited for a given specific problem.

The paper proceeds as follows. Section 2 states the problem of minimum-variance portfolio estimation, Sect. 3 introduces the concept of a risk function, and Sect. 4 classifies different GMVP estimators. Section 5 describes the Monte Carlo study design and provides a discussion of the derived results. Section 6 outlines two empirical applications, and Sect. 7 summarizes the findings and concludes.

2 Preliminaries

We consider the problem of constructing an investment portfolio \(\Pi \), defined as a weighted sum of risky assets, \({\mathbf {R}}={\left( {{R_{1}},\ldots ,{R_{p}}}\right) ^{\prime }}\). In order to construct a portfolio of assets, we define a vector of weights \({\mathbf {v}}\in {\mathbb {R}^{p}}\) under the common constraint \({\mathbf {v}^{\prime }}{\mathbf {1}}=1\), where \({\mathbf {1}}\text {:}\,p\times 1\) is a vector of ones. The mean and variance of \({\mathbf {R}}\) are defined by \(\varvec{\mu }:=E\left[ {\mathbf {R}}\right] \) and \({\varvec{\Sigma }}:=\mathrm{Cov}\left[ {\mathbf {R}}\right] \), respectively, where \({\varvec{\Sigma }}\in {\mathbb {R}^{p\times p}}\) is positive definite by assumption. The variance of the portfolio excess return is uniquely minimized by the global minimum-variance portfolio (GMVP) which is given by solving the following minimization problem:

$$\begin{aligned} \mathop {\mathbf {w}}\limits _{\left( {p\times 1}\right) }:=\mathop {\arg \min }\limits _{{\mathbf {v'1}} =1}\left( {\mathbf {v^{\prime }}} {\varvec{\Sigma }} {\mathbf {v}}\right) , \end{aligned}$$
(1)

and the well-known solution to (1) is given by:

$$\begin{aligned} \mathop {\mathbf {w}}\limits _{\left( {p\times 1}\right) }=\frac{{{ {\varvec{\Sigma }}^{-1}}{\mathbf {1}}}}{{{\mathbf {1}}^{\prime }{{\varvec{\Sigma }}^{-1}}{\mathbf {1}}}}. \end{aligned}$$
(2)

The expected return and the return variance of the global minimum-variance (MV) portfolio are given by

$$\begin{aligned} {\mu _\mathrm{MV}}={{\varvec{\mu }^{\prime }}}{\mathbf {w}}=\frac{{{\varvec{\mu }^{{{\mathbf {\prime }}}}}}{\varvec{\Sigma }^{-1}}{\mathbf {1}}}{{\mathbf {1^{\prime }}}{\varvec{\Sigma }^{-1}}{\mathbf {1}}} \end{aligned}$$

and

$$\begin{aligned} \sigma _\mathrm{MV}^{2}={\mathbf {w}}^{\prime }{\varvec{\Sigma }}{\mathbf {w}}=\frac{1}{{\mathbf {1}}^{\prime }{\varvec{\Sigma }}^{-1}{\mathbf {1}}}. \end{aligned}$$

The weight vector \(\mathbf {w}\) depends on the unknown parameter \({{\varvec{\Sigma }}^{-1}}\), which needs to be estimated from data. The classical, or plug-in, estimator of the weight vector \(\mathbf {w}\) is obtained by replacing \({\varvec{\Sigma }}^{-1}\) in (2) by the inverse of the sample covariance matrix \({\mathbf {S}}:={n^{-1}}\sum \nolimits _{i=1}^{n}{\left( {{{\mathbf {R}}_{i}}- {\bar{\mathbf {R}}}}\right) {{\left( {{{\mathbf {R}}_{i}}-{\bar{\mathbf {R}}}}\right) }^{\prime }}}\), where \({\bar{\mathbf {R}}}:={n^{-1}}\sum \nolimits _{i=1}^{n}{{\mathbf {R}}_{i}}\). We define this estimator as

$$\begin{aligned} {{\hat{\mathbf {w}}}_{\text {I}}}=\frac{{{{\mathbf {S}}^{-1}}{\mathbf {1}}}}{{{\mathbf {1}}^{\prime }{{\mathbf {S}}^{-1}}{\mathbf {1}}}}. \end{aligned}$$
(3)

The distribution of \({\hat{\mathbf {w}}}_{\text {I}}\) when sampling from a Gaussian distribution is well established (Okhrin and Schmid 2006; Bodnar and Zabolotskyy 2017). In particular, the estimator \({\mathbf {S}}^{-1}\) is, by the law of large numbers, a consistent estimator of \({\varvec{\Sigma }}^{-1}\), and the consistency of (3) follows directly. However, it is well known that \({\mathbf {S}}^{-1}\) is a poor approximation of \({{\varvec{\Sigma }}}^{-1}\) when the number of assets p is large relative to the number of observations n, and as a consequence \({\hat{\mathbf {w}}}_{\text {I}}\) will not adequately approximate \({\mathbf {w}}\). Because of this, a number of authors have suggested improved estimators of the GMVP weights. An obvious solution is to simply replace \({\mathbf {S}}^{-1}\) with a more efficient estimator of \({\varvec{\Sigma }}^{-1}\). Another important family of improved estimators is given by Stein-type estimators, which, in terms of our portfolio estimation problem, are of the form \({\hat{\mathbf {w}}}_{S}=a{\hat{\mathbf {w}}}+b{{\mathbf {w}}_{0}}\), where a and b are constants, and \({\mathbf {w}}_{0}\) is a pre-determined reference portfolio, which is usually taken to be non-random. If \(\left( {\hat{\mathbf {w}}}-{\mathbf {w}}_{0}\right) \) is small, the improvement in terms of the mean squared error \(E\left[ \left( {\hat{\mathbf {w}}}-{\mathbf {w}}\right) ^{\prime }\left( {\hat{\mathbf {w}}}-{\mathbf {w}}\right) \right] \) may be considerable.
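
As an illustration of Eqs. (2) and (3), the following minimal NumPy sketch computes the GMVP weights for a known covariance matrix and the plug-in estimate from a sample of returns. The function names `gmvp_weights`, `sample_cov` and `plug_in_weights`, as well as the simulated data, are hypothetical choices made for this example only.

```python
import numpy as np

def gmvp_weights(cov):
    """GMVP weights cov^{-1} 1 / (1' cov^{-1} 1), as in Eq. (2)."""
    ones = np.ones(cov.shape[0])
    x = np.linalg.solve(cov, ones)      # cov^{-1} 1 without forming the inverse
    return x / (ones @ x)

def sample_cov(returns):
    """Sample covariance S with divisor n, matching the definition in the text."""
    centered = returns - returns.mean(axis=0)
    return centered.T @ centered / returns.shape[0]

def plug_in_weights(returns):
    """Plug-in estimator of Eq. (3): the sample covariance replaces Sigma."""
    return gmvp_weights(sample_cov(returns))

# Illustrative data (all numbers are arbitrary choices for this example)
rng = np.random.default_rng(0)
sigma = np.diag([1.0, 2.0, 4.0])
returns = rng.multivariate_normal(np.zeros(3), sigma, size=500)
w = gmvp_weights(sigma)          # proportional to (1, 1/2, 1/4)
w_hat = plug_in_weights(returns)
```

Solving the linear system instead of inverting the matrix is a numerical convenience; it does not change the estimator.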

3 The risk of portfolio estimators

Generally speaking, there are several ways to view the weight vector \(\mathbf {w}\) and the formulation of the inference problem. When deriving statistical estimators and inference procedures for portfolio weights, there is no consensus regarding which quantity to optimize. For example, since \({\mathbf {w}}\in {\mathbb {R}^{p}}\), it is natural to think of estimators in terms of estimating a parameter vector. Alternatively, upon noting that \({\mathbf {w}}=\left( {{\mathbf {1}}^{\prime }{{\varvec{\Sigma }}^{-1}}{\mathbf {1}}}\right) ^{-1}{\varvec{\Sigma }}^{-1}{\mathbf {1}}\) only depends on the unknown parameter \(\varvec{\Sigma }^{-1}\in \mathbb {R}^{p\times p}\), the inference problem may be thought of as one concerned with estimating a matrix, a problem rather different from that of estimating a vector. Yet another view of the inference problem is based on the out-of-sample prediction variance (Frahm and Memmel 2010). The properties of any estimator \({\hat{\mathbf {w}}}\) will depend on the quality criterion, or risk function, used to judge it, and no single estimator can optimize all relevant properties simultaneously. In fact, the performance ranking of estimators of \({\mathbf {w}}\) may be changed or even reversed when evaluated on alternative loss functions (Haff 1979; Muirhead 1982). An investor searching the literature for “the best” estimator of the GMVP is likely to end up with a battery of proposed estimators, each being “optimal” in some sense. In this section, we will present and discuss similarities and differences between a number of risk functions for the GMVP problem, some of which are commonly used while others appear to be new in the GMVP context.

The \({L_{2}}\)-norm risk is commonly used to assess estimators of \({\varvec{\Sigma }}^{-1}\) alone, i.e., without respect to the bigger problem in which it is an ingredient. It is defined as follows:

$$\begin{aligned} {\mathfrak {R}_{0}}\left( {{\hat{\varvec{\Sigma }}}}^{-1}\right) :={p^{-1}}\mathrm{tr}\,E\left[ \left( {{\hat{{\varvec{\Sigma }}}}}^{-1}-{\varvec{\Sigma }^{-1}}\right) ^{2}\right] . \end{aligned}$$

The \({\mathfrak {R}_{0}}\left( {{\hat{{\varvec{\Sigma }}}}}^{-1}\right) \) risk is clearly naïve, in the sense that it is only indirectly related to the actual problem of estimating the optimal weight vector \(\mathbf {w}\). In other words, an estimator of \({\varvec{\Sigma }^{-1}}\) which is optimal with respect to \({\mathfrak {R}_{0}}\) need not perform very well when substituted into Eq. (2).

A risk function that is more adequate for the portfolio weight problem may be derived as follows. The denominator of \({\mathbf {w}}=\frac{{\varvec{\Sigma }^{-1}}{\mathbf {1}}}{{\mathbf {1'}}{\varvec{\Sigma }^{-1}}{\mathbf {1}}}\) is merely a scaling factor used to impose the sum-to-one condition \({\mathbf {w}^{\prime }}{\mathbf {1}}=1\). The quantity \({\varvec{\Sigma }^{-1}}\mathbf {1}\) therefore contains all necessary information for determining \({\mathbf {w}}\), and the task of estimating \({\mathbf {w}}\) reduces to that of estimating \({\varvec{\Sigma }^{-1}}{\mathbf {1}}\). A risk function for the minimum-variance portfolio estimator may accordingly be defined by

$$\begin{aligned} {\mathfrak {R}_{1}}\left( {{\hat{\varvec{\Sigma }}}^{-1},{\varvec{\Omega }}}\right) :&={p^{-1}}E\left[ \left( {{\hat{{\varvec{\Sigma }}}}^{-1}}\mathbf {1}-{\varvec{\Sigma }}^{-1}\mathbf {1}\right) ^{\prime }{\varvec{\Omega }}\left( {{{{\hat{{\varvec{\Sigma }}}}}^{-1}}{\mathbf {1}}-{{\varvec{\Sigma }}^{-1}}{\mathbf {1}}}\right) \right] \\&={p^{-1}}{\mathbf {1'}}\left\{ {E\left[ {\left( {{{\hat{{\varvec{\Sigma }}}}}^{-1}}-{{\varvec{\Sigma }}^{-1}}\right) {\varvec{\Omega }}\left( {{\hat{{\varvec{\Sigma }}}}}^{-1}-{\varvec{\Sigma }}^{-1}\right) }\right] }\right\} {\mathbf {1}} \end{aligned}$$

where \({\varvec{\Omega }}\) is a positive semi-definite non-random matrix. Two important special cases are given by

$$\begin{aligned} {\mathfrak {R}_{1}}\left( {{{\hat{{\varvec{\Sigma }}}}}^{-1}},{\mathbf {I}}\right) ={p^{-1}}{\mathbf {1^{\prime }}}\left\{ {E\left[ {\left( {{\hat{{\varvec{\Sigma }}}}}^{-1}-{{\varvec{\Sigma }}^{-1}}\right) }^{2}\right] }\right\} {\mathbf {1}} \end{aligned}$$

and

$$\begin{aligned} {\mathfrak {R}_{1}}\left( {{\hat{{\varvec{\Sigma }}}}}^{-1},{\mathbf {11^{\prime }}}\right) ={p^{-1}}E\left[ {\left( {\mathbf {1^{\prime }}}\left( {{\hat{{\varvec{\Sigma }}}}}^{-1}-{\varvec{\Sigma }}^{-1}\right) {\mathbf {1}}\right) }^{2}\right] . \end{aligned}$$

Frahm and Memmel (2010) utilize a somewhat different risk function defined by

$$\begin{aligned} {\mathfrak {R}_{2}}\left( {\hat{\mathbf {w}}},{\varvec{\Omega }}\right) :={p^{-1}}E\left[ {\left( {\hat{\mathbf {w}}}-{\mathbf {w}}\right) }^{\prime }{\varvec{\Omega }}\left( {\hat{\mathbf {w}}}-\mathbf {w}\right) \right] . \end{aligned}$$

The \(\mathfrak {R}_{2}\) risk function has been used frequently in the context of estimating mean vectors and in regression analysis (Anderson 2003; Efron and Morris 1976; James and Stein 1961; Muirhead 1982; Serdobolskii 2000; Srivastava 2002). This risk function explicitly evaluates the second-order moment properties (variance plus squared bias) of an estimator \({\hat{\mathbf {w}}}\). By appropriate choice of \(\varvec{\Omega }\), it allows us to conveniently split the mean squared error into components along different directions relative to \({\mathbf {w}}\).
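
To make the three definitions concrete, the sketch below approximates \({\mathfrak {R}_{0}}\), \({\mathfrak {R}_{1}}\) and \({\mathfrak {R}_{2}}\) (the latter two with \({\varvec{\Omega }}={\mathbf {I}}\)) for the plug-in estimator by simple Monte Carlo averaging under Gaussian sampling. The function name `mc_risks`, the number of replicates and the seed are assumptions made for illustration; this is not the simulation design used later in the paper.

```python
import numpy as np

def mc_risks(sigma, n, reps=200, seed=0):
    """Monte Carlo approximations of R_0, R_1(., I) and R_2(., I) for the plug-in estimator."""
    rng = np.random.default_rng(seed)
    p = sigma.shape[0]
    ones = np.ones(p)
    prec = np.linalg.inv(sigma)
    w = prec @ ones / (ones @ prec @ ones)       # true GMVP weights
    r0 = r1 = r2 = 0.0
    for _ in range(reps):
        x = rng.multivariate_normal(np.zeros(p), sigma, size=n)
        xc = x - x.mean(axis=0)
        s_inv = np.linalg.inv(xc.T @ xc / n)     # inverse sample covariance (divisor n)
        d = s_inv - prec                         # error in the precision matrix
        w_hat = s_inv @ ones / (ones @ s_inv @ ones)
        r0 += np.trace(d @ d) / p                # R_0: p^{-1} tr(D^2)
        r1 += (d @ ones) @ (d @ ones) / p        # R_1 with Omega = I
        r2 += (w_hat - w) @ (w_hat - w) / p      # R_2 with Omega = I
    return r0 / reps, r1 / reps, r2 / reps
```

Calling, for example, `mc_risks(np.eye(4), n=20)` and `mc_risks(np.eye(4), n=400)` shows how all three risks shrink as n grows relative to p.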

Yet another risk criterion, the out-of-sample variance of \({\hat{\mathbf {w}}^{\prime }\mathbf {R}_{m}}\), advocated by Frahm and Memmel (2010), is somewhat different from \({\mathfrak {R}_{1}}\) and \({\mathfrak {R}_{2}}\). For some \({\mathbf {R}_{m}}\) not used in the estimation of \({\hat{\mathbf {w}}}\), this variance is given by

$$\begin{aligned} {\mathfrak {R}_{3}}\left( {\hat{\mathbf {w}}}\right) =\mathrm{Var}\left[ {\hat{\mathbf {w}}}^{\prime }{\mathbf {R}_{m}}\right] . \end{aligned}$$
(4)

Following Frahm and Memmel (2010), this variance may be decomposed as follows:

$$\begin{aligned} \begin{gathered} {\mathfrak {R}_{3}}\left( {{\hat{\mathbf {w}}},{\mathbf {R}_{m}}}\right) =\mathrm{Var}\left[ {{\hat{\mathbf{w}}^{{{\mathbf {\prime }}}}}{\mathbf {R}_{m}}}\right] ={E_{{\hat{\mathbf {w}}}}}\left[ {\mathrm{Var}\left[ {\left. {{\hat{\mathbf{w}}^{{{\mathbf {\prime }}}}}{\mathbf {R}_{m}}}\right| {\hat{\mathbf {w}}}}\right] }\right] +\mathrm{Var}{_{{\hat{\mathbf {w}}}}}\left[ {E\left[ {\left. {{\hat{\mathbf{w}}^{{{\mathbf {\prime }}}}}{\mathbf {R}_{m}}}\right| {\hat{\mathbf {w}}}}\right] }\right] \\ ={E_{{\hat{\mathbf {w}}}}}\left[ {{\hat{\mathbf{w}}^{{{\mathbf {\prime }}}}}{{\varvec{\Sigma }}_{R}}{\hat{\mathbf {w}}}}\right] +\mathrm{Var}{_{{\hat{\mathbf {w}}}}}\left[ {{\hat{\mathbf{w}}^{{{\mathbf {\prime }}}}}{{ {\varvec{\mu }}}_{R}}}\right] ={E_{{\hat{\mathbf {w}}}}}\left[ {{\hat{\mathbf{w}}^{{{\mathbf {\prime }}}}}{{\varvec{\Sigma }}_{R}}{\hat{\mathbf {w}}}}\right] +{{ {\varvec{\mu }}}_{R}}^{\prime }\mathrm{Cov}\left[ {\hat{\mathbf {w}}}\right] {{\varvec{{\mu }}}_{R}}\\ =\frac{1}{{{\mathbf {1^{\prime }}}{\varvec{\Sigma }^{-1}}{\mathbf {1}}}}+E\left[ {{{\left( {{\hat{\mathbf {w}}}-{\mathbf {w}}}\right) }^{\prime }}{{\varvec{\Sigma }}_{R}}\left( {{\hat{\mathbf {w}}}-{\mathbf {w}}}\right) }\right] +{{ {\varvec{\mu }}}_{R}}^{\prime }\mathrm{Cov}\left[ {\hat{\mathbf {w}}}\right] {{ {\varvec{\mu }}}_{R}}.\\ \end{gathered} \end{aligned}$$

The out-of-sample variance may accordingly be decomposed into three terms. The first one, \(\frac{1}{{{\mathbf {1^{\prime }}}{{\varvec{\Sigma }}^{-1}}{\mathbf {1}}}}\), is the risk due to randomness of assets and hence not subject to estimation issues. Frahm and Memmel (2010) argue that the third term, \({{\varvec{\mu }}}_{R}^{\prime }\mathrm{Cov}\left[ {\hat{\mathbf {w}}}\right] {{ {\varvec{\mu }}}_{R}}\), is negligible in most practical situations, and hence that the term \({E_{{\hat{\mathbf {w}}}}}\left[ {\hat{\mathbf {w}}}^{\prime }{{\varvec{\Sigma }}_{R}}{\hat{\mathbf {w}}}\right] \) is the one of main interest to us. The decomposition of \({\mathfrak {R}_{3}}\) specified above is, however, not necessarily the most versatile one. An important concern with \({\mathfrak {R}_{3}}\) is that it measures the variance of \({\hat{\mathbf {w}}}^{\prime }\mathbf {R}_{m}\) but does not depend on the value that \(\mathbf {R}_{m}\) actually takes.
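
The last equality in the decomposition above can be verified in one step. The following sketch assumes, as the displayed formula implicitly does, that \({{\varvec{\Sigma }}_{R}}={\varvec{\Sigma }}\) and that the estimator satisfies \({\hat{\mathbf {w}}}^{\prime }{\mathbf {1}}=1\); it is meant as a reading aid rather than as part of the original derivation.

$$\begin{aligned} {\hat{\mathbf {w}}}^{\prime }{\varvec{\Sigma }}{\hat{\mathbf {w}}}&={\mathbf {w}}^{\prime }{\varvec{\Sigma }}{\mathbf {w}}+2{\mathbf {w}}^{\prime }{\varvec{\Sigma }}\left( {\hat{\mathbf {w}}}-{\mathbf {w}}\right) +\left( {\hat{\mathbf {w}}}-{\mathbf {w}}\right) ^{\prime }{\varvec{\Sigma }}\left( {\hat{\mathbf {w}}}-{\mathbf {w}}\right) ,\\ {\mathbf {w}}^{\prime }{\varvec{\Sigma }}\left( {\hat{\mathbf {w}}}-{\mathbf {w}}\right)&=\frac{{\mathbf {1}}^{\prime }\left( {\hat{\mathbf {w}}}-{\mathbf {w}}\right) }{{\mathbf {1}}^{\prime }{\varvec{\Sigma }}^{-1}{\mathbf {1}}}=0. \end{aligned}$$

Taking expectations and using \({\mathbf {w}}^{\prime }{\varvec{\Sigma }}{\mathbf {w}}=\left( {\mathbf {1}}^{\prime }{\varvec{\Sigma }}^{-1}{\mathbf {1}}\right) ^{-1}\) gives the first two terms of the final line of the decomposition.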

A different expression of the out-of-sample variance may be obtained by conditioning on \(\mathbf {R}_{m}\). We define the conditional out-of-sample variance as follows:

$$\begin{aligned} {\mathfrak {R}_{4}}\left( {\hat{\mathbf {w}}},{\mathbf {r}_{m}}\right) :&=E\left[ \left. {\left( {\hat{\mathbf {w}}}^{\prime }{\mathbf {R}_{m}}-{\mathbf {w}}^{\prime }{\mathbf {R}_{m}}\right) }^{2}\right| {\mathbf {R}_{m}}={\mathbf {r}_{m}}\right] \\&=\mathrm{Var}\left[ \left. {\hat{\mathbf {w}}}^{\prime }{\mathbf {R}_{m}}\right| {\mathbf {R}_{m}}={\mathbf {r}_{m}}\right] +\mathrm{Bias}^{2}\left[ \left. {\hat{\mathbf {w}}}^{\prime }{\mathbf {R}_{m}}\right| {\mathbf {R}_{m}}={\mathbf {r}_{m}}\right] . \end{aligned}$$

Although \({\mathfrak {R}_{3}}\) and \({\mathfrak {R}_{4}}\) both describe the out-of-sample variability of \({\hat{\mathbf {w}}}^{\prime }{\mathbf {R}_{m}}\), the latter conditions on \({\mathbf {R}_{m}}={\mathbf {r}_{m}}\) and also makes the bias explicit. It therefore allows for investigation of the portfolio risk when \({\mathbf {R}_{m}}\) takes some specific value. The risk of a portfolio as a function of some estimator \({\hat{\mathbf {w}}}\) will be different when \({\mathbf {R}_{m}}\) takes a value close to \(E\left[ {\mathbf {R}_{m}}\right] \) compared to a scenario in which \({\mathbf {R}_{m}}\) takes a value far from \(E\left[ {\mathbf {R}_{m}}\right] \). Hence, \({\mathfrak {R}_{4}}\) allows us to investigate the risk when, for example, the market reacts dramatically to a specific event. This matter becomes particularly important when using a biased estimator, because the unconditional expectation \(E\left[ {\hat{\mathbf {w}}}^{\prime }{\mathbf {R}_{m}}\right] ={E_{\mathbf {r}}}\left[ E\left[ \left. {\hat{\mathbf {w}}}^{\prime }{\mathbf {R}_{m}}\right| {\mathbf {R}_{m}}={\mathbf {r}_{m}}\right] \right] =E\left[ {\hat{\mathbf {w}}}^{\prime }\right] {\varvec{\mu }}_{R}\) holds regardless of a possible bias in \({\hat{\mathbf {w}}}\) and depends on \({\mathbf {R}_{m}}\) only through its mean. In contrast, the forecast bias conditional on \({\mathbf {r}_{m}}\), \(E\left[ {\hat{\mathbf {w}}}^{\prime }-{\mathbf {w}}^{\prime }\right] {\mathbf {r}_{m}}\), may be considerable since it increases linearly with \({\mathbf {r}_{m}}\). The concept of explicitly conditioning on a specific return value \({\mathbf {r}_{m}}\) provides us with the possibility of assessing the behavior of a weight estimator under certain scenarios.
For example, investors may be particularly interested in cases when the market is turbulent (\(\left\| {\mathbf {r}_{m}}\right\| \) is large relative to \(\sigma _{R}^{2}\)), when the market reacts additively to an event (say \({\mathbf {r}_{m}}\mapsto \mathbf {a}+{\mathbf {r}_{m}}\), \(\mathbf {a}\in \mathbb {R}_{+}^{p}\)) or when the market is stable (\({\mathbf {r}_{m}}\approx E\left[ {\mathbf {R}_{m}}\right] \)).
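
The scenario idea can be sketched numerically. The code below approximates \({\mathfrak {R}_{4}}\) from simulated plug-in estimates and compares a small return vector with one that is ten times larger in the same direction. The helper `conditional_risk`, the dimensions and the two return vectors `calm` and `shock` are hypothetical; the sketch also assumes that the estimates are independent of the new return.

```python
import numpy as np

def conditional_risk(w_hats, w, r_m):
    """Approximate R_4 from simulated estimates (rows of w_hats) for a fixed return r_m.

    Returns (risk, variance, squared bias), where risk = variance + squared bias.
    """
    errors = (w_hats - w) @ r_m          # (w_hat - w)' r_m for each replicate
    risk = np.mean(errors ** 2)
    variance = np.var(errors)            # divisor equal to the number of replicates
    bias_sq = np.mean(errors) ** 2
    return risk, variance, bias_sq

rng = np.random.default_rng(1)
p, n, reps = 5, 60, 300
w = np.full(p, 1 / p)                    # GMVP weights when Sigma = I
w_hats = np.empty((reps, p))
for i in range(reps):
    x = rng.standard_normal((n, p))
    xc = x - x.mean(axis=0)
    s_inv_1 = np.linalg.solve(xc.T @ xc / n, np.ones(p))
    w_hats[i] = s_inv_1 / s_inv_1.sum()  # plug-in estimate

calm = 0.01 * np.arange(1, p + 1)        # return close to E[R_m] = 0
shock = 10 * calm                        # same direction, ten times larger
risk_calm, var_calm, bias_sq_calm = conditional_risk(w_hats, w, calm)
risk_shock, _, _ = conditional_risk(w_hats, w, shock)
```

Because the forecast error is linear in \({\mathbf {r}_{m}}\), scaling the return by ten scales the conditional risk by one hundred in this sketch.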

Remark (i) (Directional risks) Any estimator of the GMVP may be decomposed into components orthogonal and parallel to \({\mathbf {w}}\). Let \({\mathbf{A}}^{+}\) denote the Moore–Penrose pseudoinverse of some matrix \(\mathbf {A}\). Then, the component of \({\hat{\mathbf {w}}}\) parallel to \(\mathbf {w}\) is given by \({\mathbf{v}}=\left( {{{\mathbf{w}}^{+}} {\hat{\mathbf{w}}}}\right) {\mathbf{w}}=\left( \frac{\mathbf{w'}\hat{\mathbf{w}}}{{\mathbf{w'w}}}\right) {\mathbf{w}}\) and the component orthogonal to \(\mathbf {w}\) is determined by \({\mathbf{u}}={{\varvec{\Omega }}_{\bot }}{\hat{\mathbf{w}}}\) where \({\varvec{\Omega }}_{\bot }=\left( {{\mathbf {I}}-\left( {\frac{{\mathbf {ww^{\prime }}}}{{\mathbf {w^{\prime }w}}}}\right) }\right) \) is a projection matrix (Rao 2008, pp. 46–47). We can thus decompose \({\hat{\mathbf {w}}}\) according to \({\hat{\mathbf {w}}}={\mathbf {v}}+{\mathbf {u}}\), where \({\mathbf {v}}\) is parallel to \({\mathbf {w}}\) and \({\mathbf {u}}\) is orthogonal to \({\mathbf {w}}\). Some special cases of the above-defined risk functions in the direction orthogonal to the GMVP, which is our direction of main interest, are then given by \({\mathfrak {R}_{1}}\left( {{{\hat{{\varvec{\Sigma }}}}^{-1}},{{\varvec{\Omega }}_{\bot }}}\right) \), \({\mathfrak {R}_{2}}\left( {{\hat{\mathbf {w}}},{{\varvec{\Omega }}_{\bot }}}\right) \), \({\mathfrak {R}_{3}}\left( {\mathbf{u}}\right) \) and \({\mathfrak {R}_{4}}\left( {{\mathbf{u}},{\mathbf {r}_{m}}}\right) \). Although the risk in a single direction relative to \({\mathbf {w}}\) is of limited interest on its own, it does provide some insight into the performance of one estimator relative to another.
For example, it may be shown that \(E\left[ \left( {\hat{\mathbf {w}}}_{I}-{\mathbf {w}}\right) ^{\prime }{\varvec{\Omega }}_{\bot }\left( {\hat{\mathbf {w}}}_{I}-{\mathbf {w}}\right) \right] =\frac{1}{\left( {n-p+1}\right) }\frac{{1}}{{{\mathbf {1^{\prime }}}{{\varvec{\Sigma }}}^{-1}{\mathbf {1}}}}\left( {\mathrm{tr}\left\{ {\varvec{\Sigma }}^{-1}\right\} -\left( \frac{{\mathbf {1^{\prime }}}{\varvec{\Sigma }}^{-3}{\mathbf {1}}}{{\mathbf {1^{\prime }}}{{\varvec{\Sigma }}}^{-2}{\mathbf {1}}}\right) }\right) ,\) whereas \(\left( {\mathbf {w}}_{0}-{\mathbf {w}}\right) ^{\prime }{\varvec{\Omega }_{\bot }}\left( {\mathbf {w}}_{0}-{\mathbf {w}}\right) ={p^{-1}}\left( {1-\frac{{{p^{-1}}{\left( {\mathbf {1^{\prime }}}{\varvec{\Sigma }}^{-1}{\mathbf {1}}\right) }^{2}}}{{\mathbf {1^{\prime }}}{{\varvec{\Sigma }}^{-2}}{\mathbf {1}}}}\right) \) (see Appendix A).
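
The decomposition in Remark (i) is easy to check numerically. The sketch below uses two arbitrary weight vectors chosen purely for illustration; the function name `decompose` is hypothetical.

```python
import numpy as np

def decompose(w_hat, w):
    """Split w_hat into components parallel (v) and orthogonal (u) to w."""
    omega_perp = np.eye(len(w)) - np.outer(w, w) / (w @ w)   # projection matrix
    v = (w @ w_hat) / (w @ w) * w
    u = omega_perp @ w_hat
    return v, u, omega_perp

w = np.array([0.5, 0.3, 0.2])        # stands in for the true GMVP weights
w_hat = np.array([0.6, 0.1, 0.3])    # stands in for an estimate
v, u, omega_perp = decompose(w_hat, w)
orth_loss = (w_hat - w) @ omega_perp @ (w_hat - w)
```

Since \({\varvec{\Omega }}_{\bot }{\mathbf {w}}={\mathbf {0}}\), the orthogonal loss `orth_loss` equals the squared length of `u`.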

Remark (ii) (Implicit Covariance Matrix) There always exists a p.d. diagonal matrix \({\varvec{\Lambda }}\) such that \({\mathbf{P1}}= {{{\varvec{\Lambda }} \mathbf{P}}}{{\mathbf{w}}_{0}}\) or, equivalently, \({{\mathbf{w}}_{0}}={\mathbf{P}}{{\varvec{\Lambda }}^{-1}}{\mathbf{P^{\prime }1}}\), where \({{\mathbf{w}}_{0}}\) is any reference portfolio (Frahm and Memmel 2010, Theorem 8). The Stein-type estimator defined by \({{\hat{\mathbf{w}}}_{S}}=\left( {1-\alpha }\right) {{\hat{\mathbf{w}}}_{I}}+\alpha {{\mathbf{w}}_{0}}\) is therefore associated with an “implicit” covariance matrix estimator in the sense that there exists a matrix \({\hat{{\varvec{\Sigma }}}}_{S}^{-1}\) such that \({{\hat{\mathbf{w}}}_{S}}= \frac{\hat{{\varvec{\Sigma }}}_{S}^{-1}{} \mathbf{1}}{\mathbf{1}^{\mathbf {\prime }}{\hat{\varvec{\Sigma }}}_{S}^{-1}{} \mathbf{1}}\), where \({\hat{{\varvec{\Sigma }}}}_{S}^{-1}=\left( {1-\alpha }\right) {{\mathbf{S}}^{-1}}+ \alpha {\varvec{\Sigma }}_{0}^{-1}\), \(0\le \alpha \le 1\), and \({\hat{{\varvec{\Sigma }}}}_{S}^{-1}\) may, or may not, be positive definite. 
If we define our implicit covariance matrix by \({\hat{{\varvec{\Sigma }}}}_{S}^{-1}=\left( {1-\alpha }\right) {{\mathbf{S}}^{-1}}+\alpha {\varvec{\Sigma }}_{0}^{-1}\) with \({\varvec{\Sigma }}_{0}^{-1}={p^{-1}}\left( {{\mathbf{1}}^{\prime }{{\mathbf{S}}^{-1}}{\mathbf{1}}}\right) {\mathbf{I}}\), we obtain the identity \({{\hat{\mathbf{w}}}_{S}}=\frac{{\hat{{\varvec{\Sigma }}}}_{S}^{-1}{\mathbf{1}}}{{\mathbf{1}}^{\prime }{\hat{{\varvec{\Sigma }}}}_{S}^{-1}{\mathbf{1}}}=\frac{\left\{ \left( {1-\alpha }\right) {{\mathbf{S}}^{-1}}+\alpha {p^{-1}}\left( {{\mathbf{1}}^{\prime }{{\mathbf{S}}^{-1}}{\mathbf{1}}}\right) {\mathbf{I}}\right\} {\mathbf{1}}}{{\mathbf{1}}^{\prime }\left\{ \left( {1-\alpha }\right) {{\mathbf{S}}^{-1}}+\alpha {p^{-1}}\left( {{\mathbf{1}}^{\prime }{{\mathbf{S}}^{-1}}{\mathbf{1}}}\right) {\mathbf{I}}\right\} {\mathbf{1}}}=\left( {1-\alpha }\right) \frac{{{\mathbf{S}}^{-1}}{\mathbf{1}}}{{\mathbf{1}}^{\prime }{{\mathbf{S}}^{-1}}{\mathbf{1}}}+\alpha {p^{-1}}{\mathbf{1}}=\left( {1-\alpha }\right) {{\hat{\mathbf{w}}}_{I}}+\alpha {{\mathbf{w}}_{0}}\), where the third equality uses \({\mathbf{1}}^{\prime }{\mathbf{1}}=p\), so that the denominator reduces to \({\mathbf{1}}^{\prime }{{\mathbf{S}}^{-1}}{\mathbf{1}}\). This identity allows us to investigate the risk of \({{\hat{\mathbf{w}}}_{S}}\) with respect to any risk function designed for estimators of the (inverse) covariance matrix. For example, although \({\mathfrak {R}_{0}}\left( {{\hat{{\varvec{\Sigma }}}}^{-1}}\right) \) is a function of \({{\hat{{\varvec{\Sigma }}}}^{-1}}\) and does not involve an explicit weight estimator, it is nevertheless possible to evaluate \({{\hat{\mathbf{w}}}_{S}}\) via \({\mathfrak {R}_{0}}\left( {{\hat{{\varvec{\Sigma }}}}_{S}^{-1}}\right) \).
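
The identity in Remark (ii) can be confirmed numerically for the reference portfolio \({{\mathbf{w}}_{0}}={p^{-1}}{\mathbf{1}}\). In the sketch below the data, the dimensions and the value of \(\alpha \) are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
p, n, alpha = 4, 30, 0.3
x = rng.standard_normal((n, p))
xc = x - x.mean(axis=0)
s_inv = np.linalg.inv(xc.T @ xc / n)     # inverse sample covariance (divisor n)
ones = np.ones(p)

# Implicit precision matrix: (1 - alpha) S^{-1} + alpha p^{-1} (1' S^{-1} 1) I
prec_s = (1 - alpha) * s_inv + alpha * (ones @ s_inv @ ones) / p * np.eye(p)
w_from_prec = prec_s @ ones / (ones @ prec_s @ ones)

# Direct Stein-type combination of the plug-in weights and w_0 = 1/p
w_plug = s_inv @ ones / (ones @ s_inv @ ones)
w_stein = (1 - alpha) * w_plug + alpha * ones / p
```

Both routes give the same weight vector up to floating-point error.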

4 Families of GMVP weight estimators

We will consider a few estimators for further investigation. The first one is the “standard estimator” defined in Eq. (3). The next is suggested by Frahm and Memmel (2010) and is defined as

$$\begin{aligned} {{\hat{\mathbf {w}}}_{\text {II}}}=\left( {1-{\kappa _{m}}}\right) {{\hat{\mathbf {w}}}_{\text {I}}}+{\kappa _{m}}{{\mathbf {w}}_{0}}, \end{aligned}$$
(5)

where \({{\mathbf {w}}_{0}}:={p^{-1}}{\mathbf {1}}\), \({\kappa _{m}}=\min \left[ {{\kappa _{s}},1}\right] \), \({\kappa _{s}}=\frac{{p-3}}{{n-p+2}}{\left[ {\frac{{{{\mathbf{w}}_{0}}^{\prime }{\mathbf{S}}{{\mathbf{w}}_{0}}-{{\hat{\mathbf{w}}}_{I}}^{\prime }{\mathbf{S}}{{\hat{\mathbf{w}}}_{I}}}}{{{{\hat{\mathbf{w}}}_{I}}^{\prime }{\mathbf{S}}{{\hat{\mathbf{w}}}_{I}}}}}\right] ^{-1}}\). The estimator in Eq. (5) may be thought of as a weighted average between the traditional estimator and a reference portfolio, here represented by \({{\mathbf {w}}_{0}}:={p^{-1}}{\mathbf {1}}\), although the reference portfolio could essentially be any non-random portfolio such that \({{\mathbf {w}}_{0}}^{\prime }{\mathbf {1}}=1\).
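
A minimal sketch of Eq. (5) is given below, following the formula for \({\kappa _{s}}\) as stated above and using the sample covariance with divisor n. The function name `frahm_memmel_weights` and the simulated data are hypothetical, and the sketch assumes \(p>3\) and \(n>p\) so that \({\kappa _{s}}\) is well defined and positive.

```python
import numpy as np

def frahm_memmel_weights(returns):
    """Shrinkage estimator of Eq. (5) with reference portfolio w_0 = 1/p."""
    n, p = returns.shape                  # assumes p > 3 and n > p
    xc = returns - returns.mean(axis=0)
    s = xc.T @ xc / n
    ones = np.ones(p)
    s_inv_1 = np.linalg.solve(s, ones)
    w_plug = s_inv_1 / (ones @ s_inv_1)   # plug-in weights, Eq. (3)
    w_0 = ones / p
    var_plug = w_plug @ s @ w_plug        # in-sample variance of the plug-in portfolio
    var_ref = w_0 @ s @ w_0               # in-sample variance of the reference portfolio
    rel_loss = (var_ref - var_plug) / var_plug
    kappa_s = (p - 3) / (n - p + 2) / rel_loss
    kappa_m = min(kappa_s, 1.0)           # shrinkage intensity capped at one
    return (1 - kappa_m) * w_plug + kappa_m * w_0, kappa_m

rng = np.random.default_rng(4)
returns = rng.standard_normal((40, 6))    # illustrative data only
w_ii, kappa_m = frahm_memmel_weights(returns)
```

Because the plug-in portfolio minimizes the in-sample variance among all portfolios with weights summing to one, `rel_loss` is non-negative, which keeps the intensity in the unit interval after capping.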

An alternative estimator within the same family was proposed by Bodnar et al. (2018):

$$\begin{aligned} {{\hat{\mathbf {w}}}_{\text {III}}}=\left( {1-{\kappa _{b}}}\right) {{\hat{\mathbf{w}}_{\mathrm {I}}}}+{\kappa _{b}}{{\mathbf{w}}_{0}} \end{aligned}$$

where \({\kappa _{b}}=1-\frac{\left( 1-c\right) \left( \left( 1-c\right) {\mathbf{b}}^{\prime }{\mathbf{S}}{\mathbf{b}}\,{\mathbf{1}}^{\prime }{\mathbf{S}}^{-1}{\mathbf{1}}-1\right) }{c+\left( 1-c\right) \left( \left( 1-c\right) {\mathbf{b}}^{\prime }{\mathbf{S}}{\mathbf{b}}\,{\mathbf{1}}^{\prime }{\mathbf{S}}^{-1}{\mathbf{1}}-1\right) }\), \(c={p/n}\), and \({\mathbf{b}}\) denotes the target portfolio, which here is the reference portfolio \({{\mathbf{w}}_{0}}\).
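
The shrinkage intensity \({\kappa _{b}}\) can be computed directly from the sample covariance matrix. The sketch below follows the formula as stated above, under the assumption that \({\mathbf{b}}\) is the target portfolio and defaults to \({{\mathbf{w}}_{0}}={p^{-1}}{\mathbf{1}}\); the function name `bodnar_weights` and the simulated data are hypothetical.

```python
import numpy as np

def bodnar_weights(returns, b=None):
    """Shrinkage estimator w_III; b is the target portfolio (assumed default: 1/p)."""
    n, p = returns.shape
    c = p / n                             # concentration ratio, assumed below one
    xc = returns - returns.mean(axis=0)
    s = xc.T @ xc / n
    ones = np.ones(p)
    if b is None:
        b = ones / p
    s_inv_1 = np.linalg.solve(s, ones)
    w_plug = s_inv_1 / (ones @ s_inv_1)   # plug-in weights, Eq. (3)
    r_b = (1 - c) * (b @ s @ b) * (ones @ s_inv_1) - 1
    kappa_b = 1 - (1 - c) * r_b / (c + (1 - c) * r_b)
    return (1 - kappa_b) * w_plug + kappa_b * b, kappa_b

rng = np.random.default_rng(5)
returns = rng.standard_normal((100, 10))  # illustrative data only
w_iii, kappa_b = bodnar_weights(returns)
```

Unlike \({\kappa _{m}}\) in Eq. (5), the intensity here is not capped at one; it depends on the data only through \({\mathbf{b}}^{\prime }{\mathbf{S}}{\mathbf{b}}\) and \({\mathbf{1}}^{\prime }{\mathbf{S}}^{-1}{\mathbf{1}}\).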

Although the estimators \({{\hat{\mathbf{w}}_{\mathrm {II}}}}\) and \({{\hat{\mathbf{w}}_{\mathrm {III}}}}\) have shown great potential in improving the standard estimator \(\hat{\mathbf {w}}_{\mathrm {I}}\), improved estimators can be developed from a variety of different points of view. In particular, resolvent-type estimators, defined by \({{\hat{{\varvec{\Sigma }}}}}_{k}^{-1}={\left( {{\mathbf {S}}+k{\mathbf {I}}}\right) ^{-1}}\), \(k\in \mathbb {R}_{+}\), have shown great potential in estimating the precision matrix, particularly in high-dimensional settings (Holgersson and Karlsson 2012; Serdobolskii 1985).

These estimators add a small constant to the eigenvalues before inversion, thereby creating a more stable estimator. They play an important role in spectral analysis (Serdobolskii 1985) but have also proved to be efficient in more applied problems (Holgersson and Karlsson 2012). The “regularizing” coefficient k imposes a (small) bias on estimators of the precision matrix and hence offers a form of variance-bias trade-off rather different from the Stein-type estimators. Since the poor performance of the standard plug-in estimator is largely due to high sample variance in the precision matrix, the resolvent estimators are interesting candidates for improved estimation of the GMVP.

While the Stein-type estimators depend on the regularizing coefficient \(\kappa \), the estimator \({{\hat{{\varvec{\Sigma }}}}}_{k}^{-1}\) depends on the coefficient k, which usually has to be determined from data. We will use the \({\mathfrak {R}_{0}}\) risk to derive the optimal value of k, i.e., we search for the value of k which minimizes \({\mathfrak {R}_{0}}\left( {{\hat{{\varvec{\Sigma }}}}}_{k}^{-1}\right) =E\left\{ {p^{-1}}\mathrm{tr}{\left( {{\hat{{\varvec{\Sigma }}}}}_{k}^{-1}-{\varvec{\Sigma }}^{-1}\right) ^{2}}\right\} \). A consistent estimator of \({\mathfrak {R}_{0}}\left( {{\hat{{\varvec{\Sigma }}}}}_{k}^{-1}\right) \) was derived by Serdobolskii (2000) and is defined as follows:

$$\begin{aligned} {\hat{\mathfrak {R}}_{0}}\left( {{\hat{{\varvec{\Sigma }}}}_{k}^{-1}}\right) =\,&{\hat{\varvec{\Lambda }}_{-2}}-2{k^{-1}}\left( {{p^{-1}}-{n^{-1}}}\right) \mathrm{tr}\left( {{{\mathbf{S}}^{-1}} {\tilde{{\varvec{\Sigma }}}}_{k}^{-1}}\right) \\&+2{k^{-1}}{p^{-1}}{n^{-1}}{\left( {\mathrm{tr}{\tilde{{\varvec{\Sigma }}}}_{k}^{-1}}\right) ^{2}}+{k^{-2}}{p^{-1}}\mathrm{tr}{\tilde{{\varvec{\Sigma }}}}_{k}^{-2}, \end{aligned}$$

where \({\tilde{{\varvec{\Sigma }}}}_{k}^{-1}={\left( {{\mathbf{I}}+{k^{-1}}{\mathbf{S}}}\right) ^{-1}}\), and \({\hat{\varvec{\Lambda }}_{-2}}={\left( {1-p{n^{-1}}}\right) ^{2}}{p^{-1}}\mathrm{tr}\left( {{\mathbf{S}}^{-2}}\right) +\left( {1-p{n^{-1}}}\right) {p^{-1}}{n^{-1}}{\left( {\mathrm{tr}\left( {{\mathbf{S}}^{-1}}\right) }\right) ^{2}}\). The optimal value of k is defined by \(\hat{k}=\mathop {\arg \min }\limits _{k}{\hat{\mathfrak {R}}_{0}}\left( {{\hat{{\varvec{\Sigma }}}}_{k}^{-1}}\right) \), which is obtained numerically. A feasible resolvent-type portfolio estimator is then defined by

$$\begin{aligned} {{\hat{\mathbf{w}}}_{\mathrm {IV}}}=\frac{{{\hat{\varvec{\Sigma }}}_{\hat{k}}^{-1}{\mathbf{1}}}}{{\mathbf{1}}^{\prime }{\hat{{\varvec{\Sigma }}}}_{\hat{k}}^{-1}{\mathbf{1}}}. \end{aligned}$$
(6)
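
The effect of the regularizing coefficient can be illustrated with a fixed, user-supplied k. The sketch below does not implement the data-driven choice \(\hat{k}\); the value \(k=0.5\), the dimensions and the function name `resolvent_weights` are assumptions made for illustration only.

```python
import numpy as np

def resolvent_weights(s, k):
    """GMVP weights based on the resolvent-type estimator (S + k I)^{-1}, k > 0."""
    p = s.shape[0]
    ones = np.ones(p)
    x = np.linalg.solve(s + k * np.eye(p), ones)
    return x / (ones @ x)

# With p > n the sample covariance is singular, yet S + k I remains invertible.
rng = np.random.default_rng(3)
n, p, k = 10, 20, 0.5
x = rng.standard_normal((n, p))
xc = x - x.mean(axis=0)
s = xc.T @ xc / n
eig_s = np.linalg.eigvalsh(s)                               # eigenvalues of S
eig_res = np.linalg.eigvalsh(np.linalg.inv(s + k * np.eye(p)))
w_iv = resolvent_weights(s, k)
```

The eigenvalues of the resolvent are \(1/\left( \lambda _{i}+k\right) \), where \(\lambda _{i}\) are the eigenvalues of \({\mathbf{S}}\), so zero eigenvalues of \({\mathbf{S}}\) no longer prevent inversion. For very large k the weights approach the equally weighted portfolio.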

Equations (3)–(6) define a set of different estimators of the GMVP weights. These will be used in the Monte Carlo simulation of the next section in order to compare the risk functions and the performance of the estimators.

5 Monte Carlo study

To investigate the efficiency of GMVP estimators from a risk perspective, we conduct Monte Carlo experiments using three different data generating processes (DGP I, DGP II and DGP III). DGP I is based on a multivariate normal distribution with different covariance structures and zero mean vector, and DGP II is based on a multivariate skewed t distribution with mean vector equal to a zero vector. DGP III is based on a skewed distribution with nonzero mean vector. DGP I is specified as follows:

$$\begin{aligned} {\mathbf {R}_{t}}\sim {N_{p}}\left( \mathbf {0},\varvec{\Sigma }\right) , \end{aligned}$$

where \({{\mathbf {R}}_{t}}\) is a vector of returns on p different assets in time period t, and \(\left\{ {\mathbf {R}_{t}}\right\} _{t=1}^{n}\) is independent and identically distributed (IID), with n denoting the number of observations. Different specifications of \(\varvec{\Sigma }\) will be used in the simulations. We will use a Toeplitz covariance structure given by

$$\begin{aligned} {\varvec{\Sigma }}=\left[ {\begin{array}{ccccc} 1 &{} {\phi ^{1}} &{} {\phi ^{2}} &{} \cdots &{} {\phi ^{p-1}}\\ {\phi ^{1}} &{} \ddots &{} \ddots &{} \ddots &{} \vdots \\ {\phi ^{2}} &{} \ddots &{} \ddots &{} \ddots &{} {\phi ^{2}}\\ \vdots &{} \ddots &{} \ddots &{} \ddots &{} {\phi ^{1}}\\ {\phi ^{p-1}} &{} \cdots &{} {\phi ^{2}} &{} {\phi ^{1}} &{} 1 \end{array}}\right] , \end{aligned}$$

and also a covariance matrix estimated from stock return data.
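DGP I with the Toeplitz structure can be sketched as follows. This is only an illustration: the value of \(\phi \), the dimensions and the function names are assumptions and are not taken from the simulation design.

```python
import numpy as np

def toeplitz_cov(p, phi):
    """Covariance matrix with entry (i, j) equal to phi ** |i - j|."""
    idx = np.arange(p)
    return phi ** np.abs(idx[:, None] - idx[None, :])

def simulate_dgp1(n, Sigma, rng):
    """Draw n IID rows from N_p(0, Sigma)."""
    p = Sigma.shape[0]
    return rng.multivariate_normal(np.zeros(p), Sigma, size=n)

rng = np.random.default_rng(1)
Sigma = toeplitz_cov(p=5, phi=0.5)      # phi = 0.5 is an illustrative value
R = simulate_dgp1(n=200, Sigma=Sigma, rng=rng)
```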

For DGP II, we first define the distribution of the vector of returns as

$$\begin{aligned} {\mathbf{R}_{t}}\sim \mathrm{Skewed}\;{t_{p}}\left( {\upsilon , {\varvec{\lambda }},{\varvec{\Omega }},{\varvec{\gamma }}}\right) , \end{aligned}$$

where \({\varvec{\lambda }}\in {\mathbb {R}{}^{p}}\) and \({\varvec{\gamma }}\in {\mathbb {R}{}^{p}}\) are parameter vectors, \({\varvec{\Omega }}\in {\mathbb {R}{}^{p\times p}}\) is positive definite and \(\upsilon >4\). When \({\varvec{\gamma }}\ne {\mathbf{0}}\), this yields a p-dimensional skewed t distribution. There exist a number of different multivariate distributions in the literature that all share the name multivariate skewed t distribution (Kotz and Nadarajah 2004). In this paper, we use the larger class of multivariate normal mixture distributions to obtain a skewed multivariate distribution, which is referred to as a skewed multivariate t distribution (Demarta and McNeil 2005). This is achieved by setting

$$\begin{aligned} {{\mathbf{R}}_{t}}={\varvec{\lambda }}+W_{t}^{-1}{\varvec{\gamma }}+W_{t}^{-0.5}{{\mathbf{X}}_{t}} \end{aligned}$$

where \({{\mathbf{X}}_{t}}\sim {N_{p}}\left( {{\mathbf{0}},{\varvec{\Omega }}}\right) \) and \(W_{t}^{-1}\sim {\mathrm{Inverse}}{\mathrm{Gamma}}\left( {{\upsilon /2},{\upsilon /2}}\right) \) is independent of \({\mathbf{X}}_{t}\). The first two moments of \({{\mathbf{R}}_{t}}\) are given by

$$\begin{aligned} E\left[ {{\mathbf{R}}_{t}}\right] ={E_{W_{t}^{-1}}}\left[ {E\left[ {{{\mathbf{R}}_{t}}\left| {W_{t}^{-1}}\right. }\right] }\right] ={\varvec{\lambda }}+\frac{\upsilon }{{\upsilon -2}}{\varvec{\gamma }}, \end{aligned}$$
$$\begin{aligned} {\mathrm{Cov}}\left[ {{\mathbf{R}}_{t}}\right]&={E_{W_{t}^{-1}}}\left[ {{\mathrm{Cov}}\left( {{{\mathbf{R}}_{t}}\left| {W_{t}^{-1}}\right. }\right) }\right] +{\mathrm{Cov}}_{W_{t}^{-1}}\left( {E\left[ {{{\mathbf{R}}_{t}}\left| {W_{t}^{-1}}\right. }\right] }\right) \\&=\frac{\upsilon }{{\upsilon -2}}{\varvec{\Omega }}+\frac{{2{\upsilon ^{2}}}}{{{{\left( {\upsilon -2}\right) }^{2}}\left( {\upsilon -4}\right) }}{\varvec{\gamma }}{\varvec{\gamma }}'. \end{aligned}$$

(For details refer to Appendix B).

We assume that all stock returns are skewed to the same extent, which is achieved by setting \({\varvec{\gamma }}=\beta {{\mathbf{1}}_{p}}\). Further, it is of interest to have the DGP centered at the zero vector. Since \(E\left[ {{\mathbf{R}}_{t}}\right] ={\varvec{\lambda }}+\upsilon /\left( \upsilon -2\right) {\varvec{\gamma }}\), this is achieved by \({\varvec{\lambda }}=-\beta {\upsilon /{\left( {\upsilon -2}\right) }}{{\mathbf{1}}_{p}}\). Hence, DGP II is specified as \({{\mathbf{R}}_{t}}\sim {\mathrm{Skewed}}\;{t_{p}}\left( {\upsilon ,{\varvec{\lambda }}=-\beta {\upsilon /{\left( {\upsilon -2}\right) }}{{\mathbf{1}}_{p}},{\varvec{\Omega }},{\varvec{\gamma }}=\beta {{\mathbf{1}}_{p}}}\right) \), with moments

$$\begin{aligned} E\left[ {{\mathbf{R}}_{t}}\right] ={\mathbf{0}}, \end{aligned}$$

and

$$\begin{aligned} {\mathrm{Cov}}\left( {{\mathbf{R}}_{t}}\right) =\frac{\upsilon }{{\upsilon -2}}{\varvec{\Omega }}+\frac{{2{\upsilon ^{2}}{\beta ^{2}}}}{{{{\left( {\upsilon -2}\right) }^{2}}\left( {\upsilon -4}\right) }}{\mathbf{11'}}. \end{aligned}$$
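The normal mixture construction of DGP II can be sketched as below. Here \({\varvec{\lambda }}\) is taken as \(-\beta \upsilon /\left( \upsilon -2\right) {\mathbf{1}}_{p}\), the value for which the mean formula above evaluates to the zero vector. The parameter values and function names are illustrative assumptions and not the settings of the simulation study.

```python
import numpy as np

def simulate_dgp2(n, Omega, nu, beta, rng):
    """Draw n rows from the mixture R = lam + V * gamma + sqrt(V) * X,
    where V = W^{-1} ~ InverseGamma(nu/2, nu/2) and X ~ N_p(0, Omega)."""
    p = Omega.shape[0]
    gamma = beta * np.ones(p)
    lam = -beta * nu / (nu - 2.0) * np.ones(p)   # centers E[R] at the zero vector
    # If W ~ Gamma(shape=nu/2, rate=nu/2), then 1/W ~ InverseGamma(nu/2, nu/2).
    V = 1.0 / rng.gamma(shape=nu / 2.0, scale=2.0 / nu, size=n)
    X = rng.multivariate_normal(np.zeros(p), Omega, size=n)
    return lam + V[:, None] * gamma + np.sqrt(V)[:, None] * X

def dgp2_cov(Omega, nu, beta):
    """Theoretical covariance of the centered DGP II (requires nu > 4)."""
    p = Omega.shape[0]
    skew_term = 2.0 * nu**2 * beta**2 / ((nu - 2.0) ** 2 * (nu - 4.0))
    return nu / (nu - 2.0) * Omega + skew_term * np.ones((p, p))

rng = np.random.default_rng(2)
Omega = np.eye(3)                                # illustrative scale matrix
R = simulate_dgp2(n=200_000, Omega=Omega, nu=10.0, beta=0.1, rng=rng)
Sigma_theory = dgp2_cov(Omega, nu=10.0, beta=0.1)
```

With a large simulated sample, the column means of `R` should be close to zero and the sample covariance close to `Sigma_theory`.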

Following the procedure of Holgersson and Mansoor (2013), DGP III is specified as follows: Let \({Z_{0}}\sim \chi _{\left( 1\right) }^{2},{Q_{j}}\sim \chi _{\left( 1\right) }^{2},{U_{j}}\sim \chi _{\left( 1\right) }^{2}\), \(j=1,\ldots ,p\), where all variables are mutually independent. Then, each variable \(R_{jt}\) in \(\mathbf {R}_{t}\) is given by \(R_{jt}=Z_{0t}+Q_{jt}+U_{jt}\), and hence, each variable in \(\mathbf {R}_{t}\) has a \(\chi _{\left( 3\right) }^{2}\) distribution. Because the component \(Z_{0t}\) is shared by all variables, the covariance structure is \(\mathrm{Cov}(\mathbf {R}_{t})=\left[ \begin{array}{cccc} 6 &{} 2 &{} \cdots &{} 2\\ 2 &{} \ddots &{} \ddots &{} \vdots \\ \vdots &{} \ddots &{} \ddots &{} 2\\ 2 &{} \cdots &{} 2 &{} 6 \end{array}\right] \). The specifications of DGP I, II and III are summarized in Tables 1 and 2.
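The construction of DGP III can be sketched as follows. The sample size, dimension and function names are illustrative assumptions.

```python
import numpy as np

def simulate_dgp3(n, p, rng):
    """Each column is Z0 + Q_j + U_j with independent chi-square(1) terms;
    the shared Z0 induces a covariance of Var(Z0) = 2 between columns."""
    Z0 = rng.chisquare(df=1, size=(n, 1))   # common component, broadcast over columns
    Q = rng.chisquare(df=1, size=(n, p))
    U = rng.chisquare(df=1, size=(n, p))
    return Z0 + Q + U

def dgp3_cov(p):
    """Theoretical covariance: 6 on the diagonal, 2 elsewhere."""
    return 2.0 * np.ones((p, p)) + 4.0 * np.eye(p)

rng = np.random.default_rng(3)
R = simulate_dgp3(n=100_000, p=4, rng=rng)
S = np.cov(R, rowvar=False)                 # should be close to dgp3_cov(4)
```

Each column has mean 3, so this DGP has a nonzero mean vector, in contrast to DGP I and DGP II.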

Table 1 Specification of the distribution of the stochastic terms in DGP I, II and III
Table 2 Design of the Monte Carlo experiments used for DGP I, II and III
Table 3 Specifications of the conditioned observation \(\mathbf {R}_{t}\) used in \(\mathfrak {R}_{4}\)

Finally, as performance measures we take the five risk functions \({\mathfrak {R}_{0}},\ldots ,{\mathfrak {R}_{4}}\), and for each estimator \(\left( {\hat{\mathbf {w}}}_{\text {II}},{\hat{\mathbf {w}}}_{\text {III}},{\hat{\mathbf {w}}}_{\text {IV}}\right) \) we divide its risk by the corresponding risk of \({\hat{\mathbf {w}}}_{{\text {I}}}\) to obtain its relative performance. Furthermore, for \({\mathfrak {R}_{4}}\) we choose three different conditioned returns, specified in Table 3.

Table 4 MC simulation results for DGP I with \({\varvec{\Sigma }}\) according to Table 1 (i), \((p=100)\)

5.1 Results from Monte Carlo simulations

Based on the results from the Monte Carlo experiments displayed in Tables 4, 5, 6 and 7, the estimator \({{\hat{\mathbf {w}}}_{\text {IV}}}\) performs well if c is larger than 0.1, both for DGP I with the covariance structure from real data and for DGP II. For DGP I with a Toeplitz covariance structure, however, \({{\hat{\mathbf {w}}}_{\text {IV}}}\) performs well only for c close to one. The estimator \({{\hat{\mathbf {w}}}_{\text {III}}}\), on the other hand, performs best among the four estimators, and this holds for all investigated values of c. Furthermore, \({{\hat{\mathbf {w}}}_{\text {II}}}\) is a good estimator if c is not close to one, in which case its performance is close to that of \({{\hat{\mathbf {w}}}_{\text {III}}}\). However, as c gets close to one, \({{\hat{\mathbf {w}}}_{\text {II}}}\) is outperformed by both \({{\hat{\mathbf {w}}}_{\text {III}}}\) and \({{\hat{\mathbf {w}}}_{\text {IV}}}\).

If we examine the estimators' performance under \({\mathfrak {R}_{4}}\left( {R_{t}+l}\right) \) and \({\mathfrak {R}_{4}}\left( {2R_{t}}\right) \), both of which could reflect a shock in the financial market, the estimators remain in the same internal ordering as indicated by \({\mathfrak {R}_{3}}\). Thus, with regard to the discussed results, the recommendation is that one should primarily consider \({{\hat{\mathbf {w}}}_{\text {III}}}\) because it performs well regardless of the value of c, unless c is very close to one, in which case \({{\hat{\mathbf {w}}}_{\text {IV}}}\) dominates. It should, however, be stressed that the above performance rankings are based only on point estimation. While more general inference such as interval estimation lies outside the scope of this paper, it should be mentioned that the estimators' performance in terms of, for example, coverage rates need not correspond to their point estimation efficiency.

Table 5 MC simulation results for DGP I with \({\varvec{\Sigma }}\) according to Table 1 (ii), \((p=100)\)
Table 6 MC simulation results for DGP II with \({\varvec{\Sigma }}\) according to Table 1, \((p=100)\)
Table 7 MC simulation results for DGP III with \({\varvec{\Sigma }}\) according to Table 1, \((p=100)\)

6 Empirical study

The empirical evaluation of the investigated estimators of the GMVP weights is carried out through a moving window approach on two different data sets, for which we use two different sampling methods. In the first method (fixed sampling method), we simply apply the estimators to all available assets. In the second method (random sampling method), we repeatedly pick a given number of assets at random and then evaluate the estimators' performance by the one-period out-of-sample returns. The reason for applying a moving window is that the mean-variance portfolio theory was developed as a one-period model.

6.1 Fixed sample

The evaluation procedure in the fixed sampling method is as follows: For each stock listed on the stock exchange, we take n observations starting at time point \(t-n\) and ending at time point t. We then calculate monthly returns, and based on these observations, each estimator presented in this paper is used to estimate the weights of the global minimum-variance portfolio. For each estimator, the return on the GMVP is calculated for the first out-of-sample observation, i.e., the observation at time period \(t+1\) for each stock. We then repeat this procedure with the window moved one step forward in time (starting at time point \(t-n+1\) and ending at time point \(t+1\)). The procedure is repeated until 10 out-of-sample returns are generated. Thus, for each estimated portfolio weight vector, the one-period out-of-sample portfolio return is calculated as:

$$\begin{aligned} {R_{t+1,j}}=\hat{\mathbf {w}}_{j}^{\prime }{{\mathbf {R}}_{t+1}},\,\;j=I,\ldots , V, \end{aligned}$$

where \({{\mathbf {R}}_{t+1}}\) is a \(p\times 1\) vector of stock excess returns observed in time period \(t+1\), \({{\hat{\mathbf {w}}}_{j}}\) is a \(p\times 1\) vector of estimated weights for the GMVP and \(\hat{\mathbf {w}}_{V}\) corresponds to an equally weighted portfolio.

The evaluation of each estimator of the GMVP weights is based on the portfolio risk, measured by the standard deviation of the out-of-sample portfolio returns:

$$\begin{aligned} {\hat{\sigma }_{\text {portfolio},j}}=\sqrt{\frac{1}{{10}}\sum \nolimits _{t=1}^{10}{\left( {R_{t,j}}-{\bar{R}}_{j}\right) }^{2}},\quad {\text {where }}{\bar{R}_{j}}=\frac{1}{{10}}\sum \nolimits _{t=1}^{10}{R_{t,j}},\quad j=I,\ldots ,V. \end{aligned}$$
(7)

In addition, we also calculate the out-of-sample Sharpe ratio

$$\begin{aligned} \widehat{SR}_{j}=\frac{\bar{R}_{j}}{{\hat{\sigma }_{\text {portfolio},j}}},\,\;j=I,\ldots ,V. \end{aligned}$$
(8)
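The moving window evaluation and the measures in Eqs. (7) and (8) can be sketched as follows. The sketch uses the equally weighted portfolio, which requires no estimation, and simulated stand-in data; any other weight estimator could be passed in its place. The function names and the simulated returns are assumptions made for illustration only.

```python
import numpy as np

def equal_weights(window_returns):
    """Equally weighted portfolio (w_V in the text); ignores the data."""
    p = window_returns.shape[1]
    return np.full(p, 1.0 / p)

def rolling_oos_returns(returns, window, weight_fn):
    """Estimate weights on each block of `window` rows and apply them to the
    next row, giving one out-of-sample portfolio return per block."""
    T = returns.shape[0]
    oos = []
    for start in range(T - window):
        w = weight_fn(returns[start:start + window])
        oos.append(w @ returns[start + window])
    return np.array(oos)

def oos_std_and_sharpe(oos):
    """Standard deviation with divisor equal to the number of returns (Eq. 7)
    and the ratio of mean return to that standard deviation (Eq. 8)."""
    sigma = np.sqrt(np.mean((oos - oos.mean()) ** 2))
    return sigma, oos.mean() / sigma

rng = np.random.default_rng(4)
returns = 0.01 + 0.05 * rng.standard_normal((159, 89))  # simulated stand-in data
oos = rolling_oos_returns(returns, window=149, weight_fn=equal_weights)
sigma, sharpe = oos_std_and_sharpe(oos)
```

With 159 rows and a window of 149, the loop produces exactly 10 out-of-sample returns.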

Example 1

Stocks listed on the Nasdaq stock exchange

For this empirical application, 89 stocks with complete past values are selected from SP100 over the period 1997-04 to 2010-07 (159 monthly returns). Note that gross returns are used, i.e., the risk-free return is not subtracted. A moving window approach is employed with a length of 149 months, giving 10 out-of-sample returns.

Example 2

Stocks listed on the Stockholm stock exchange

In this empirical application, 112 stocks are selected out of 283 stocks listed on the Stockholm OMX stock exchange over the period 1997-01 to 2010-06 (161 monthly returns). The same procedure as described above is used with a moving window length of 151 months.

Fig. 1 Smoothing coefficients for estimators \(\hat{\mathbf {w}}_{\text {II}}\), \(\hat{\mathbf {w}}_{\text {III}}\) applied to data from Nasdaq and Stockholm OMX stock exchange

The results of the empirical applications (Table 8) confirm what was already established in the Monte Carlo simulations of Sect. 5. In the empirical application using data from the Nasdaq stock exchange, in which c is around 0.6, the values of \({\hat{\sigma }_{\text {portfolio},j}}\) for \(\hat{\mathbf {w}}_{\text {I}}\), \(\hat{\mathbf {w}}_{\text {II}}\), \(\hat{\mathbf {w}}_{\mathrm {III}}\) and \(\hat{\mathbf {w}}_{\text {IV}}\) are relatively close to each other. However, for the Stockholm stock exchange, in which c is around 0.74, we find that both \(\hat{\mathbf {w}}_{\text {III}}\) and \(\hat{\mathbf {w}}_{\text {IV}}\) yield portfolios with much lower \({\hat{\sigma }_{\text {portfolio},j}}\) compared to \(\hat{\mathbf {w}}_{\text {I}}\) and \(\hat{\mathbf {w}}_{\text {II}}\). In both settings, \(\hat{\mathbf {w}}_{\text {I}}\) is outperformed by all other estimators. On the other hand, if we shift the measure from the out-of-sample standard deviation to the out-of-sample Sharpe ratio, the performance is somewhat reversed in that \(\hat{\mathbf {w}}_{\text {II}}\) performs better than \(\hat{\mathbf {w}}_{\text {IV}}\), and the equally weighted portfolio, \(\hat{\mathbf {w}}_{\text {V}}\), then outperforms all investigated estimators. To get a better understanding of the relative behavior of the two Stein-type estimators, i.e., \(\hat{\mathbf {w}}_{\text {II}}\) and \(\hat{\mathbf {w}}_{\text {III}}\), their smoothing coefficients are displayed in Fig. 1. The values of the smoothing coefficients are approximately the same, but the estimator \(\hat{\mathbf {w}}_{\text {II}}\) tends to weight more heavily toward the traditional estimator (\(\hat{\mathbf {w}}_{\text {I}}\)), while the estimator \(\hat{\mathbf {w}}_{\text {III}}\) puts less weight on the traditional estimator.

Table 8 Performance of GMVP estimators applied to portfolios from Nasdaq and Stockholm OMX stock exchange, fixed sampling method

6.2 Random samples

The evaluation procedure in the random sampling method is similar to the fixed sampling method, with the difference that we now randomly select \(p=20,50,80\) stocks without replacement and apply the moving window approach. This procedure is repeated 100 times, which results in 100 time series of 10 one-period out-of-sample returns of the GMVP for each estimator. Note that c is kept constant across the different portfolio sizes by adjusting n. Based on Eq. (7), we calculate the average out-of-sample standard deviation and the average out-of-sample mean return over the 100 replications:

$$\begin{aligned}&{\mathrm{Mean}}\left( {{\hat{\sigma }}_{{\mathrm{portfolio}},j}}\right) =\frac{1}{{100}}\sum \limits _{i=1}^{100}{{\hat{\sigma }}_{{\mathrm{portfolio}},i,j}},\quad j=\text {I},\ldots ,\text {V}, \\&{\mathrm{Mean}}\left( {{\bar{R}}_{j}}\right) =\frac{1}{{100}}\sum \limits _{i=1}^{100}{{\bar{R}}_{i,j}},\quad j=\text {I},\ldots ,\text {V}. \end{aligned}$$

In addition to the above we also calculate the mean out-of-sample Sharpe ratio as

$$\begin{aligned} {\mathrm{Mean}}\left( {\widehat{SR}_{{\mathrm{portfolio}},j}}\right) =\frac{1}{{100}}\sum \limits _{i=1}^{100}{\frac{{{\bar{R}}_{i,j}}}{{{\hat{\sigma }}_{{\mathrm{portfolio}},i,j}}}},\quad j=\text {I},\ldots ,\text {V}. \end{aligned}$$
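The averaging over random asset draws can be sketched as below. Which observations are used when n is shortened is not stated here, so the sketch assumes that the most recent rows are kept. The data are simulated, the equally weighted portfolio stands in for the estimators, and all names are hypothetical.

```python
import numpy as np

def mean_oos_measures(returns, p, window, n_oos, weight_fn, n_rep, rng):
    """Average out-of-sample standard deviation, mean return and Sharpe ratio
    over n_rep random draws of p assets (sampled without replacement)."""
    n_assets = returns.shape[1]
    data = returns[-(window + n_oos):]        # assumption: keep the most recent rows
    sigmas, means = np.empty(n_rep), np.empty(n_rep)
    for i in range(n_rep):
        cols = rng.choice(n_assets, size=p, replace=False)
        sub = data[:, cols]
        oos = np.array([weight_fn(sub[s:s + window]) @ sub[s + window]
                        for s in range(n_oos)])
        means[i] = oos.mean()
        sigmas[i] = oos.std()                 # divisor n_oos, as in Eq. (7)
    # The mean Sharpe ratio averages the per-replication ratios.
    return sigmas.mean(), means.mean(), (means / sigmas).mean()

def equal_weights(x):
    """Equally weighted portfolio used as a stand-in estimator."""
    return np.full(x.shape[1], 1.0 / x.shape[1])

rng = np.random.default_rng(5)
returns = 0.01 + 0.05 * rng.standard_normal((159, 89))   # simulated stand-in data
avg_sigma, avg_mean, avg_sharpe = mean_oos_measures(
    returns, p=20, window=37, n_oos=10, weight_fn=equal_weights, n_rep=100, rng=rng)
```

Note that the mean Sharpe ratio is the average of the 100 individual ratios, which generally differs from the ratio of the two averages.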

Example 3

Stocks listed on the Nasdaq stock exchange (random sampling method)

For this empirical application, \(p=20,50,80\) stocks are randomly selected out of 89 stocks with complete past values from SP100 over the period 1997-04 to 2010-07 (159 monthly returns). A moving window approach is employed with \(c=0.537\) \((n=159,103,47)\), giving 10 out-of-sample returns. This procedure is repeated 100 times.

Table 9 Performance of GMVP estimators applied to portfolios from Nasdaq stock exchange, random sampling method

Example 4

Stocks listed on the Stockholm stock exchange (random sampling method)

In this empirical application, \(p=20,50,80\) stocks are randomly selected out of the 112 stocks used in Example 2. The same procedure as described before is used, where a moving window is employed with \(c=0.53\) \((n=161,104,48)\).

The results of the empirical applications based on the random sampling method (Tables 9 and 10) confirm what was established in the empirical applications based on the fixed sampling method. For the Nasdaq data, in which c is around 0.5, the values of \({{\hat{\sigma }}_{{\mathrm{portfolio}},j}}\) for \(\hat{\mathbf {w}}_{\text {I}}\), \(\hat{\mathbf {w}}_{\text {II}}\), \(\hat{\mathbf {w}}_{\mathrm {III}}\) and \(\hat{\mathbf {w}}_{\text {IV}}\) are relatively close to each other, but \(\hat{\mathbf {w}}_{\text {II}}\), \(\hat{\mathbf {w}}_{\mathrm {III}}\) and \(\hat{\mathbf {w}}_{\text {IV}}\) outperform \(\hat{\mathbf {w}}_{\text {I}}\) and \(\hat{\mathbf {w}}_{\text {V}}\). Shifting the evaluation measure to the out-of-sample Sharpe ratio yields a different picture: \(\hat{\mathbf {w}}_{\text {V}}\) now outperforms the other estimators, while \(\hat{\mathbf {w}}_{\text {I}}\) consistently yields the lowest Sharpe ratio. For the Stockholm data, in which c is also around 0.5, we find that \(\hat{\mathbf {w}}_{\mathrm {III}}\) yields portfolios with lower \({{\hat{\sigma }}_{{\mathrm{portfolio}},j}}\) than \(\hat{\mathbf {w}}_{\text {IV}}\) and \(\hat{\mathbf {w}}_{\text {II}}\), and all three estimators outperform the regular estimator \(\hat{\mathbf {w}}_{\text {I}}\). Hence, in both settings, \(\hat{\mathbf {w}}_{\text {I}}\) is outperformed by \(\hat{\mathbf {w}}_{\text {II}}\), \(\hat{\mathbf {w}}_{\mathrm {III}}\) and \(\hat{\mathbf {w}}_{\text {IV}}\) in terms of out-of-sample standard deviation. In terms of the out-of-sample Sharpe ratio, \(\hat{\mathbf {w}}_{\text {V}}\) again outperforms all the other estimators for the Stockholm data, and the regular estimator \(\hat{\mathbf {w}}_{\text {I}}\) yields the lowest Sharpe ratio. Among \(\hat{\mathbf {w}}_{\text {II}}\), \(\hat{\mathbf {w}}_{\mathrm {III}}\) and \(\hat{\mathbf {w}}_{\text {IV}}\), the performance is similar, but \(\hat{\mathbf {w}}_{\mathrm {III}}\) has the highest Sharpe ratio.

Table 10 Performance of GMVP estimators applied to portfolios from Stockholm stock exchange, random sampling method

7 Summary

The global minimum-variance portfolio (GMVP) solution developed by Markowitz is considered to be a fundamental concept in portfolio theory. Early researchers investigating this matter usually applied a simple plug-in estimator for the weights and paid very little attention to the distributional properties of the estimator. More recently, the full distribution of the standard estimator has been derived (Okhrin and Schmid 2006), and it is now recognized that the standard estimator offers a poor approximation of the true GMVP. Within a relatively short period of time, a variety of improvements to the standard estimator have been developed. Naturally, each of these improvements has its pros and cons, but there does not seem to be a consensus about how to evaluate the performance, or efficiency, of GMVP estimators. Perhaps this is because there are, in fact, several possible measures one can use for assessing the properties of a portfolio estimator. In this paper, we discuss a number of different risk functions for the weight estimator. These include risk functions of covariance matrix estimators, forecast mean square errors, directional risks and conditional risks. The risk functions are labeled with an index determined by the degree to which they are specialized for portfolio estimation: \(\mathfrak {R}_{2}\) is generally preferred over \({\mathfrak {R}_{1}}\), which is preferred over \(\mathfrak {R}_{0}\), etc. However, this ordering does not mean that \({\mathfrak {R}_{2}}\) is uniformly better than \({\mathfrak {R}_{1}}\) and \({\mathfrak {R}_{0}}\). For example, \({\mathfrak {R}_{4}}\) does not exist in closed form for the regularized portfolio estimator used in this paper. Hence, \({\hat{\mathbf {w}}_{\text {IV}}}\) has to be optimized through \(\mathfrak {R}_{0}\) rather than \({\mathfrak {R}_{4}}\). In other words, one would typically use \({\mathfrak {R}_{4}}\) or \({\mathfrak {R}_{3}}\) as a tool for deriving an estimator of \({\mathbf {w}_\mathrm{GMVP}}\), but there are settings where risk functions of lower rank-order must be used because of their simpler functional form.

A selection of recent GMVP estimators is used in a Monte Carlo simulation for the purposes of (i) comparing different risk measures for a given estimator and (ii) comparing different estimators for a given risk. Moreover, a new estimator, based on a resolvent estimator, is proposed. The analysis focuses on asset data where the number of observations (n) is comparable to the number of assets (p). This case is important because investors might be reluctant to use long data sets, as the economy is not expected to be stable over long time periods, and hence, investors encounter a high-dimensional setting. The simulations are complemented by an analysis of two real data sets: one is drawn from the Nasdaq stock exchange, and the other employs Stockholm OMX data. The general finding of the paper is that no estimator dominates uniformly over all risk functions. We can, however, establish that there are dominating tendencies, in the sense that some estimators tend to perform better with respect to most risk aspects. A Stein-type estimator developed by Frahm and Memmel (2010) is found to perform well in cases when \(n\gg p\), whereas another Stein-type estimator proposed by Bodnar et al. (2018) dominates when n is proportional to p. A resolvent-type estimator is found to perform surprisingly well over a large number of settings. While this paper is restricted to properties of point estimators, future research could involve more general inferential aspects, such as hypothesis testing.