# Bayes minimax competitors of preliminary test estimators in k sample problems

• Ryo Imai
• Tatsuya Kubokawa
• Malay Ghosh

## Abstract

In this paper, we consider the estimation of a mean vector of a multivariate normal population when the mean vector is suspected to be nearly equal to the mean vectors of $$k-1$$ other populations. As an alternative to the preliminary test estimator based on the test statistic for testing the hypothesis of equal means, we derive empirical and hierarchical Bayes estimators which shrink the sample mean vector toward a pooled mean estimator given under the hypothesis. The minimaxity of these Bayesian estimators is shown, and their performance is investigated by simulation.

## Keywords

Admissibility · Decision theory · Empirical Bayes · Hierarchical Bayes · k sample problem · Minimaxity · Pooled estimator · Preliminary test estimator · Quadratic loss · Shrinkage estimator · Uniform prior

## 1 Introduction

Suppose that there are k laboratories, say Laboratories L$$_1$$, $$\ldots$$, L$$_k$$, and that a certain instrument is designed to measure several characteristics at each laboratory, so that several vector-valued measurements are recorded. Also, suppose that we want to estimate the population mean of Laboratory L$$_1$$. When similar instruments are used at the k laboratories, it is suspected that the k population means are nearly equal, in which case the sample means of the other laboratories L$$_2$$, $$\ldots$$, L$$_k$$ can be used to produce more efficient estimators than the sample mean based on data from L$$_1$$ alone. This problem was studied by Ghosh and Sinha (1988) and recently revisited by Imai et al. (2017) in the framework of simultaneous estimation of k population means.

The k sample problem described above is expressed in the following canonical form: p-variate random vectors $${\varvec{\text {X}}}_1, \ldots , {\varvec{\text {X}}}_k$$ and positive scalar random variable S are mutually independently distributed as
\begin{aligned} {\varvec{\text {X}}}_i\sim & {} \mathcal{N}_p({{\varvec{{\mu }}}}_i, {\sigma }^2{{\varvec{\text {V}}}}_i),\quad \text {for}\ i=1, \ldots , k,\nonumber \\ S/{\sigma }^2\sim & {} \chi _n^2, \end{aligned}
(1)
where the p-variate means $${{\varvec{{\mu }}}}_1, \ldots , {{\varvec{{\mu }}}}_k$$ and the scale parameter $${\sigma }^2$$ are unknown, and $${{\varvec{\text {V}}}}_1, \ldots , {{\varvec{\text {V}}}}_k$$ are $$p\times p$$ known and positive definite matrices. In this model, we consider the estimation of $${{\varvec{{\mu }}}}_1$$ relative to the quadratic loss function:
\begin{aligned} L({\varvec{{\delta }}}_1, {{\varvec{{\mu }}}}_1, {\sigma }^2) = \Vert {\varvec{{\delta }}}_1-{{\varvec{{\mu }}}}_1\Vert _{{{\varvec{\text {Q}}}}}^2/{\sigma }^2 =({\varvec{{\delta }}}_1-{{\varvec{{\mu }}}}_1)^\top {{\varvec{\text {Q}}}}({\varvec{{\delta }}}_1-{{\varvec{{\mu }}}}_1)/{\sigma }^2, \end{aligned}
(2)
where $$\Vert {{\varvec{\text {a}}}}\Vert _{{\varvec{\text {A}}}}^2={{\varvec{\text {a}}}}^\top {{\varvec{\text {A}}}}{{\varvec{\text {a}}}}$$ for the transpose $${{\varvec{\text {a}}}}^\top$$ of $${{\varvec{\text {a}}}}$$, $${{\varvec{\text {Q}}}}$$ is a positive definite and known matrix, and $${\varvec{{\delta }}}_1$$ is an estimator of $${{\varvec{{\mu }}}}_1$$. The estimator $${\varvec{{\delta }}}_1$$ is evaluated by the risk function $$R({\varvec{{\omega }}}, {\varvec{{\delta }}}_1)=E[L({\varvec{{\delta }}}_1, {{\varvec{{\mu }}}}_1, {\sigma }^2)]$$ for $${\varvec{{\omega }}}=({{\varvec{{\mu }}}}_1, \ldots , {{\varvec{{\mu }}}}_k, {\sigma }^2)$$, the set of unknown parameters.
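As a concrete illustration, the loss (2) can be coded directly; the function name and array conventions below are our own choices, not the paper's.

```python
import numpy as np

def quadratic_loss(delta1, mu1, sigma2, Q):
    """Quadratic loss (2): (delta1 - mu1)' Q (delta1 - mu1) / sigma2."""
    d = delta1 - mu1
    return float(d @ Q @ d) / sigma2

# The risk R(omega, delta1) is the expectation of this loss over
# the distributions of X_1, ..., X_k and S.
```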
In this paper, we consider the case that the means $${{\varvec{{\mu }}}}_i$$’s are suspected to be nearly equal, namely close to the hypothesis
\begin{aligned} H_0 : {{\varvec{{\mu }}}}_1=\cdots = {{\varvec{{\mu }}}}_k. \end{aligned}
A classical approach to this problem is the preliminary test estimator, which uses the pooled mean estimator upon acceptance of the null hypothesis $$H_0$$ and uses separate mean estimators upon rejection of $$H_0$$. The test statistic for $$H_0$$ is
\begin{aligned} F=\sum _{i=1}^k \Vert {\varvec{\text {X}}}_i-{\widehat{{{\varvec{{\nu }}}}}}\Vert _{{{\varvec{\text {V}}}}_i^{-1}}^2/S, \end{aligned}
(3)
where $${\widehat{{{\varvec{{\nu }}}}}}$$ is the pooled estimator defined as
\begin{aligned} {\widehat{{{\varvec{{\nu }}}}}}={{\varvec{\text {A}}}}\sum _{i=1}^k {{\varvec{\text {V}}}}_i^{-1}{\varvec{\text {X}}}_i, \quad {{\varvec{\text {A}}}}=\left (\sum _{i=1}^k{{\varvec{\text {V}}}}_i^{-1}\right )^{-1}. \end{aligned}
(4)
The preliminary test estimator of $${{\varvec{{\mu }}}}_1$$ is
\begin{aligned} {\widehat{{\varvec{{\mu }}}}}_1^{PT}= \left\{ \begin{array}{ll} {\varvec{\text {X}}}_1 & \quad \text {if}\ F>(p(k-1)/n)F_{p(k-1), n, {\alpha }}\\ {\widehat{{{\varvec{{\nu }}}}}}& \quad \text {otherwise}, \end{array}\right. \end{aligned}
(5)
where $$F_{p(k-1), n, {\alpha }}$$ is the upper $${\alpha }$$ point of the F distribution with $$(p(k-1), n)$$ degrees of freedom, and $${\widehat{{{\varvec{{\nu }}}}}}$$ is the pooled estimator given in (4). However, the preliminary test estimator is not smooth and does not necessarily improve on $${\varvec{\text {X}}}_1$$.
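A minimal numerical sketch of the pooled estimator (4), the test statistic (3) and the preliminary test estimator (5) is given below; the function names are our own, and SciPy is used only for the F quantile.

```python
import numpy as np
from scipy.stats import f as f_dist

def pooled_estimator(X, V):
    """Pooled estimator (4). X: list of k p-vectors; V: list of k (p, p) matrices."""
    Vinv = [np.linalg.inv(Vi) for Vi in V]
    A = np.linalg.inv(sum(Vinv))                    # A = (sum_i V_i^{-1})^{-1}
    nu_hat = A @ sum(Vi @ Xi for Vi, Xi in zip(Vinv, X))
    return nu_hat, A

def F_statistic(X, V, S):
    """Test statistic (3): sum_i ||X_i - nu_hat||^2_{V_i^{-1}} / S."""
    nu_hat, _ = pooled_estimator(X, V)
    return sum((Xi - nu_hat) @ np.linalg.inv(Vi) @ (Xi - nu_hat)
               for Xi, Vi in zip(X, V)) / S

def preliminary_test_estimator(X, V, S, n, alpha=0.05):
    """Preliminary test estimator (5): X_1 if H_0 is rejected, nu_hat otherwise."""
    p, k = len(X[0]), len(X)
    nu_hat, _ = pooled_estimator(X, V)
    crit = (p * (k - 1) / n) * f_dist.ppf(1 - alpha, p * (k - 1), n)
    return X[0] if F_statistic(X, V, S) > crit else nu_hat
```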

As an alternative to the preliminary test estimator, we consider a Bayesian approach in which the prior distributions of the $${{\varvec{{\mu }}}}_i$$'s have a common mean $${{\varvec{{\nu }}}}$$ as a hyper-parameter. Empirical and hierarchical Bayes estimators are derived when the uniform prior distribution is assumed for $${{\varvec{{\nu }}}}$$. We also provide an empirical Bayes estimator under a normal prior distribution for $${{\varvec{{\nu }}}}$$. It is shown that these Bayesian estimators improve on $${\varvec{\text {X}}}_1$$, namely, are minimax.

The topic treated in this paper is related to the so-called Stein problem in simultaneous estimation of multivariate normal means. This problem has long been of interest in the literature. See, for example, Stein (1956, 1981), James and Stein (1961), Strawderman (1971, 1973), Efron and Morris (1973, 1976) and Berger (1985). For recent articles extending to prediction and high-dimensional problems, see Komaki (2001), Brown et al. (2008) and Tsukuma and Kubokawa (2015). As articles related to this paper, Sclove et al. (1972) showed the inadmissibility of the preliminary test estimator, Smith (1973) provided Bayes estimators in one-way and two-way models, Ghosh and Sinha (1988) derived hierarchical and empirical Bayes estimators with minimaxity, and Sun (1996) provided Bayesian minimax estimators in multivariate two-way random effects models.

In Sect. 2, we treat two classes of shrinkage estimators, Class 1 and Class 2. Estimators in Class 1 shrink $${\varvec{\text {X}}}_1$$ toward the pooled estimator $${\widehat{{{\varvec{{\nu }}}}}}$$, and estimators in Class 2 additionally shrink the pooled estimator itself. Ghosh and Sinha (1988) obtained conditions for minimaxity of estimators in Class 1 in the two sample problem. For $$k=2$$, the test statistic F can be written simply as $$F=({\varvec{\text {X}}}_1-{\varvec{\text {X}}}_2)^\top ({{\varvec{\text {V}}}}_1+{{\varvec{\text {V}}}}_2)^{-1}({\varvec{\text {X}}}_1-{\varvec{\text {X}}}_2)/S$$, while in the k sample case F takes the more complicated form $$F=\sum _{i=1}^k ({\varvec{\text {X}}}_i-{\widehat{{{\varvec{{\nu }}}}}})^\top {{\varvec{\text {V}}}}_i^{-1}({\varvec{\text {X}}}_i-{\widehat{{{\varvec{{\nu }}}}}})/S$$, which makes the minimaxity harder to establish. The key tool in the extension is the inequality given in Lemma 2.2. Using this inequality, we obtain conditions for minimaxity of shrinkage estimators in Class 1 and Class 2.

In Sect. 3, we suggest three kinds of Bayesian methods for the estimation of $${{\varvec{{\mu }}}}_1$$. Empirical and hierarchical Bayes estimators are derived under the assumption of the uniform prior distribution for $${{\varvec{{\nu }}}}$$. Because these estimators belong to Class 1, we can provide conditions for their minimaxity using the result in Sect. 2. An empirical Bayes estimator is also obtained under the normal prior distribution for $${{\varvec{{\nu }}}}$$; it belongs to Class 2, and conditions for its minimaxity again follow from the result in Sect. 2. The results in Sect. 2 are extended in Sect. 4 to the problem of estimating a linear combination of $${{\varvec{{\mu }}}}_1, \ldots , {{\varvec{{\mu }}}}_k$$ under the quadratic loss. The performances of the Bayesian procedures are investigated by simulation in Sect. 5, and concluding remarks are given in Sect. 6.

## 2 Classes of minimax estimators

Motivated by the preliminary test estimator (5), we consider a class of estimators shrinking $${\varvec{\text {X}}}_1$$ toward the pooled estimator $${\widehat{{{\varvec{{\nu }}}}}}$$, given by
\begin{aligned} {\widehat{{\varvec{{\mu }}}}}_1(\phi ) = {\varvec{\text {X}}}_1 - {\phi (F, S)\over F}({\varvec{\text {X}}}_1-{\widehat{{{\varvec{{\nu }}}}}}), \end{aligned}
(6)
where $$\phi (F,S)$$ is an absolutely continuous function, F is the test statistic given in (3) and $${\widehat{{{\varvec{{\nu }}}}}}$$ is the pooled estimator given in (4).
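The class (6) can be sketched as follows for a user-supplied $$\phi$$; the function name is hypothetical, and the sketch assumes $$F>0$$.

```python
import numpy as np

def shrink_toward_pooled(X, V, S, phi):
    """Estimator (6): X_1 - {phi(F, S)/F} (X_1 - nu_hat); requires F > 0."""
    Vinv = [np.linalg.inv(Vi) for Vi in V]
    A = np.linalg.inv(sum(Vinv))                        # A in (4)
    nu_hat = A @ sum(Vi @ Xi for Vi, Xi in zip(Vinv, X))
    F = sum((Xi - nu_hat) @ Vi @ (Xi - nu_hat)
            for Xi, Vi in zip(X, Vinv)) / S             # test statistic (3)
    return X[0] - (phi(F, S) / F) * (X[0] - nu_hat)
```

For instance, $$\phi (F,S)=\min (a,F)$$ with a constant $$a>0$$ gives a truncated shrinkage rule of the type appearing in Sect. 3.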

Conditions on $$\phi (F,S)$$ for the minimaxity of $${\widehat{{\varvec{{\mu }}}}}_1(\phi )$$ are given in Theorem 2.1, which is proved at the end of this section.

### Theorem 2.1

Assume the conditions
\begin{aligned} \mathrm{Ch}_\mathrm{max}\{({{\varvec{\text {V}}}}_1-{{\varvec{\text {A}}}}){{\varvec{\text {Q}}}}\}\ne 0\quad \mathrm{and}\quad \mathrm{tr\,}\{({{\varvec{\text {V}}}}_1-{{\varvec{\text {A}}}}){{\varvec{\text {Q}}}}\}/\mathrm{Ch}_\mathrm{max}\{({{\varvec{\text {V}}}}_1-{{\varvec{\text {A}}}}){{\varvec{\text {Q}}}}\}>2. \end{aligned}
(7)
Then, the estimator $${\widehat{{\varvec{{\mu }}}}}_1(\phi )$$ is minimax relative to the quadratic loss (2) if $$\phi (F,S)$$ satisfies the following conditions:
1. (a) $$\phi (F,S)$$ is non-decreasing in F and non-increasing in S.

2. (b) $$0<\phi (F,S)\le 2 [\mathrm{tr\,}\{({{\varvec{\text {V}}}}_1-{{\varvec{\text {A}}}}){{\varvec{\text {Q}}}}\}/\mathrm{Ch}_\mathrm{max}\{({{\varvec{\text {V}}}}_1-{{\varvec{\text {A}}}}){{\varvec{\text {Q}}}}\}-2]/(n+2)$$, where $$\mathrm{Ch}_\mathrm{max}({{\varvec{\text {C}}}})$$ denotes the maximum characteristic value of matrix $${{\varvec{\text {C}}}}$$.

This theorem provides an extension of Ghosh and Sinha (1988) to the k-sample problem. In the case of $${{\varvec{\text {V}}}}_1=\cdots ={{\varvec{\text {V}}}}_k={{\varvec{\text {Q}}}}^{-1}$$, the conditions in (7) are expressed as $$k\not = 1$$ and $$p>2$$, and the latter condition is well known in the Stein problem.
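The reduction of condition (7) in the equal-covariance case is easy to check numerically; the snippet below is our own illustration, not part of the paper.

```python
import numpy as np

def condition_ratio(V, Q):
    """The ratio tr{(V_1 - A)Q} / Ch_max{(V_1 - A)Q} appearing in condition (7)."""
    Vinv = [np.linalg.inv(Vi) for Vi in V]
    A = np.linalg.inv(sum(Vinv))
    M = (V[0] - A) @ Q
    return np.trace(M) / np.max(np.linalg.eigvals(M).real)

p, k = 4, 3
V = [np.eye(p) for _ in range(k)]       # V_1 = ... = V_k = Q^{-1} = I_p
ratio = condition_ratio(V, np.eye(p))
# here (V_1 - A)Q = (1 - 1/k) I_p, so the ratio equals p = 4
```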

We next consider the class of double shrinkage estimators
\begin{aligned} {\widehat{{\varvec{{\mu }}}}}_1(\phi , \psi ) = {\varvec{\text {X}}}_1 - {\phi (F, S)\over F}({\varvec{\text {X}}}_1-{\widehat{{{\varvec{{\nu }}}}}})- {\psi (G, S)\over G}{\widehat{{{\varvec{{\nu }}}}}}, \end{aligned}
(8)
where $$\phi (F,S)$$ and $$\psi (G,S)$$ are absolutely continuous functions, and
\begin{aligned} G=\Vert {\widehat{{{\varvec{{\nu }}}}}}\Vert _{{{\varvec{\text {A}}}}^{-1}}^2/S. \end{aligned}
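In the same illustrative style, the double shrinkage class (8) can be sketched as follows; the function name is hypothetical, and the sketch assumes $$F>0$$ and $$G>0$$.

```python
import numpy as np

def double_shrinkage(X, V, S, phi, psi):
    """Estimator (8): shrink X_1 toward nu_hat, and shrink nu_hat toward the origin."""
    Vinv = [np.linalg.inv(Vi) for Vi in V]
    A = np.linalg.inv(sum(Vinv))
    nu_hat = A @ sum(Vi @ Xi for Vi, Xi in zip(Vinv, X))
    F = sum((Xi - nu_hat) @ Vi @ (Xi - nu_hat)
            for Xi, Vi in zip(X, Vinv)) / S
    G = nu_hat @ np.linalg.inv(A) @ nu_hat / S     # G = ||nu_hat||^2_{A^{-1}} / S
    return (X[0] - (phi(F, S) / F) * (X[0] - nu_hat)
            - (psi(G, S) / G) * nu_hat)
```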

### Theorem 2.2

Assume condition (7) and
\begin{aligned} \mathrm{{Ch}}_\mathrm{{max}}({{\varvec{\text {A}}}}{{\varvec{\text {Q}}}})\ne 0 \quad \mathrm{and}\quad \mathrm{tr\,}({{\varvec{\text {A}}}}{{\varvec{\text {Q}}}})/\mathrm{Ch}_\mathrm{max}({{\varvec{\text {A}}}}{{\varvec{\text {Q}}}})>2. \end{aligned}
(9)
Then, the double shrinkage estimator $${\widehat{{\varvec{{\mu }}}}}_1(\phi , \psi )$$ in (8) is minimax relative to the quadratic loss (2) if $$\phi (F,S)$$ and $$\psi (G,S)$$ satisfy the following conditions:
1. (a) $$\phi (F,S)$$ is non-decreasing in F and non-increasing in S.

2. (b) $$0<\phi (F,S)\le [\mathrm{tr\,}\{({{\varvec{\text {V}}}}_1-{{\varvec{\text {A}}}}){{\varvec{\text {Q}}}}\}/\mathrm{Ch}_\mathrm{max}\{({{\varvec{\text {V}}}}_1-{{\varvec{\text {A}}}}){{\varvec{\text {Q}}}}\}-2]/(n+2)$$.

3. (c) $$\psi (G,S)$$ is non-decreasing in G and non-increasing in S.

4. (d) $$0<\psi (G,S)\le \{\mathrm{tr\,}({{\varvec{\text {A}}}}{{\varvec{\text {Q}}}})/\mathrm{Ch}_\mathrm{max}({{\varvec{\text {A}}}}{{\varvec{\text {Q}}}})-2\}/(n+2)$$.

For the proofs, the Stein identity due to Stein (1981) and the chi-square identity due to Efron and Morris (1976) are useful. See also Bilodeau and Kariya (1989) for a multivariate version of the Stein identity.

### Lemma 2.1

1. (1) Assume that $${{\varvec{\text {Y}}}}=(Y_1, \ldots , Y_p)^\top$$ is a p-variate random vector having $$\mathcal{N}_p({{\varvec{{\mu }}}}, {{\varvec{{{\Sigma }}}}})$$ and that $${{\varvec{\text {h}}}}(\cdot )$$ is an absolutely continuous function from $$\mathfrak {R}^p$$ to $$\mathfrak {R}^p$$. Then, the Stein identity is given by
\begin{aligned} E[({{\varvec{\text {Y}}}}-{{\varvec{{\mu }}}})^\top {{\varvec{\text {h}}}}({{\varvec{\text {Y}}}})]=E[ \mathrm{tr\,}\{{{\varvec{{{\Sigma }}}}}{{\varvec{{\nabla }}}}_{{\varvec{\text {Y}}}}{{\varvec{\text {h}}}}({{\varvec{\text {Y}}}})^\top \}], \end{aligned}
(10)
provided the expectations in both sides exist, where $${{\varvec{{\nabla }}}}_{{\varvec{\text {Y}}}}=(\partial /\partial Y_1, \ldots , \partial /\partial Y_p)^\top$$.

2. (2) Assume that S is a random variable such that $$S/{\sigma }^2\sim \chi _n^2$$ and that $$g(\cdot )$$ is an absolutely continuous function from $$\mathfrak {R}$$ to $$\mathfrak {R}$$. Then, the chi-square identity is given by
\begin{aligned} E[S g(S)] = {\sigma }^2 E[n g(S) + 2Sg'(S)], \end{aligned}
(11)
provided the expectations in both sides exist.
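Both identities are easy to verify by Monte Carlo; the test functions $$h$$ and $$g$$ below are arbitrary choices of ours, used only for a sanity check.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 2_000_000

# (1) Stein identity (10) in one dimension with h(y) = y^3:
# E[(Y - mu) h(Y)] = Sigma * E[h'(Y)] for Y ~ N(mu, Sigma).
mu, Sigma = 1.0, 2.0
Y = rng.normal(mu, np.sqrt(Sigma), N)
lhs1 = np.mean((Y - mu) * Y**3)
rhs1 = Sigma * np.mean(3 * Y**2)

# (2) Chi-square identity (11) with g(S) = 1/(S + 1):
# E[S g(S)] = sigma^2 E[n g(S) + 2 S g'(S)] for S / sigma^2 ~ chi^2_n.
n, sigma2 = 8, 2.0
S = sigma2 * rng.chisquare(n, N)
lhs2 = np.mean(S / (S + 1.0))
rhs2 = sigma2 * np.mean(n / (S + 1.0) - 2.0 * S / (S + 1.0) ** 2)
# lhs1 ~ rhs1 and lhs2 ~ rhs2 up to Monte Carlo error
```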

### Proof of Theorem 2.1

The risk function is decomposed as
\begin{aligned} R({\varvec{{\omega }}}, {\widehat{{\varvec{{\mu }}}}}_1(\phi )) =&E[\Vert {\varvec{\text {X}}}_1-{{\varvec{{\mu }}}}_1\Vert _{{\varvec{\text {Q}}}}^2/{\sigma }^2] -2 E\Big [({\varvec{\text {X}}}_1-{{\varvec{{\mu }}}}_1)^\top {{\varvec{\text {Q}}}}({\varvec{\text {X}}}_1-{\widehat{{{\varvec{{\nu }}}}}}){\phi \over {\sigma }^2 F}\Big ] + {1\over {\sigma }^2} E\Big [ {\phi ^2\over F^2}\Vert {\varvec{\text {X}}}_1-{\widehat{{{\varvec{{\nu }}}}}}\Vert _{{\varvec{\text {Q}}}}^2\Big ]\nonumber \\ =&I_1 - 2 I_2 + I_3, \quad \quad \text {(say)} \end{aligned}
(12)
for $$\phi =\phi (F,S)$$.
It is easy to see $$I_1=\mathrm{tr\,}({{\varvec{\text {V}}}}_1{{\varvec{\text {Q}}}})$$. Note that F is expressed as $$F=(\sum _{i=1}^k{\varvec{\text {X}}}_i^\top {{\varvec{\text {V}}}}_i^{-1}{\varvec{\text {X}}}_i-{\widehat{{{\varvec{{\nu }}}}}}^\top {{\varvec{\text {A}}}}^{-1}{\widehat{{{\varvec{{\nu }}}}}})/S$$. Letting $${{\varvec{{\nabla }}}}_1=\partial /\partial {\varvec{\text {X}}}_1$$, we have $${{\varvec{{\nabla }}}}_1 F=2{{\varvec{\text {V}}}}_1^{-1}({\varvec{\text {X}}}_1-{\widehat{{{\varvec{{\nu }}}}}})/S$$ and
\begin{aligned} {{\varvec{{\nabla }}}}_1\Big \{({\varvec{\text {X}}}_1-{\widehat{{{\varvec{{\nu }}}}}})^\top {\phi (F,S)\over F} \Big \}=({{\varvec{\text {I}}}}-{{\varvec{\text {V}}}}_1^{-1}{{\varvec{\text {A}}}}) {\phi \over F}+2{{\varvec{\text {V}}}}_1^{-1}({\varvec{\text {X}}}_1-{\widehat{{{\varvec{{\nu }}}}}})({\varvec{\text {X}}}_1-{\widehat{{{\varvec{{\nu }}}}}})^\top \Big \{ - {\phi \over F^2} +{ \phi _F\over F}\Big \}\frac{1}{S}, \end{aligned}
where $$\phi _F=(\partial /\partial F)\phi (F,S)$$. Then from (10), it is seen that
\begin{aligned} I_2 =&E\Big [\mathrm{tr\,}\Big [{{\varvec{\text {Q}}}}{{\varvec{\text {V}}}}_1{{\varvec{{\nabla }}}}_1 \Big \{({\varvec{\text {X}}}_1-{\widehat{{{\varvec{{\nu }}}}}})^\top {\phi (F,S)\over F}\Big \}\Big ]\Big ] \nonumber \\ =&E\Bigg[ \mathrm{tr\,}\{ {{\varvec{\text {Q}}}}{{\varvec{\text {V}}}}_1 ({{\varvec{\text {I}}}}-{{\varvec{\text {V}}}}_1^{-1}{{\varvec{\text {A}}}}) \} {\phi \over F} + 2{\mathrm{tr\,}\{{{\varvec{\text {Q}}}}{{\varvec{\text {V}}}}_1{{\varvec{\text {V}}}}_1^{-1}({\varvec{\text {X}}}_1-{\widehat{{{\varvec{{\nu }}}}}})({\varvec{\text {X}}}_1-{\widehat{{{\varvec{{\nu }}}}}})^\top \}\over S}\Big \{ - {\phi \over F^2} +{ \phi _F\over F}\Big \} \Bigg ] \nonumber \\ =&E\Bigg [ \mathrm{tr\,}\{ ({{\varvec{\text {V}}}}_1-{{\varvec{\text {A}}}}){{\varvec{\text {Q}}}}\}{\phi \over F} - 2 B{\phi \over F} + 2 B\phi _F\Bigg ], \end{aligned}
(13)
where
\begin{aligned} B= ({\varvec{\text {X}}}_1-{\widehat{{{\varvec{{\nu }}}}}})^\top {{\varvec{\text {Q}}}}({\varvec{\text {X}}}_1-{\widehat{{{\varvec{{\nu }}}}}})/\sum _{j=1}^k ({\varvec{\text {X}}}_j-{\widehat{{{\varvec{{\nu }}}}}})^\top {{\varvec{\text {V}}}}_j^{-1}({\varvec{\text {X}}}_j-{\widehat{{{\varvec{{\nu }}}}}}). \end{aligned}
(14)
Also, from (11), it is observed that
\begin{aligned} I_3=&{1\over {\sigma }^2} E\Big [{S\over F}B\phi ^2\Big ] \nonumber \\ =&E\Big [{n\over F}B\phi ^2+ 2SB\Big (-{F\over S}\Big ) \Big \{ - {\phi ^2\over F^2}+2{\phi \phi _F\over F}\Big \} + 4S B{\phi \phi _S\over F} \Big ] \nonumber \\ =&E\Big [B{n+2\over F}\phi ^2- 4B\phi \phi _F + 4S B{\phi \phi _S\over F} \Big ]. \end{aligned}
(15)
Thus, the risk function is expressed as
\begin{aligned} R({\varvec{{\omega }}}, {\widehat{{\varvec{{\mu }}}}}_1(\phi )) =&E\Big [ \mathrm{tr\,}({{\varvec{\text {V}}}}_1{{\varvec{\text {Q}}}}) + {\phi \over F}[(n+2)B\phi - 2 \mathrm{tr\,}\{({{\varvec{\text {V}}}}_1-{{\varvec{\text {A}}}}){{\varvec{\text {Q}}}}\}+4B] \nonumber \\&-4B\phi _F-4B\phi \phi _F+4S B{\phi \phi _S\over F} \Big ]. \end{aligned}
(16)
Using Lemma 2.2 given below, we can see that
\begin{aligned} B \le {({\varvec{\text {X}}}_1-{\widehat{{{\varvec{{\nu }}}}}})^\top {{\varvec{\text {Q}}}}({\varvec{\text {X}}}_1-{\widehat{{{\varvec{{\nu }}}}}}) \over ({\varvec{\text {X}}}_1-{\widehat{{{\varvec{{\nu }}}}}})^\top ({{\varvec{\text {V}}}}_1-{{\varvec{\text {A}}}})^{-1}({\varvec{\text {X}}}_1-{\widehat{{{\varvec{{\nu }}}}}})}\le \mathrm{Ch}_\mathrm{max}(({{\varvec{\text {V}}}}_1-{{\varvec{\text {A}}}}){{\varvec{\text {Q}}}}), \end{aligned}
which implies that
\begin{aligned} R(&{\varvec{{\omega }}}, {\widehat{{\varvec{{\mu }}}}}_1(\phi )) - \mathrm{tr\,}({{\varvec{\text {V}}}}_1{{\varvec{\text {Q}}}}) \nonumber \\ \le&E\Bigg [\mathrm{Ch}_\mathrm{max}(({{\varvec{\text {V}}}}_1-{{\varvec{\text {A}}}}){{\varvec{\text {Q}}}}){\phi \over F}\Bigg \{(n+2)\phi - 2 {\mathrm{tr\,}\{({{\varvec{\text {V}}}}_1-{{\varvec{\text {A}}}}){{\varvec{\text {Q}}}}\}\over \mathrm{Ch}_\mathrm{max}(({{\varvec{\text {V}}}}_1-{{\varvec{\text {A}}}}){{\varvec{\text {Q}}}})}+4\Bigg \} \nonumber \\&-4B\phi _F-4B\phi \phi _F+4S B{\phi \phi _S\over F} \Bigg ]. \end{aligned}
(17)
Under condition (b), the quantity in braces in (17) is non-positive, and under condition (a), the remaining terms $$-4B\phi _F-4B\phi \phi _F+4S B\phi \phi _S/F$$ are non-positive since $$B\ge 0$$, $$\phi >0$$, $$\phi _F\ge 0$$ and $$\phi _S\le 0$$. Hence, $$R({\varvec{{\omega }}}, {\widehat{{\varvec{{\mu }}}}}_1(\phi )) \le \mathrm{tr\,}({{\varvec{\text {V}}}}_1{{\varvec{\text {Q}}}})$$ under the conditions in Theorem 2.1. $$\quad\quad\square$$

### Lemma 2.2

It holds that
\begin{aligned} \sum _{j=1}^k({{\varvec{\text {x}}}}_j-{\widehat{{{\varvec{{\nu }}}}}})^\top {{\varvec{\text {V}}}}_j^{-1}({{\varvec{\text {x}}}}_j-{\widehat{{{\varvec{{\nu }}}}}}) \ge ({{\varvec{\text {x}}}}_1-{\widehat{{{\varvec{{\nu }}}}}})^\top ({{\varvec{\text {V}}}}_1-{{\varvec{\text {A}}}})^{-1}({{\varvec{\text {x}}}}_1-{\widehat{{{\varvec{{\nu }}}}}}). \end{aligned}
(18)

### Proof

Let $${{\varvec{\text {C}}}}=({{\varvec{\text {V}}}}_1-{{\varvec{\text {A}}}})^{-1}$$ and $${{\varvec{\text {x}}}}_*=\sum _{j=2}^k{{\varvec{\text {V}}}}_j^{-1}{{\varvec{\text {x}}}}_j$$. Then, $${\widehat{{{\varvec{{\nu }}}}}}={{\varvec{\text {A}}}}{{\varvec{\text {V}}}}_1^{-1}{{\varvec{\text {x}}}}_1+{{\varvec{\text {A}}}}{{\varvec{\text {x}}}}_*$$ and $${{\varvec{\text {x}}}}_1-{\widehat{{{\varvec{{\nu }}}}}}={{\varvec{\text {C}}}}^{-1}{{\varvec{\text {V}}}}_1^{-1}{{\varvec{\text {x}}}}_1-{{\varvec{\text {A}}}}{{\varvec{\text {x}}}}_*$$. The RHS of (18) is rewritten as
\begin{aligned} ({{\varvec{\text {x}}}}_1-{\widehat{{{\varvec{{\nu }}}}}})^\top {{\varvec{\text {C}}}}({{\varvec{\text {x}}}}_1-{\widehat{{{\varvec{{\nu }}}}}})= {{\varvec{\text {x}}}}_1^\top {{\varvec{\text {V}}}}_1^{-1}{{\varvec{\text {C}}}}^{-1}{{\varvec{\text {V}}}}_1^{-1}{{\varvec{\text {x}}}}_1 - 2{{\varvec{\text {x}}}}_1^\top {{\varvec{\text {V}}}}_1^{-1}{{\varvec{\text {A}}}}{{\varvec{\text {x}}}}_* + {{\varvec{\text {x}}}}_*^\top {{\varvec{\text {A}}}}{{\varvec{\text {C}}}}{{\varvec{\text {A}}}}{{\varvec{\text {x}}}}_*. \end{aligned}
(19)
On the other hand, it can be observed that
\begin{aligned} ({{\varvec{\text {x}}}}_1-{\widehat{{{\varvec{{\nu }}}}}})^\top {{\varvec{\text {V}}}}_1^{-1}({{\varvec{\text {x}}}}_1-{\widehat{{{\varvec{{\nu }}}}}})=&{{\varvec{\text {x}}}}_1^\top {{\varvec{\text {V}}}}_1^{-1}{{\varvec{\text {C}}}}^{-1}{{\varvec{\text {V}}}}_1^{-1}{{\varvec{\text {C}}}}^{-1}{{\varvec{\text {V}}}}_1^{-1}{{\varvec{\text {x}}}}_1 - 2{{\varvec{\text {x}}}}_1^\top {{\varvec{\text {V}}}}_1^{-1}{{\varvec{\text {C}}}}^{-1}{{\varvec{\text {V}}}}_1^{-1}{{\varvec{\text {A}}}}{{\varvec{\text {x}}}}_* + {{\varvec{\text {x}}}}_*^\top {{\varvec{\text {A}}}}{{\varvec{\text {V}}}}_1^{-1}{{\varvec{\text {A}}}}{{\varvec{\text {x}}}}_*,\\ \sum _{j=2}^k({{\varvec{\text {x}}}}_j-{\widehat{{{\varvec{{\nu }}}}}})^\top {{\varvec{\text {V}}}}_j^{-1}({{\varvec{\text {x}}}}_j-{\widehat{{{\varvec{{\nu }}}}}}) =&\sum _{j=2}^k{{\varvec{\text {x}}}}_j^\top {{\varvec{\text {V}}}}_j^{-1}{{\varvec{\text {x}}}}_j -2{{\varvec{\text {x}}}}_*^\top {{\varvec{\text {A}}}}{{\varvec{\text {V}}}}_1^{-1}{{\varvec{\text {x}}}}_1 - 2{{\varvec{\text {x}}}}_*^\top {{\varvec{\text {A}}}}{{\varvec{\text {x}}}}_*\\&+{{\varvec{\text {x}}}}_1^\top {{\varvec{\text {V}}}}_1^{-1}{{\varvec{\text {A}}}}({{\varvec{\text {A}}}}^{-1}-{{\varvec{\text {V}}}}_1^{-1}){{\varvec{\text {A}}}}{{\varvec{\text {V}}}}_1^{-1}{{\varvec{\text {x}}}}_1 + 2{{\varvec{\text {x}}}}_1^\top {{\varvec{\text {V}}}}_1^{-1}{{\varvec{\text {A}}}}({{\varvec{\text {A}}}}^{-1}-{{\varvec{\text {V}}}}_1^{-1}){{\varvec{\text {A}}}}{{\varvec{\text {x}}}}_*\\&+ {{\varvec{\text {x}}}}_*^\top {{\varvec{\text {A}}}}({{\varvec{\text {A}}}}^{-1}-{{\varvec{\text {V}}}}_1^{-1}){{\varvec{\text {A}}}}{{\varvec{\text {x}}}}_*, \end{aligned}
which gives
\begin{aligned} \sum _{j=1}^k&({{\varvec{\text {x}}}}_j-{\widehat{{{\varvec{{\nu }}}}}})^\top {{\varvec{\text {V}}}}_j^{-1}({{\varvec{\text {x}}}}_j-{\widehat{{{\varvec{{\nu }}}}}})\nonumber \\ =&{{\varvec{\text {x}}}}_1^\top {{\varvec{\text {V}}}}_1^{-1}\{{{\varvec{\text {C}}}}^{-1}{{\varvec{\text {V}}}}_1^{-1}{{\varvec{\text {C}}}}^{-1}+{{\varvec{\text {A}}}}({{\varvec{\text {A}}}}^{-1}-{{\varvec{\text {V}}}}_1^{-1}){{\varvec{\text {A}}}}\}{{\varvec{\text {V}}}}_1^{-1}{{\varvec{\text {x}}}}_1\nonumber \\&-2{{\varvec{\text {x}}}}_1^\top {{\varvec{\text {V}}}}_1^{-1}\{{{\varvec{\text {C}}}}^{-1}{{\varvec{\text {V}}}}_1^{-1}+{{\varvec{\text {I}}}}-{{\varvec{\text {A}}}}({{\varvec{\text {A}}}}^{-1}-{{\varvec{\text {V}}}}_1^{-1})\}{{\varvec{\text {A}}}}{{\varvec{\text {x}}}}_*\nonumber \\&-{{\varvec{\text {x}}}}_*^\top {{\varvec{\text {A}}}}{{\varvec{\text {x}}}}_* + \sum _{j=2}^k{{\varvec{\text {x}}}}_j^\top {{\varvec{\text {V}}}}_j^{-1}{{\varvec{\text {x}}}}_j. \end{aligned}
(20)
It is noted that $${{\varvec{\text {C}}}}^{-1}{{\varvec{\text {V}}}}_1^{-1}{{\varvec{\text {C}}}}^{-1}+{{\varvec{\text {A}}}}({{\varvec{\text {A}}}}^{-1}-{{\varvec{\text {V}}}}_1^{-1}){{\varvec{\text {A}}}}={{\varvec{\text {V}}}}_1-{{\varvec{\text {A}}}}$$, $${{\varvec{\text {C}}}}^{-1}{{\varvec{\text {V}}}}_1^{-1}+{{\varvec{\text {I}}}}-{{\varvec{\text {A}}}}({{\varvec{\text {A}}}}^{-1}-{{\varvec{\text {V}}}}_1^{-1})={{\varvec{\text {I}}}}$$ and $${{\varvec{\text {A}}}}{{\varvec{\text {C}}}}{{\varvec{\text {A}}}}+{{\varvec{\text {A}}}}={{\varvec{\text {A}}}}({{\varvec{\text {V}}}}_1-{{\varvec{\text {A}}}})^{-1}{{\varvec{\text {V}}}}_1=({{\varvec{\text {A}}}}^{-1}-{{\varvec{\text {V}}}}_1^{-1})^{-1}=(\sum _{j=2}^k{{\varvec{\text {V}}}}_j^{-1})^{-1}$$. From (19) and (20), the inequality (18) is equivalent to
\begin{aligned} \sum _{j=2}^k {{\varvec{\text {x}}}}_j^\top {{\varvec{\text {V}}}}_j^{-1}{{\varvec{\text {x}}}}_j \ge \Big (\sum _{j=2}^k{{\varvec{\text {V}}}}_j^{-1}{{\varvec{\text {x}}}}_j\Big )^\top \Big (\sum _{j=2}^k{{\varvec{\text {V}}}}_j^{-1}\Big )^{-1} \Big (\sum _{j=2}^k{{\varvec{\text {V}}}}_j^{-1}{{\varvec{\text {x}}}}_j\Big ). \end{aligned}
(21)
To show the inequality (21), let $${{\varvec{{\nu }}}}_*=(\sum _{j=2}^k{{\varvec{\text {V}}}}_j^{-1})^{-1} (\sum _{j=2}^k{{\varvec{\text {V}}}}_j^{-1}{{\varvec{\text {x}}}}_j)$$. Then, it can be seen that
\begin{aligned} \sum _{j=2}^k&{{\varvec{\text {x}}}}_j^\top {{\varvec{\text {V}}}}_j^{-1}{{\varvec{\text {x}}}}_j - \Big (\sum _{j=2}^k{{\varvec{\text {V}}}}_j^{-1}{{\varvec{\text {x}}}}_j\Big )^\top \Big (\sum _{j=2}^k{{\varvec{\text {V}}}}_j^{-1}\Big )^{-1} \Big (\sum _{j=2}^k{{\varvec{\text {V}}}}_j^{-1}{{\varvec{\text {x}}}}_j\Big )\\ =&\sum _{j=2}^k {{\varvec{\text {x}}}}_j^\top {{\varvec{\text {V}}}}_j^{-1}{{\varvec{\text {x}}}}_j - {{\varvec{{\nu }}}}_*^\top \Big (\sum _{j=2}^k{{\varvec{\text {V}}}}_j^{-1}\Big ) {{\varvec{{\nu }}}}_*\\ =&\sum _{j=2}^k ({{\varvec{\text {x}}}}_j-{{\varvec{{\nu }}}}_*)^\top {{\varvec{\text {V}}}}_j^{-1}({{\varvec{\text {x}}}}_j-{{\varvec{{\nu }}}}_*), \end{aligned}
which is non-negative, and the proof of Lemma 2.2 is complete. $$\quad\quad\square$$
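Inequality (18) can also be sanity-checked on random instances; the positive definite $${{\varvec{\text {V}}}}_i$$ below are generated purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
p, k = 3, 4

# random positive definite covariance factors V_1, ..., V_k
V = []
for _ in range(k):
    B = rng.normal(size=(p, p))
    V.append(B @ B.T + p * np.eye(p))
x = [rng.normal(size=p) for _ in range(k)]

Vinv = [np.linalg.inv(Vi) for Vi in V]
A = np.linalg.inv(sum(Vinv))
nu_hat = A @ sum(Vi @ xi for Vi, xi in zip(Vinv, x))

lhs = sum((xi - nu_hat) @ Vi @ (xi - nu_hat) for xi, Vi in zip(x, Vinv))
d1 = x[0] - nu_hat
rhs = d1 @ np.linalg.inv(V[0] - A) @ d1   # V_1 - A is positive definite for k >= 2
# Lemma 2.2 asserts lhs >= rhs
```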

### Proof of Theorem 2.2

The risk function of $${\widehat{{\varvec{{\mu }}}}}_1(\phi ,\psi )$$ is
\begin{aligned} R({\varvec{{\omega }}}, {\widehat{{\varvec{{\mu }}}}}_1(\phi ,\psi ))=&R({\varvec{{\omega }}},{\widehat{{\varvec{{\mu }}}}}_1(\phi )) -2 {1\over {\sigma }^2}E\Big [ ({\varvec{\text {X}}}_1-{{\varvec{{\mu }}}}_1)^\top {{\varvec{\text {Q}}}}{\widehat{{{\varvec{{\nu }}}}}}{\psi \over G}\Big ]\nonumber \\&\quad +2{1\over {\sigma }^2}E\Big [ {\phi \psi \over FG}({\varvec{\text {X}}}_1-{\widehat{{{\varvec{{\nu }}}}}})^\top {{\varvec{\text {Q}}}}{\widehat{{{\varvec{{\nu }}}}}}\Big ] + E\Big [ {\psi ^2\over {\sigma }^2G^2}{\widehat{{{\varvec{{\nu }}}}}}^\top {{\varvec{\text {Q}}}}{\widehat{{{\varvec{{\nu }}}}}}\Big ], \end{aligned}
for $$\phi =\phi (F,S)$$ and $$\psi =\psi (G,S)$$. Because of $$({\varvec{\text {X}}}_1-{\widehat{{{\varvec{{\nu }}}}}})^\top {{\varvec{\text {Q}}}}{\widehat{{{\varvec{{\nu }}}}}}\le \{({\varvec{\text {X}}}_1-{\widehat{{{\varvec{{\nu }}}}}})^\top {{\varvec{\text {Q}}}}({\varvec{\text {X}}}_1-{\widehat{{{\varvec{{\nu }}}}}})\}^{1/2} \{{\widehat{{{\varvec{{\nu }}}}}}^\top {{\varvec{\text {Q}}}}{\widehat{{{\varvec{{\nu }}}}}}\}^{1/2}$$, we have
\begin{aligned} {2\over {\sigma }^2}{\phi \psi \over FG}({\varvec{\text {X}}}_1-{\widehat{{{\varvec{{\nu }}}}}})^\top {{\varvec{\text {Q}}}}{\widehat{{{\varvec{{\nu }}}}}}\le&{2S^2\over {\sigma }^2} \Bigg \{\phi {\{({\varvec{\text {X}}}_1-{\widehat{{{\varvec{{\nu }}}}}})^\top {{\varvec{\text {Q}}}}({\varvec{\text {X}}}_1-{\widehat{{{\varvec{{\nu }}}}}})\}^{1/2}\over \sum _{j=1}^k\Vert {\varvec{\text {X}}}_j-{\widehat{{{\varvec{{\nu }}}}}}\Vert _{{{\varvec{\text {V}}}}_j^{-1}}^2}\Bigg \} \Bigg \{ \psi {\{{\widehat{{{\varvec{{\nu }}}}}}^\top {{\varvec{\text {Q}}}}{\widehat{{{\varvec{{\nu }}}}}}\}^{1/2}\over {\widehat{{{\varvec{{\nu }}}}}}^\top {{\varvec{\text {A}}}}^{-1}{\widehat{{{\varvec{{\nu }}}}}}}\Bigg \}\\ \le&{2S^2\over {\sigma }^2} \Bigg \{\phi ^2 {({\varvec{\text {X}}}_1-{\widehat{{{\varvec{{\nu }}}}}})^\top {{\varvec{\text {Q}}}}({\varvec{\text {X}}}_1-{\widehat{{{\varvec{{\nu }}}}}})\over 2\{\sum _{j=1}^k\Vert {\varvec{\text {X}}}_j-{\widehat{{{\varvec{{\nu }}}}}}\Vert _{{{\varvec{\text {V}}}}_j^{-1}}^2\}^2 }+ \psi ^2 {{\widehat{{{\varvec{{\nu }}}}}}^\top {{\varvec{\text {Q}}}}{\widehat{{{\varvec{{\nu }}}}}}\over 2({\widehat{{{\varvec{{\nu }}}}}}^\top {{\varvec{\text {A}}}}^{-1}{\widehat{{{\varvec{{\nu }}}}}})^2}\Bigg \}\\ =&{S\over {\sigma }^2}{\phi ^2\over F} B+ {S\over {\sigma }^2}{\psi ^2\over G} {{\widehat{{{\varvec{{\nu }}}}}}^\top {{\varvec{\text {Q}}}}{\widehat{{{\varvec{{\nu }}}}}}\over {\widehat{{{\varvec{{\nu }}}}}}^\top {{\varvec{\text {A}}}}^{-1}{\widehat{{{\varvec{{\nu }}}}}}}, \end{aligned}
for B defined in (14). Thus,
\begin{aligned} R({\varvec{{\omega }}}, {\widehat{{\varvec{{\mu }}}}}_1(\phi ,\psi ))\le&R({\varvec{{\omega }}},{\widehat{{\varvec{{\mu }}}}}_1(\phi )) -2 {1\over {\sigma }^2}E\Big [ ({\varvec{\text {X}}}_1-{{\varvec{{\mu }}}}_1)^\top {{\varvec{\text {Q}}}}{\widehat{{{\varvec{{\nu }}}}}}{\psi \over G}\Big ]\nonumber \\&+ 2E\Big [ {S\over {\sigma }^2}{\psi ^2\over G} W\Big ] +E\Big [ {S\over {\sigma }^2}{\phi ^2\over F} B\Big ]\nonumber \\ =&R({\varvec{{\omega }}},{\widehat{{\varvec{{\mu }}}}}_1(\phi )) -2J_1 + 2J_2 +J_3, \quad \mathrm{(say)} \end{aligned}
(22)
where $$W={{\widehat{{{\varvec{{\nu }}}}}}^\top {{\varvec{\text {Q}}}}{\widehat{{{\varvec{{\nu }}}}}}/ {\widehat{{{\varvec{{\nu }}}}}}^\top {{\varvec{\text {A}}}}^{-1}{\widehat{{{\varvec{{\nu }}}}}}}$$. The Stein identity is applied to rewrite $$J_1$$ as
\begin{aligned} J_1=E\Big [ {\psi \over G}\mathrm{tr\,}({{\varvec{\text {A}}}}{{\varvec{\text {Q}}}}) - 2 W{\psi \over G}+ 2 W\psi _G \Big ]. \end{aligned}
(23)
The chi-square identity is used to rewrite $$J_2$$ as
\begin{aligned} J_2= E\Big [ (n+2)W{\psi ^2\over G} -4W\psi \psi _G + 4WS{\psi \psi _S\over G}\Big ]. \end{aligned}
(24)
The calculation of $$J_3$$ is given in (15), and from (16), (22), (23) and (24), it follows that
\begin{aligned} R({\varvec{{\omega }}}&, {\widehat{{\varvec{{\mu }}}}}_1(\phi ,\psi ))-\mathrm{tr\,}({{\varvec{\text {V}}}}_1{{\varvec{\text {Q}}}})\nonumber \\ \le&E\Big [ {\phi \over F}\{2(n+2)B\phi - 2 \mathrm{tr\,}\{({{\varvec{\text {V}}}}_1-{{\varvec{\text {A}}}}){{\varvec{\text {Q}}}}\}+4B\} + {\psi \over G}\{ 2(n+2)W\psi - 2\mathrm{tr\,}({{\varvec{\text {A}}}}{{\varvec{\text {Q}}}}) +4W\}\nonumber \\&-4B\phi _F-8B\phi \phi _F+8BS{\phi \phi _S\over F} -4 W\psi _G -8W\psi \psi _G + 8WS{\psi \psi _S\over G} \Big ]. \end{aligned}
(25)
Because $$W\le \mathrm{Ch}_\mathrm{max}({{\varvec{\text {A}}}}{{\varvec{\text {Q}}}})$$, it can be verified that $$R({\varvec{{\omega }}}, {\widehat{{\varvec{{\mu }}}}}_1(\phi ,\psi ))\le \mathrm{tr\,}({{\varvec{\text {V}}}}_1{{\varvec{\text {Q}}}})$$ under the conditions (a)–(d) in Theorem 2.2, which completes the proof. $$\quad\quad\square$$

## 3 Hierarchical and empirical Bayes minimax estimators

### 3.1 Empirical Bayes estimator under the uniform prior for $${{\varvec{{\nu }}}}$$

We begin with assuming the prior distribution
\begin{aligned} {{\varvec{{\mu }}}}_i\mid {{\varvec{{\nu }}}}, \tau ^2\sim & {} \mathcal{N}_p({{\varvec{{\nu }}}}, \tau ^2{{\varvec{\text {V}}}}_i), \quad \text {for}\ i=1, \ldots , k,\nonumber \\ {{\varvec{{\nu }}}}\sim & {} \text {Uniform}(\mathfrak {R}^p), \end{aligned}
(26)
where Uniform$$(\mathfrak {R}^p)$$ denotes the improper uniform distribution over $$\mathfrak {R}^p$$, and $$\tau ^2$$ is an unknown parameter. The posterior distribution of $${{\varvec{{\mu }}}}_i$$ given $${\varvec{\text {X}}}_i$$ and $${{\varvec{{\nu }}}}$$, the posterior distribution of $${{\varvec{{\nu }}}}$$ given $${\varvec{\text {X}}}_1, \ldots , {\varvec{\text {X}}}_k$$ and the marginal distribution of $${\varvec{\text {X}}}_1, \ldots , {\varvec{\text {X}}}_k$$ are
\begin{aligned} {{\varvec{{\mu }}}}_i \mid {\varvec{\text {X}}}_i, {{\varvec{{\nu }}}}, \tau ^2, {\sigma }^2\sim & {} \mathcal{N}_p ( {\widehat{{\varvec{{\mu }}}}}_i^*({\sigma }^2, \tau ^2,{{\varvec{{\nu }}}}), ({\sigma }^{-2}+\tau ^{-2})^{-1}{{\varvec{\text {V}}}}_i),\quad i=1, \ldots , k,\nonumber \\ {{\varvec{{\nu }}}}\mid {\varvec{\text {X}}}_1, \ldots , {\varvec{\text {X}}}_k, \tau ^2, {\sigma }^2\sim & {} \mathcal{N}_p( {\widehat{{{\varvec{{\nu }}}}}}, (\tau ^2+{\sigma }^2){{\varvec{\text {A}}}}), \nonumber \\ f_\pi ({{\varvec{\text {x}}}}_1, \ldots , {{\varvec{\text {x}}}}_k \mid \tau ^2, {\sigma }^2)\propto & {} {1\over (\tau ^2+{\sigma }^2)^{p(k-1)/2}}\exp \Bigg \{ -{\sum _{i=1}^k\Vert {{\varvec{\text {x}}}}_i-{\widehat{{{\varvec{{\nu }}}}}}\Vert _{{{\varvec{\text {V}}}}_i^{-1}}^2 \over 2(\tau ^2+{\sigma }^2)}\Bigg \}, \end{aligned}
(27)
where $${\widehat{{\varvec{{\mu }}}}}_i^*({\sigma }^2, \tau ^2,{{\varvec{{\nu }}}}) = {\varvec{\text {X}}}_i - \{{\sigma }^2/(\tau ^2+{\sigma }^2)\}({\varvec{\text {X}}}_i-{{\varvec{{\nu }}}})$$. Then, the Bayes estimator of $${{\varvec{{\mu }}}}_1$$ is
\begin{aligned} {\widehat{{\varvec{{\mu }}}}}_1^B({\sigma }^2, \tau ^2)=E[{\widehat{{\varvec{{\mu }}}}}_1^*({\sigma }^2, \tau ^2,{{\varvec{{\nu }}}}) \mid {\varvec{\text {X}}}_1, \ldots ,{\varvec{\text {X}}}_k] ={\varvec{\text {X}}}_1 - {{\sigma }^2\over \tau ^2+{\sigma }^2}({\varvec{\text {X}}}_1-{\widehat{{{\varvec{{\nu }}}}}}). \end{aligned}
(28)
Because $$\tau ^2+{\sigma }^2$$ and $${\sigma }^2$$ are unknown, we estimate $$\tau ^2+{\sigma }^2$$ by $$\sum _{i=1}^k \Vert {\varvec{\text {X}}}_i-{\widehat{{{\varvec{{\nu }}}}}}\Vert _{{{\varvec{\text {V}}}}_i^{-1}}^2/\{p(k-1)-2\}$$ from the marginal likelihood in (27). When $${\sigma }^2$$ is estimated by $${\hat{\sigma }}^2=S/(n+2)$$, the resulting empirical Bayes estimator is
\begin{aligned} {\widehat{{\varvec{{\mu }}}}}_1^{EB} = {\varvec{\text {X}}}_1 - \min \Big ( {a_0 \over F}, 1\Big )({\varvec{\text {X}}}_1-{\widehat{{{\varvec{{\nu }}}}}}), \end{aligned}
(29)
for $$a_0=\{p(k-1)-2\}/(n+2)$$. It follows from Theorem 2.1 that the empirical Bayes estimator $${\widehat{{\varvec{{\mu }}}}}_1^{EB}$$ is minimax for $$0<a_0 \le 2 [\mathrm{tr\,}\{({{\varvec{\text {V}}}}_1-{{\varvec{\text {A}}}}){{\varvec{\text {Q}}}}\}/\mathrm{Ch}_\mathrm{max}\{({{\varvec{\text {V}}}}_1-{{\varvec{\text {A}}}}){{\varvec{\text {Q}}}}\}-2]/(n+2)$$.
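For concreteness, the empirical Bayes estimator (29) is straightforward to compute. The following Python sketch evaluates $${\widehat{{\varvec{{\mu }}}}}_1^{EB}$$ on simulated data; the variable names are ours, and the forms $${\widehat{{{\varvec{{\nu }}}}}}={{\varvec{\text {A}}}}\sum _i {{\varvec{\text {V}}}}_i^{-1}{\varvec{\text {X}}}_i$$ with $${{\varvec{\text {A}}}}=(\sum _i {{\varvec{\text {V}}}}_i^{-1})^{-1}$$ and $$F=\sum _i \Vert {\varvec{\text {X}}}_i-{\widehat{{{\varvec{{\nu }}}}}}\Vert ^2_{{{\varvec{\text {V}}}}_i^{-1}}/S$$ are our reading of the notation of Sects. 1 and 2.

```python
import numpy as np

def ebayes_mu1(X, V, S, n):
    """Empirical Bayes estimator (29): shrink X_1 toward the pooled mean nu_hat.

    X : (k, p) array of the sample mean vectors X_1, ..., X_k
    V : (k, p, p) array of the covariance structures V_1, ..., V_k
    S : scalar statistic with S/sigma^2 ~ chi^2_n
    n : degrees of freedom of S
    """
    k, p = X.shape
    Vinv = np.array([np.linalg.inv(Vi) for Vi in V])
    A = np.linalg.inv(Vinv.sum(axis=0))                    # A = (sum_i V_i^{-1})^{-1}
    nu_hat = A @ sum(Vinv[i] @ X[i] for i in range(k))     # pooled mean estimator
    # F = sum_i ||X_i - nu_hat||^2_{V_i^{-1}} / S
    F = sum((X[i] - nu_hat) @ Vinv[i] @ (X[i] - nu_hat) for i in range(k)) / S
    a0 = (p * (k - 1) - 2) / (n + 2)
    return X[0] - min(a0 / F, 1.0) * (X[0] - nu_hat)

rng = np.random.default_rng(0)
p, k, n = 5, 5, 20
V = np.array([0.1 * (i + 1) * np.eye(p) for i in range(k)])
X = rng.normal(size=(k, p))
S = rng.chisquare(n)
mu1_eb = ebayes_mu1(X, V, S, n)
```

The estimator coincides with $${\varvec{\text {X}}}_1$$ shrunk toward $${\widehat{{{\varvec{{\nu }}}}}}$$ by the data-dependent factor $$\min (a_0/F, 1)$$, which never overshoots the pooled estimator.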

### 3.2 Hierarchical Bayes minimax estimator under the uniform prior for $${{\varvec{{\nu }}}}$$

We next assign prior distributions to $${\sigma }^2$$ in (1) and $$\tau ^2$$ in (26); namely, in addition to (26), we assume that
\begin{aligned} \pi (\tau ^2 \mid {\sigma }^2)\propto & {} \Big ({{\sigma }^2\over \tau ^2+{\sigma }^2}\Big )^{a +1},\nonumber \\ \pi ({\sigma }^2)\propto & {} ({\sigma }^2)^{c -2}, \quad \text {for} \quad \ {\sigma }^2\le 1/L, \end{aligned}
(30)
where a and c are constants, and L is a positive constant. From (27), the posterior distribution of $$(\tau ^2, {\sigma }^2)$$ given $${\varvec{\text {X}}}_1, \ldots , {\varvec{\text {X}}}_k, S$$ is
\begin{aligned} \pi&(\tau ^2, {\sigma }^2\mid {{\varvec{\text {x}}}}_1, \ldots , {{\varvec{\text {x}}}}_k, S)\\&\propto \Big ({{\sigma }^2\over \tau ^2+{\sigma }^2}\Big )^{p(k-1)/2+a +1}\Big ({1\over {\sigma }^2}\Big )^{\{n+p(k-1)\}/2+2-c}\exp \left \{ -{\sum _{i=1}^k\Vert {{\varvec{\text {x}}}}_i-{\widehat{{{\varvec{{\nu }}}}}}\Vert _{{{\varvec{\text {V}}}}_i^{-1}}^2 \over 2(\tau ^2+{\sigma }^2)} - {S\over 2{\sigma }^2}\right \}. \end{aligned}
Then, the hierarchical Bayes estimator of $${{\varvec{{\mu }}}}_1$$ relative to the quadratic loss (2) is written as
\begin{aligned} {\widehat{{\varvec{{\mu }}}}}_1^{HB} =&E[{{\varvec{{\mu }}}}_1/{\sigma }^2\mid {\varvec{\text {X}}}_1, \ldots , {\varvec{\text {X}}}_k, S]/E[1/{\sigma }^2\mid {\varvec{\text {X}}}_1, \ldots , {\varvec{\text {X}}}_k, S]\nonumber \\ =&{\varvec{\text {X}}}_1 - {E[(\tau ^2+{\sigma }^2)^{-1} \mid {\varvec{\text {X}}}_1, \ldots , {\varvec{\text {X}}}_k, S] \over E[({\sigma }^2)^{-1} \mid {\varvec{\text {X}}}_1, \ldots , {\varvec{\text {X}}}_k, S] }({\varvec{\text {X}}}_1 - {\widehat{{{\varvec{{\nu }}}}}}). \end{aligned}
(31)
Making the transformation $${\lambda }={\sigma }^2/(\tau ^2+{\sigma }^2)$$ and $$\eta =1/{\sigma }^2$$ with the Jacobian $$|\partial (\tau ^2, {\sigma }^2)/\partial ({\lambda },\eta )|=1/({\lambda }^2\eta ^3)$$ gives
\begin{aligned} \pi ({\lambda }, \eta \mid {{\varvec{\text {x}}}}_1, \ldots , {{\varvec{\text {x}}}}_k,S) \propto {\lambda }^{p(k-1)/2+a -1}\eta ^{\{n+p(k-1)\}/2-c -1}\exp \Big \{ - {{\lambda }\eta \over 2}\sum _{i=1}^k\Vert {{\varvec{\text {x}}}}_i-{\widehat{{{\varvec{{\nu }}}}}}\Vert _{{{\varvec{\text {V}}}}_i^{-1}}^2 -{\eta \over 2}S\Big \}, \end{aligned}
where $$0<{\lambda }<1$$ and $$\eta \ge L$$. Thus, we have
\begin{aligned} {E[(\tau ^2+{\sigma }^2)^{-1} \mid {\varvec{\text {X}}}_1, \ldots , {\varvec{\text {X}}}_k, S] \over E[({\sigma }^2)^{-1} \mid {\varvec{\text {X}}}_1, \ldots , {\varvec{\text {X}}}_k, S] } =&{\int _0^1 \int _L^\infty {\lambda }^{p(k-1)/2+a }\eta ^{\{n+p(k-1)\}/2-c }\exp \left\{ - {\eta S\over 2}({\lambda }F+1)\right\}\mathrm{{d}}\eta \mathrm{{d}}{\lambda }\over \int _0^1 \int _L^\infty {\lambda }^{p(k-1)/2+a -1}\eta ^{\{n+p(k-1)\}/2-c }\exp \left\{ - {\eta S\over 2}({\lambda }F+1)\right\}\mathrm{{d}}\eta \mathrm{{d}}{\lambda }}\\ =&{\phi ^{HB}(F,S) / F}, \end{aligned}
for
\begin{aligned} \phi ^{HB}(F,S)= {\int _0^F \int _{LS}^\infty x^{p(k-1)/2+a }v^{\{n+p(k-1)\}/2-c }\exp \{ - {v}(x+1)/2\}{\rm {d}}v {\rm {d}}x \over \int _0^F \int _{LS}^\infty x^{p(k-1)/2+a -1}v^{\{n+p(k-1)\}/2-c }\exp \{ - {v}(x+1)/2\}{\rm {d}}v {\rm {d}}x}, \end{aligned}
(32)
where the transformations $$x=F{\lambda }$$ and $$v=S\eta$$ are used.
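To get a feel for (32), the following Python sketch evaluates $$\phi ^{HB}$$ numerically in the special case $$L=0$$, where the inner integral over v is a gamma integral, $$\int _0^\infty v^{m}e^{-v(x+1)/2}\mathrm{d}v=\Gamma (m+1)\{2/(x+1)\}^{m+1}$$ with $$m=\{n+p(k-1)\}/2-c$$, and the constant factors cancel in the ratio. The parameter values below are illustrative choices of ours, not from the paper.

```python
import numpy as np

def phi_hb(F, p, k, n, a, c):
    """Evaluate phi^HB(F, S) of (32) for L = 0, where the v-integral has a
    closed form and (32) reduces to a ratio of one-dimensional integrals."""
    q = p * (k - 1) / 2 + a          # exponent coming from the prior on tau^2
    m = (n + p * (k - 1)) / 2 - c    # exponent coming from the prior on sigma^2
    grid = int(2e4 * max(1.0, F))    # finer grid for larger F
    x = np.linspace(1e-10, F, grid)
    w = (1 + x) ** (-(m + 1))
    dx = x[1] - x[0]

    def trap(y):
        # composite trapezoidal rule on the uniform grid
        return dx * (y.sum() - 0.5 * (y[0] + y[-1]))

    return trap(x ** q * w) / trap(x ** (q - 1) * w)

# illustrative values satisfying a > -p(k-1)/2 and a + c < n/2
p, k, n, a, c = 5, 5, 20, -8.0, 1.0
# limit of phi^HB as F -> infinity, S -> 0 (the bound derived below)
bound = (p * (k - 1) + 2 * a) / (n - 2 * (a + c))
vals = [phi_hb(F, p, k, n, a, c) for F in (0.2, 0.5, 1.0, 50.0)]
```

Numerically, $$\phi ^{HB}$$ increases in F and stays below the limiting value $$\{p(k-1)+2a\}/\{n-2(a+c)\}$$, in line with conditions (a) and (b) below.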
It is noted that the hierarchical Bayes estimator belongs to the class (6), so its minimaxity can be established by checking conditions (a) and (b) in Theorem 2.1. Differentiating $$\phi ^{HB}(F,S)$$ with respect to F and S readily verifies condition (a). Condition (a) then implies that
\begin{aligned} \phi ^{HB}(F,S)\le \lim _{F\rightarrow \infty }\lim _{S\rightarrow 0} \phi ^{HB}(F,S) =&{\int _0^\infty x^{p(k-1)/2+a }/ (1+x)^{\{n+p(k-1)\}/2+1-c } \mathrm{{d}}x \over \int _0^\infty x^{p(k-1)/2+a -1}/(1+x)^{\{n+p(k-1)\}/2+1-c }\mathrm{{d}}x}\\ =&{B(p(k-1)/2+a +1, n/2-a -c ) \over B(p(k-1)/2+a , n/2-a -c +1)} = {p(k-1)+2a \over n- 2(a +c )}, \end{aligned}
where $$B(a, b)$$ denotes the beta function, provided $$a>-p(k-1)/2$$ and $$a+c<n/2$$. Thus, the condition (b) is satisfied if
\begin{aligned}&a>-p(k-1)/2, a+c<n/2,\nonumber \\&{p(k-1)+2a \over n- 2(a +c )}\le 2 [\mathrm{tr\,}\{({{\varvec{\text {V}}}}_1-{{\varvec{\text {A}}}}){{\varvec{\text {Q}}}}\}/\mathrm{Ch}_\mathrm{max}\{({{\varvec{\text {V}}}}_1-{{\varvec{\text {A}}}}){{\varvec{\text {Q}}}}\}-2]/(n+2), \end{aligned}
(33)
which provides conditions on a and c for the minimaxity of the hierarchical Bayes estimator.

### 3.3 Hierarchical empirical Bayes minimax estimator under the normal prior for $${{\varvec{{\nu }}}}$$

The empirical Bayes estimator $${\widehat{{\varvec{{\mu }}}}}_1^{EB}$$ and the hierarchical Bayes estimator $${\widehat{{\varvec{{\mu }}}}}_1^{HB}$$ are derived under the uniform prior distribution for $${{\varvec{{\nu }}}}$$. Instead of the uniform prior, we here assume the normal prior distribution for $${{\varvec{{\nu }}}}$$, namely,
\begin{aligned} {{\varvec{{\mu }}}}_i\mid {{\varvec{{\nu }}}}, \tau ^2\sim & {} \mathcal{N}_p({{\varvec{{\nu }}}}, \tau ^2{{\varvec{\text {V}}}}_i), \quad \text {for} \quad \ i=1, \ldots , k,\nonumber \\ {{\varvec{{\nu }}}}\mid {\gamma }^2\sim & {} \mathcal{N}_p(0, {\gamma }^2 {{\varvec{\text {A}}}}), \end{aligned}
(34)
for $${\gamma }>0$$ and $${{\varvec{\text {A}}}}=\Big (\sum _{i=1}^k{{\varvec{\text {V}}}}_i^{-1}\Big )^{-1}$$. Then the posterior distributions are given by
\begin{aligned} {{\varvec{{\mu }}}}_i \mid {\varvec{\text {X}}}_i, {{\varvec{{\nu }}}}, \tau ^2, {\sigma }^2\sim & {} \mathcal{N}_p\Big ( {\widehat{{\varvec{{\mu }}}}}_i^*({\sigma }^2, \tau ^2,{{\varvec{{\nu }}}}), ({\sigma }^{-2}+\tau ^{-2})^{-1}\Big ),\quad \quad i=1, \ldots , k,\nonumber \\ {{\varvec{{\nu }}}}\mid {\varvec{\text {X}}}_1, \ldots , {\varvec{\text {X}}}_k, \tau ^2,{\gamma }^2,{\sigma }^2\sim & {} \mathcal{N}_p\Big ( {{\gamma }^2\over {\gamma }^2 + \tau ^2+{\sigma }^2}{\widehat{{{\varvec{{\nu }}}}}}, {{\gamma }^2(\tau ^2+{\sigma }^2)\over {\gamma }^2 + \tau ^2+{\sigma }^2}{{\varvec{\text {A}}}}\Big ), \end{aligned}
(35)
where $${\widehat{{\varvec{{\mu }}}}}_i^*({\sigma }^2, \tau ^2,{{\varvec{{\nu }}}})$$ is defined below (27). Thus, the Bayes estimator is
\begin{aligned} {\widehat{{\varvec{{\mu }}}}}_1^B({\sigma }^2, \tau ^2,{\gamma }^2)=&{\varvec{\text {X}}}_1 - {{\sigma }^2\over \tau ^2+{\sigma }^2}({\varvec{\text {X}}}_1-E[{{\varvec{{\nu }}}}\mid {\varvec{\text {X}}}_1, \ldots , {\varvec{\text {X}}}_k])\nonumber \\ =&{\varvec{\text {X}}}_1 - {{\sigma }^2\over \tau ^2+{\sigma }^2}({\varvec{\text {X}}}_1-{\widehat{{{\varvec{{\nu }}}}}})-{{\sigma }^2\over {\gamma }^2+\tau ^2+{\sigma }^2}{\widehat{{{\varvec{{\nu }}}}}}. \end{aligned}
(36)
Because the marginal density of $${\varvec{\text {X}}}_1, \ldots , {\varvec{\text {X}}}_k$$ is
\begin{aligned} f_\pi ({{\varvec{\text {x}}}}_1,\ldots , {{\varvec{\text {x}}}}_k \mid \tau ^2, {\gamma }^2, {\sigma }^2)\propto & {} {1\over (\tau ^2+{\sigma }^2)^{p(k-1)/2}}\exp \Bigg \{ -{\sum _{i=1}^k\Vert {{\varvec{\text {x}}}}_i-{\widehat{{{\varvec{{\nu }}}}}}\Vert _{{{\varvec{\text {V}}}}_i^{-1}}^2 \over 2(\tau ^2+{\sigma }^2)} \Bigg \} \nonumber \\\times & {} {1\over ({\gamma }^2 + \tau ^2+{\sigma }^2)^{p/2}} \exp \Bigg \{ - {\Vert {\widehat{{{\varvec{{\nu }}}}}}\Vert _{{{\varvec{\text {A}}}}^{-1}}^2 \over 2({\gamma }^2 + \tau ^2+{\sigma }^2)}\Bigg \}, \end{aligned}
(37)
we can estimate $$\tau ^2+{\sigma }^2$$, $${\gamma }^2+\tau ^2+{\sigma }^2$$ by $$\sum _{i=1}^k \Vert {\varvec{\text {X}}}_i-{\widehat{{{\varvec{{\nu }}}}}}\Vert _{{{\varvec{\text {V}}}}_i^{-1}}^2/\{p(k-1)-2\}$$ and $$\Vert {\widehat{{{\varvec{{\nu }}}}}}\Vert _{{{\varvec{\text {A}}}}^{-1}}^2/(p-2)$$, respectively, from the marginal likelihood. When $${\sigma }^2$$ is estimated by $${\hat{\sigma }}^2=S/(n+2)$$, the resulting hierarchical empirical Bayes estimator is
\begin{aligned} {\widehat{{\varvec{{\mu }}}}}_1^{HEB} = {\varvec{\text {X}}}_1 - \min \Big ( {a_0\over F}, 1\Big )({\varvec{\text {X}}}_1-{\widehat{{{\varvec{{\nu }}}}}}) -\min \Big ( {b_0\over G}, 1\Big ){\widehat{{{\varvec{{\nu }}}}}}, \end{aligned}
(38)
for $$a_0=\{p(k-1)-2\}/(n+2)$$ and $$b_0=(p-2)/(n+2)$$. This estimator belongs to the class (8). It follows from Theorem 2.2 that the hierarchical empirical Bayes estimator $${\widehat{{\varvec{{\mu }}}}}_1^{HEB}$$ is minimax if $$0<a_0\le [\mathrm{tr\,}\{({{\varvec{\text {V}}}}_1-{{\varvec{\text {A}}}}){{\varvec{\text {Q}}}}\}/\mathrm{Ch}_\mathrm{max}\{({{\varvec{\text {V}}}}_1-{{\varvec{\text {A}}}}){{\varvec{\text {Q}}}}\}-2]/(n+2)$$ and $$0<b_0\le \{\mathrm{tr\,}({{\varvec{\text {A}}}}{{\varvec{\text {Q}}}})/\mathrm{Ch}_\mathrm{max}({{\varvec{\text {A}}}}{{\varvec{\text {Q}}}})-2\}/(n+2)$$.
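The double-shrinkage form of (38) can be sketched in Python as follows. Here $$G=\Vert {\widehat{{{\varvec{{\nu }}}}}}\Vert ^2_{{{\varvec{\text {A}}}}^{-1}}/S$$ is our reading of the second statistic (consistent with the estimate of $${\gamma }^2+\tau ^2+{\sigma }^2$$ above), and the variable names are ours.

```python
import numpy as np

def heb_mu1(X, V, S, n):
    """Hierarchical empirical Bayes estimator (38): shrink X_1 toward the
    pooled mean nu_hat, and shrink nu_hat itself toward the origin."""
    k, p = X.shape
    Vinv = np.array([np.linalg.inv(Vi) for Vi in V])
    Ainv = Vinv.sum(axis=0)
    A = np.linalg.inv(Ainv)
    nu_hat = A @ sum(Vinv[i] @ X[i] for i in range(k))
    # F and G: ratios of the variance-component estimates to sigma^2-hat
    F = sum((X[i] - nu_hat) @ Vinv[i] @ (X[i] - nu_hat) for i in range(k)) / S
    G = (nu_hat @ Ainv @ nu_hat) / S
    a0 = (p * (k - 1) - 2) / (n + 2)
    b0 = (p - 2) / (n + 2)
    return (X[0] - min(a0 / F, 1.0) * (X[0] - nu_hat)
                 - min(b0 / G, 1.0) * nu_hat)

rng = np.random.default_rng(1)
p, k, n = 5, 5, 20
V = np.array([0.1 * (i + 1) * np.eye(p) for i in range(k)])
X = rng.normal(size=(k, p))
S = rng.chisquare(n)
mu1_heb = heb_mu1(X, V, S, n)
```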

## 4 Extension to estimation of linear combinations

We here extend the result in Sect. 2 to estimation of the linear combination of $${{\varvec{{\mu }}}}_1, \ldots , {{\varvec{{\mu }}}}_k$$, namely,
\begin{aligned} {\varvec{\theta }}=\sum _{i=1}^k d_i {{\varvec{{\mu }}}}_i, \end{aligned}
where $$d_1, \ldots , d_k$$ are constants. Based on (6), we consider the class of estimators
\begin{aligned} {\widehat{\varvec{\theta }}}(\phi ) = \sum _{i=1}^k d_i \Big \{{\varvec{\text {X}}}_i - {\phi (F, S)\over F}({\varvec{\text {X}}}_i-{\widehat{{{\varvec{{\nu }}}}}})\Big \}. \end{aligned}
(39)
When the estimator is evaluated in light of risk relative to the loss function $$\Vert {\widehat{\varvec{\theta }}}(\phi )-{\varvec{\theta }}\Vert ^2_{{\varvec{\text {Q}}}}/{\sigma }^2$$, we obtain conditions for minimaxity of the estimators (39).

### Theorem 4.1

Assume the conditions $$\mathrm{Ch}_\mathrm{max}((\sum _{i=1}^k d_i^2{{\varvec{\text {V}}}}_i-(\sum _{i=1}^k d_i)^2{{\varvec{\text {A}}}}){{\varvec{\text {Q}}}})\ne 0$$ and $$\mathrm{tr\,}\{(\sum _{i=1}^k d_i^2{{\varvec{\text {V}}}}_i-(\sum _{i=1}^k d_i)^2{{\varvec{\text {A}}}}){{\varvec{\text {Q}}}}\}/\mathrm{Ch}_\mathrm{max}((\sum _{i=1}^k d_i^2{{\varvec{\text {V}}}}_i-(\sum _{i=1}^k d_i)^2{{\varvec{\text {A}}}}){{\varvec{\text {Q}}}})>2$$. Then, the estimator $${\widehat{\varvec{\theta }}}(\phi )$$ is minimax if $$\phi (F,S)$$ satisfies the following conditions:
1. (a)

$$\phi (F,S)$$ is non-decreasing in F and non-increasing in S.

2. (b)

$$\phi (F,S)$$ satisfies the inequality

\begin{aligned} 0<\phi (F,S)\le {2\over n+2} \left[ {\mathrm{tr\,}\{(\sum _{i=1}^k d_i^2{{\varvec{\text {V}}}}_i-(\sum _{i=1}^k d_i)^2{{\varvec{\text {A}}}}){{\varvec{\text {Q}}}}\}\over \mathrm{Ch}_\mathrm{max}((\sum _{i=1}^k d_i^2{{\varvec{\text {V}}}}_i-(\sum _{i=1}^k d_i)^2{{\varvec{\text {A}}}}){{\varvec{\text {Q}}}})} -2 \right] . \end{aligned}
(40)

### Proof

The risk function of the estimator $${\widehat{\varvec{\theta }}}(\phi )$$ is
\begin{aligned} R({\varvec{{\omega }}}, {\widehat{\varvec{\theta }}}(\phi ))=&\sum _{i=1}^k {d_i^2 \over {\sigma }^2}E\Big [ \Big \Vert {\varvec{\text {X}}}_i - {{\varvec{{\mu }}}}_i - {\phi (F, S)\over F}({\varvec{\text {X}}}_i-{\widehat{{{\varvec{{\nu }}}}}}) \Big \Vert _{{\varvec{\text {Q}}}}^2\Big ]\\&+ \sum _{i\not = j} {d_id_j\over {\sigma }^2} E\Big [ \Big \{ {\varvec{\text {X}}}_i - {{\varvec{{\mu }}}}_i - {\phi (F, S)\over F}({\varvec{\text {X}}}_i-{\widehat{{{\varvec{{\nu }}}}}}) \Big \}^\top {{\varvec{\text {Q}}}}\Big \{ {\varvec{\text {X}}}_j - {{\varvec{{\mu }}}}_j - {\phi (F, S)\over F}({\varvec{\text {X}}}_j-{\widehat{{{\varvec{{\nu }}}}}}) \Big \}\Big ] \\ =&K_1+K_2. \quad \text { (say)} \end{aligned}
Concerning $$K_2$$, the same arguments as in (13) and (15) are used to get
\begin{aligned} {1\over {\sigma }^2}E\Big [ ({\varvec{\text {X}}}_i-{{\varvec{{\mu }}}}_i)^\top {{\varvec{\text {Q}}}}({\varvec{\text {X}}}_j-{\widehat{{{\varvec{{\nu }}}}}}) {\phi \over F}\Big ] =&E\Big [\mathrm{tr\,}\Big [{{\varvec{\text {V}}}}_i{{\varvec{\text {Q}}}}{{\varvec{{\nabla }}}}_i \Big \{({\varvec{\text {X}}}_j-{\widehat{{{\varvec{{\nu }}}}}})^\top {\phi (F,S)\over F}\Big \}\Big ]\Big ] \nonumber \\ =&E\Big [ - \mathrm{tr\,}({{\varvec{\text {A}}}}{{\varvec{\text {Q}}}}){\phi \over F} - 2 B_{ij}{\phi \over F} + 2 B_{ij}\phi _F\Big ], \end{aligned}
and
\begin{aligned} B_{ij} E\Big [{S\over {\sigma }^2}{\phi ^2\over F}\Big ] =&B_{ij} E\Big [ (n+2) {\phi ^2\over F} - 4\phi \phi _F + 4S {\phi \phi _S\over F} \Big ], \end{aligned}
where
\begin{aligned} B_{ij}= ({\varvec{\text {X}}}_i-{\widehat{{{\varvec{{\nu }}}}}})^\top {{\varvec{\text {Q}}}}({\varvec{\text {X}}}_j-{\widehat{{{\varvec{{\nu }}}}}})/\sum _{a=1}^k ({\varvec{\text {X}}}_a-{\widehat{{{\varvec{{\nu }}}}}})^\top {{\varvec{\text {V}}}}_a^{-1}({\varvec{\text {X}}}_a-{\widehat{{{\varvec{{\nu }}}}}}). \end{aligned}
(41)
Thus, $$K_2$$ is written as
\begin{aligned} K_2=\sum _{i\not = j} d_id_j\Big [ 2\mathrm{tr\,}({{\varvec{\text {A}}}}{{\varvec{\text {Q}}}}){\phi \over F} + B_{ij} \Big \{4 {\phi \over F} - 4 \phi _F + (n+2){\phi ^2\over F}- 4 \phi \phi _F+4S{\phi \phi _S\over F}\Big \}\Big ]. \end{aligned}
(42)
Concerning $$K_1$$, from (16), it follows that
\begin{aligned} K_1=&\sum _{i=1}^k d_i^2 E\Big [ \mathrm{tr\,}({{\varvec{\text {V}}}}_i{{\varvec{\text {Q}}}}) - 2 \mathrm{tr\,}\{({{\varvec{\text {V}}}}_i-{{\varvec{\text {A}}}}){{\varvec{\text {Q}}}}\}{\phi \over F} \nonumber \\&+ B_{ii} \Big \{ (n+2){\phi ^2\over F} + 4 {\phi \over F} -4\phi _F-4\phi \phi _F+4S {\phi \phi _S\over F}\Big \} \Big ]. \end{aligned}
(43)
It is here observed that
\begin{aligned} \sum _{i=1}^k d_i^2 B_{ii} + \sum _{i\not = j}d_id_j B_{ij} =B({\varvec{\text {X}}}), \end{aligned}
where $${\varvec{\text {X}}}=({\varvec{\text {X}}}_1, \ldots , {\varvec{\text {X}}}_k)$$ and
\begin{aligned} B({{\varvec{\text {x}}}}_1, \ldots , {{\varvec{\text {x}}}}_k) = {\{\sum _{i=1}^k d_i({{\varvec{\text {x}}}}_i-{\widehat{{{\varvec{{\nu }}}}}})\}^\top {{\varvec{\text {Q}}}}\{\sum _{i=1}^k d_i({{\varvec{\text {x}}}}_i-{\widehat{{{\varvec{{\nu }}}}}})\} \over \sum _{j=1}^k ({{\varvec{\text {x}}}}_j-{\widehat{{{\varvec{{\nu }}}}}})^\top {{\varvec{\text {V}}}}_j^{-1}({{\varvec{\text {x}}}}_j-{\widehat{{{\varvec{{\nu }}}}}})}. \end{aligned}
(44)
Combining (42) and (43), one gets
\begin{aligned} R({\varvec{{\omega }}}, {\widehat{\varvec{\theta }}}(\phi ))=&\sum _{i=1}^k d_i^2 \mathrm{tr\,}({{\varvec{\text {V}}}}_i{{\varvec{\text {Q}}}}) \\&+ E\Big [ {\phi \over F} \Big \{- 2 \sum _{i=1}^k\mathrm{tr\,}(d_i^2{{\varvec{\text {V}}}}_i{{\varvec{\text {Q}}}}) + 2 \Big (\sum _{i=1}^k d_i\Big )^2 \mathrm{tr\,}({{\varvec{\text {A}}}}{{\varvec{\text {Q}}}}) + 4 B({\varvec{\text {X}}}) + (n+2) \phi B({\varvec{\text {X}}}) \Big \} \\&+ B({\varvec{\text {X}}})\Big \{ -4\phi _F-4\phi \phi _F+4S {\phi \phi _S\over F}\Big \} \Big ], \end{aligned}
which is smaller than $$R({\varvec{{\omega }}}, \sum _{i=1}^k d_i{\varvec{\text {X}}}_i)$$ under the conditions (a) and (b) in Theorem 4.1, because $$B({{\varvec{\text {x}}}})\le \mathrm{Ch}_\mathrm{max}((\sum _{i=1}^k d_i^2{{\varvec{\text {V}}}}_i-(\sum _{i=1}^k d_i)^2{{\varvec{\text {A}}}}){{\varvec{\text {Q}}}})$$ from Lemma 4.1. Hence, the proof of Theorem 4.1 is complete. $$\quad\quad\square$$

### Lemma 4.1

For $$B({{\varvec{\text {x}}}}_1, \ldots , {{\varvec{\text {x}}}}_k)$$ given in (44), it holds that
\begin{aligned} B({{\varvec{\text {x}}}}_1, \ldots , {{\varvec{\text {x}}}}_k)\le \mathrm{{Ch}}_\mathrm{{max}}\left ( \left (\sum _{i=1}^k d_i^2{{\varvec{\text {V}}}}_i-\left( \sum _{i=1}^k d_i\right) ^2{{\varvec{\text {A}}}}\right ){{\varvec{\text {Q}}}}\right ). \end{aligned}

### Proof

Let $${{\varvec{\text {y}}}}_i={{\varvec{\text {x}}}}_i-{\widehat{{{\varvec{{\nu }}}}}}$$ and $${{\varvec{\text {x}}}}=({{\varvec{\text {x}}}}_1, \ldots , {{\varvec{\text {x}}}}_k)$$. Then, it is noted that $$\sum _{i=1}^k {{\varvec{\text {V}}}}_i^{-1}{{\varvec{\text {y}}}}_i=\sum _{i=1}^k {{\varvec{\text {V}}}}_i^{-1}({{\varvec{\text {x}}}}_i-{\widehat{{{\varvec{{\nu }}}}}})=\mathbf{{\varvec{0}}}$$, which means that
\begin{aligned} {{\varvec{\text {W}}}}\begin{pmatrix}{{\varvec{\text {y}}}}_1\\ \vdots \\ {{\varvec{\text {y}}}}_k \end{pmatrix}=\begin{pmatrix}{{\varvec{\text {y}}}}_1\\ \vdots \\ {{\varvec{\text {y}}}}_k \end{pmatrix}, \end{aligned}
for
\begin{aligned} {{\varvec{\text {W}}}}= \mathrm{{block\ diag}}({{\varvec{\text {I}}}}, \ldots , {{\varvec{\text {I}}}}) - \begin{pmatrix}{{\varvec{\text {A}}}}\\ \vdots \\ {{\varvec{\text {A}}}}\end{pmatrix}({{\varvec{\text {V}}}}_1^{-1}, \ldots , {{\varvec{\text {V}}}}_k^{-1}). \end{aligned}
Then, the numerator and the denominator of $$B({{\varvec{\text {x}}}})$$ are
\begin{aligned} \left\{ \sum _{i=1}^k d_i({{\varvec{\text {x}}}}_i-{\widehat{{{\varvec{{\nu }}}}}})\right\} ^\top {{\varvec{\text {Q}}}}\left\{ \sum _{i=1}^k d_i({{\varvec{\text {x}}}}_i-{\widehat{{{\varvec{{\nu }}}}}})\right\} & = ({{\varvec{\text {y}}}}_1^\top , \ldots , {{\varvec{\text {y}}}}_k^\top ) \begin{pmatrix}d_1{{\varvec{\text {I}}}}\\ \vdots \\ d_k{{\varvec{\text {I}}}}\end{pmatrix}{{\varvec{\text {Q}}}}(d_1{{\varvec{\text {I}}}}, \ldots , d_k {{\varvec{\text {I}}}})\begin{pmatrix}{{\varvec{\text {y}}}}_1\\ \vdots \\ {{\varvec{\text {y}}}}_k\end{pmatrix}\\ & = ({{\varvec{\text {y}}}}_1^\top , \ldots , {{\varvec{\text {y}}}}_k^\top ) {{\varvec{\text {W}}}}^\top \begin{pmatrix}d_1{{\varvec{\text {I}}}}\\ \vdots \\ d_k{{\varvec{\text {I}}}}\end{pmatrix}{{\varvec{\text {Q}}}}(d_1{{\varvec{\text {I}}}}, \ldots , d_k {{\varvec{\text {I}}}}){{\varvec{\text {W}}}}\begin{pmatrix}{{\varvec{\text {y}}}}_1\\ \vdots \\ {{\varvec{\text {y}}}}_k\end{pmatrix},\\ \sum _{j=1}^k ({{\varvec{\text {x}}}}_j-{\widehat{{{\varvec{{\nu }}}}}})^\top {{\varvec{\text {V}}}}_j^{-1}({{\varvec{\text {x}}}}_j-{\widehat{{{\varvec{{\nu }}}}}}) & = ({{\varvec{\text {y}}}}_1^\top , \ldots , {{\varvec{\text {y}}}}_k^\top ) \mathrm{block\ diag}({{\varvec{\text {V}}}}_1^{-1}, \ldots , {{\varvec{\text {V}}}}_k^{-1})\begin{pmatrix}{{\varvec{\text {y}}}}_1\\ \vdots \\ {{\varvec{\text {y}}}}_k\end{pmatrix}. \end{aligned}
Thus, we get an upper bound given by
\begin{aligned} B({{\varvec{\text {x}}}}) \le&\mathrm{Ch}_\mathrm{max}\Big ( \mathrm{block\ diag}({{\varvec{\text {V}}}}_1, \ldots , {{\varvec{\text {V}}}}_k) {{\varvec{\text {W}}}}^\top \begin{pmatrix}d_1{{\varvec{\text {I}}}}\\ \vdots \\ d_k{{\varvec{\text {I}}}}\end{pmatrix}{{\varvec{\text {Q}}}}(d_1{{\varvec{\text {I}}}}, \ldots , d_k {{\varvec{\text {I}}}}) {{\varvec{\text {W}}}}\Big ) \\ =&\mathrm{Ch}_\mathrm{max}\Big ( {{\varvec{\text {Q}}}}(d_1{{\varvec{\text {I}}}}, \ldots , d_k {{\varvec{\text {I}}}}) {{\varvec{\text {W}}}}\mathrm{block\ diag}({{\varvec{\text {V}}}}_1, \ldots , {{\varvec{\text {V}}}}_k) {{\varvec{\text {W}}}}^\top \begin{pmatrix}d_1{{\varvec{\text {I}}}}\\ \vdots \\ d_k{{\varvec{\text {I}}}}\end{pmatrix} \Big ) \\ =&\mathrm{Ch}_\mathrm{{max}}\left ( \left (\sum _{i=1}^k d_i^2{{\varvec{\text {V}}}}_i-\left( \sum _{i=1}^k d_i\right) ^2{{\varvec{\text {A}}}}\right ){{\varvec{\text {Q}}}}\right ), \end{aligned}
which shows Lemma 4.1. $$\quad\quad\square$$
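Lemma 4.1 is easy to check numerically. The following sketch, with names of our choosing, draws random positive definite $${{\varvec{\text {V}}}}_i$$ and $${{\varvec{\text {Q}}}}$$, arbitrary coefficients $$d_i$$, and many random data points, and compares $$B({{\varvec{\text {x}}}})$$ of (44) with the eigenvalue bound.

```python
import numpy as np

rng = np.random.default_rng(2)
p, k = 4, 3
d = rng.normal(size=k)               # arbitrary coefficients d_1, ..., d_k

def spd(rng, p):
    """Random symmetric positive definite p x p matrix."""
    M = rng.normal(size=(p, p))
    return M @ M.T + p * np.eye(p)

V = [spd(rng, p) for _ in range(k)]
Q = spd(rng, p)
Vinv = [np.linalg.inv(Vi) for Vi in V]
A = np.linalg.inv(sum(Vinv))         # A = (sum_i V_i^{-1})^{-1}

# right-hand side of Lemma 4.1: Ch_max of (sum_i d_i^2 V_i - (sum_i d_i)^2 A) Q
M = sum(d[i] ** 2 * V[i] for i in range(k)) - d.sum() ** 2 * A
bound = max(np.linalg.eigvals(M @ Q).real)

# B(x) of (44) over many random data points
B_max = 0.0
for _ in range(200):
    x = rng.normal(size=(k, p))
    nu_hat = A @ sum(Vinv[i] @ x[i] for i in range(k))
    y = x - nu_hat                   # y_i = x_i - nu_hat
    s = sum(d[i] * y[i] for i in range(k))
    B = (s @ Q @ s) / sum(y[i] @ Vinv[i] @ y[i] for i in range(k))
    B_max = max(B_max, B)
```

In every draw the ratio stays below the bound, as the lemma asserts.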
We now illustrate the condition (40) in some specific cases. In the case of $$k=2$$, it is observed that $$\mathrm{tr\,}\{(\sum _{i=1}^2 d_i^2{{\varvec{\text {V}}}}_i-(\sum _{i=1}^2 d_i)^2{{\varvec{\text {A}}}}){{\varvec{\text {Q}}}}\}=\mathrm{tr\,}\{(d_1{{\varvec{\text {V}}}}_1-d_2{{\varvec{\text {V}}}}_2)({{\varvec{\text {V}}}}_1+{{\varvec{\text {V}}}}_2)^{-1}(d_1{{\varvec{\text {V}}}}_1-d_2{{\varvec{\text {V}}}}_2){{\varvec{\text {Q}}}}\}=\mathrm{tr\,}({{\varvec{\text {H}}}})$$, where
\begin{aligned} {{\varvec{\text {H}}}}=({{\varvec{\text {V}}}}_1+{{\varvec{\text {V}}}}_2)^{-1/2}(d_1{{\varvec{\text {V}}}}_1-d_2{{\varvec{\text {V}}}}_2){{\varvec{\text {Q}}}}(d_1{{\varvec{\text {V}}}}_1-d_2{{\varvec{\text {V}}}}_2)({{\varvec{\text {V}}}}_1+{{\varvec{\text {V}}}}_2)^{-1/2}. \end{aligned}
Also, it can be seen that $$\mathrm{Ch}_\mathrm{max}((\sum _{i=1}^k d_i^2{{\varvec{\text {V}}}}_i-(\sum _{i=1}^k d_i)^2{{\varvec{\text {A}}}}){{\varvec{\text {Q}}}})=\mathrm{Ch}_\mathrm{max}({{\varvec{\text {H}}}})$$. Hence, the condition (b) in Theorem 4.1 is expressed as
\begin{aligned} 0<\phi (F,S)\le {2\over n+2} \left [ {\mathrm{tr\,}({{\varvec{\text {H}}}})\over \mathrm{Ch}_\mathrm{max}({{\varvec{\text {H}}}})} -2 \right ]. \end{aligned}
As another example, in the case that $${{\varvec{\text {V}}}}_1=\cdots ={{\varvec{\text {V}}}}_k={{\varvec{\text {Q}}}}={{\varvec{\text {I}}}}$$, we have
\begin{aligned} \mathrm{tr\,}\left\{ \left( \sum _{i=1}^k d_i^2{{\varvec{\text {V}}}}_i-\left( \sum _{i=1}^k d_i\right) ^2{{\varvec{\text {A}}}}\right) {{\varvec{\text {Q}}}}\right\} = p\sum _{i=1}^k (d_i-\overline{d})^2, \end{aligned}
for $$\overline{d}=k^{-1}\sum _{i=1}^k d_i$$. Similarly, $$\mathrm{Ch}_\mathrm{max}((\sum _{i=1}^k d_i^2{{\varvec{\text {V}}}}_i-(\sum _{i=1}^k d_i)^2{{\varvec{\text {A}}}}){{\varvec{\text {Q}}}})=\sum _{i=1}^k (d_i-\overline{d})^2$$, which implies that the condition (b) is expressed as
\begin{aligned} 0<\phi (F,S)\le 2(p-2)/(n+2). \end{aligned}

## 5 Simulation studies

We investigate the numerical performances of the risk functions of the preliminary test estimator and several empirical and hierarchical Bayes estimators through simulation. We employ the quadratic loss function $$L({\varvec{{\delta }}}_1, {{\varvec{{\mu }}}}_1, {\sigma }^2)$$ in (2) for $${{\varvec{\text {Q}}}}={{\varvec{\text {V}}}}_1^{-1}$$.

The estimators which we compare are the following five:

PT: the preliminary test estimator given in (5)
$$ {\widehat{{\varvec{{\mu }}}}}_1^{PT}= \left\{ \begin{array}{ll} {\varvec{\text {X}}}_1 & \ \text {if}\ F>(p(k-1)/n)F_{p(k-1), n, {\alpha }},\\ {\widehat{{{\varvec{{\nu }}}}}}& \ \text {otherwise}, \end{array}\right. $$
JS: the James–Stein estimator
\begin{aligned} {\widehat{{\varvec{{\mu }}}}}_1^{JS}={\varvec{\text {X}}}_1 - {p-2 \over n+2}{S\over \Vert {\varvec{\text {X}}}_1\Vert ^2_{{{\varvec{\text {V}}}}_1^{-1}}}{\varvec{\text {X}}}_1, \end{aligned}
EB: the empirical Bayes estimator given in (29)
\begin{aligned} {\widehat{{\varvec{{\mu }}}}}_1^{EB} = {\varvec{\text {X}}}_1 - \min \Big ( {a_0 \over F}, 1\Big )({\varvec{\text {X}}}_1-{\widehat{{{\varvec{{\nu }}}}}}), \end{aligned}
for $$a_0= [\mathrm{tr\,}\{({{\varvec{\text {V}}}}_1-{{\varvec{\text {A}}}}){{\varvec{\text {V}}}}_1^{-1}\}/\mathrm{Ch}_\mathrm{max}\{({{\varvec{\text {V}}}}_1-{{\varvec{\text {A}}}}){{\varvec{\text {V}}}}_1^{-1}\}-2]/(n+2)$$ (one can see that this constant choice is optimal with respect to the upper bound of the risk difference),
HB: the hierarchical Bayes estimator given in (31) and (32),
\begin{aligned} {\widehat{{\varvec{{\mu }}}}}_1^{HB} ={\varvec{\text {X}}}_1 - {\phi ^{HB}(F,S) \over F}({\varvec{\text {X}}}_1 - {\widehat{{{\varvec{{\nu }}}}}}), \end{aligned}
HEB: the hierarchical empirical Bayes estimator given in (38)
\begin{aligned} {\widehat{{\varvec{{\mu }}}}}_1^{HEB} = {\varvec{\text {X}}}_1 - \min \Big ( {a_0\over F}, 1\Big )({\varvec{\text {X}}}_1-{\widehat{{{\varvec{{\nu }}}}}}) -\min \Big ( {b_0\over G}, 1\Big ){\widehat{{{\varvec{{\nu }}}}}}, \end{aligned}
for $$a_0= [\mathrm{tr\,}\{({{\varvec{\text {V}}}}_1-{{\varvec{\text {A}}}}){{\varvec{\text {V}}}}_1^{-1}\}/\mathrm{Ch}_\mathrm{max}\{({{\varvec{\text {V}}}}_1-{{\varvec{\text {A}}}}){{\varvec{\text {V}}}}_1^{-1}\}-2]/\{2(n+2)\}$$ and $$b_0=\{\mathrm{tr\,}({{\varvec{\text {A}}}}{{\varvec{\text {V}}}}_1^{-1})/\mathrm{Ch}_\mathrm{max}({{\varvec{\text {A}}}}{{\varvec{\text {V}}}}_1^{-1})-2\}/\{2(n+2)\}$$ (these constants are also optimal choices).
It is noted that the James–Stein estimator does not use $${\varvec{\text {X}}}_2, \ldots , {\varvec{\text {X}}}_k$$, but is minimax. Concerning the hierarchical Bayes estimator $${\widehat{{\varvec{{\mu }}}}}_1^{HB}$$, the constants c and L are $$c=1$$ and $$L=0$$, and a is the solution of the equation
\begin{aligned} {p(k-1)+2a \over n- 2(a +1 )}(n+2)= \mathrm{tr\,}\{({{\varvec{\text {V}}}}_1-{{\varvec{\text {A}}}}){{\varvec{\text {V}}}}_1^{-1}\}/\mathrm{Ch}_\mathrm{max}\{({{\varvec{\text {V}}}}_1-{{\varvec{\text {A}}}}){{\varvec{\text {V}}}}_1^{-1}\}-2, \end{aligned}
which guarantees the minimaxity from the condition (33).
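Since the equation above is linear in a, it can be solved in closed form. The short sketch below (our code, using the simulation setup of this section, where $$({{\varvec{\text {V}}}}_1-{{\varvec{\text {A}}}}){{\varvec{\text {Q}}}}$$ is a scalar multiple of the identity so that $$\mathrm{tr\,}/\mathrm{Ch}_\mathrm{max}=p$$) computes a.

```python
import numpy as np

p, k, n, c = 5, 5, 20, 1

# simulation setup of Sect. 5: V_i = 0.1 * i * I, Q = V_1^{-1}
V = [0.1 * (i + 1) * np.eye(p) for i in range(k)]
Vinv = [np.linalg.inv(Vi) for Vi in V]
A = np.linalg.inv(sum(Vinv))
Q = Vinv[0]

# C = tr{(V_1 - A)Q} / Ch_max{(V_1 - A)Q} - 2
M = (V[0] - A) @ Q
C = np.trace(M) / max(np.linalg.eigvals(M).real) - 2

# {p(k-1)+2a}/{n-2(a+1)} * (n+2) = C  is linear in a:
a = (C * (n - 2) - p * (k - 1) * (n + 2)) / (2 * (n + 2 + C))
```

Here $$C=p-2=3$$, giving $$a=(3\cdot 18-20\cdot 22)/(2\cdot 25)=-7.72$$, which indeed satisfies $$a>-p(k-1)/2$$ and $$a+c<n/2$$.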
In this simulation, we generate random numbers of $${\varvec{\text {X}}}_1, \ldots , {\varvec{\text {X}}}_k$$ and S based on the model (1) for $$p=k=5$$, $$n=20$$, $${\sigma }^2=2$$ and $${{\varvec{\text {V}}}}_i=(0.1\times i){{\varvec{\text {I}}}}_p$$, $$i=1, \ldots , k$$. For the mean vectors $${{\varvec{{\mu }}}}_i$$, we treat the following 11 cases:
\begin{aligned} ({{\varvec{{\mu }}}}_1,&\ldots , {{\varvec{{\mu }}}}_5)\\ =&(\mathbf{{\varvec{0}}}, \mathbf{{\varvec{0}}}, \mathbf{{\varvec{0}}}, \mathbf{{\varvec{0}}}, \mathbf{{\varvec{0}}}), (1{{\varvec{\text {j}}}}_5, 1{{\varvec{\text {j}}}}_5, 1{{\varvec{\text {j}}}}_5, 1{{\varvec{\text {j}}}}_5, 1{{\varvec{\text {j}}}}_5), (2{{\varvec{\text {j}}}}_5, 2{{\varvec{\text {j}}}}_5, 2{{\varvec{\text {j}}}}_5, 2{{\varvec{\text {j}}}}_5, 2{{\varvec{\text {j}}}}_5), (3{{\varvec{\text {j}}}}_5, 3{{\varvec{\text {j}}}}_5, 3{{\varvec{\text {j}}}}_5, 3{{\varvec{\text {j}}}}_5, 3{{\varvec{\text {j}}}}_5),\\&(-0.4{{\varvec{\text {j}}}}_5, -0.2{{\varvec{\text {j}}}}_5, \mathbf{{\varvec{0}}}, 0.2{{\varvec{\text {j}}}}_5, 0.4{{\varvec{\text {j}}}}_5), (2{{\varvec{\text {j}}}}_5, -0.5{{\varvec{\text {j}}}}_5, -0.5{{\varvec{\text {j}}}}_5, -0.5{{\varvec{\text {j}}}}_5, -0.5{{\varvec{\text {j}}}}_5), \\&(4{{\varvec{\text {j}}}}_5,-1{{\varvec{\text {j}}}}_5, -1{{\varvec{\text {j}}}}_5, -1{{\varvec{\text {j}}}}_5, -1{{\varvec{\text {j}}}}_5), \\&(1.2{{\varvec{\text {j}}}}_5, 1.4{{\varvec{\text {j}}}}_5, 1.6{{\varvec{\text {j}}}}_5, 1.8{{\varvec{\text {j}}}}_5, 2{{\varvec{\text {j}}}}_5),(\mathbf{{\varvec{0}}}, 2{{\varvec{\text {j}}}}_5, 2{{\varvec{\text {j}}}}_5, 2{{\varvec{\text {j}}}}_5, 2{{\varvec{\text {j}}}}_5), (\mathbf{{\varvec{0}}}, 4{{\varvec{\text {j}}}}_5, 4{{\varvec{\text {j}}}}_5, 4{{\varvec{\text {j}}}}_5, 4{{\varvec{\text {j}}}}_5), (2{{\varvec{\text {j}}}}_5, \mathbf{{\varvec{0}}}, \mathbf{{\varvec{0}}}, \mathbf{{\varvec{0}}}, \mathbf{{\varvec{0}}}), \\ \end{aligned}
where $${{\varvec{\text {j}}}}_p=(1, \ldots , 1)^\top \in \mathfrak {R}^p$$. The first four are the cases of equal means, the next three are the cases with $$\sum _{i=1}^5{{\varvec{{\mu }}}}_i=\mathbf{{\varvec{0}}}$$ and the last four are various unbalanced cases.
For each estimator $${\widehat{{\varvec{{\mu }}}}}_1$$, based on 5,000 simulation replications, we obtain an approximated value of the risk function $$R({\varvec{{\omega }}}, {\widehat{{\varvec{{\mu }}}}}_1)=E[L({\widehat{{\varvec{{\mu }}}}}_1, {{\varvec{{\mu }}}}_1, {\sigma }^2)]$$. Table 1 reports the percentage relative improvement in average loss (PRIAL) of each estimator $${\widehat{{\varvec{{\mu }}}}}_1$$ over $${\varvec{\text {X}}}_1$$, defined by
\begin{aligned} \mathrm{PRIAL} = 100\times \{ R({\varvec{{\omega }}}, {\varvec{\text {X}}}_1) - R({\varvec{{\omega }}}, {\widehat{{\varvec{{\mu }}}}}_1)\}/R({\varvec{{\omega }}}, {\varvec{\text {X}}}_1). \end{aligned}
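The Monte Carlo design can be sketched in Python for a single configuration; the sketch below treats only the EB estimator in the equal-means case, with the choice of $$a_0$$ described above and 2,000 replications (the paper uses 5,000), so the resulting PRIAL should land near the 14.7 reported in Table 1. The code and its names are ours.

```python
import numpy as np

rng = np.random.default_rng(3)
p, k, n, sigma2 = 5, 5, 20, 2.0
V = [0.1 * (i + 1) * np.eye(p) for i in range(k)]
Vinv = [np.linalg.inv(Vi) for Vi in V]
A = np.linalg.inv(sum(Vinv))
Q = Vinv[0]                          # loss matrix Q = V_1^{-1}
mu = np.zeros((k, p))                # equal-means case (0, ..., 0)

# a_0 of Sect. 5: [tr{(V_1 - A)Q} / Ch_max{(V_1 - A)Q} - 2] / (n + 2)
M = (V[0] - A) @ Q
a0 = (np.trace(M) / max(np.linalg.eigvals(M).real) - 2) / (n + 2)

loss_X1 = loss_EB = 0.0
for _ in range(2000):
    X = np.array([rng.multivariate_normal(mu[i], sigma2 * V[i]) for i in range(k)])
    S = sigma2 * rng.chisquare(n)
    nu_hat = A @ sum(Vinv[i] @ X[i] for i in range(k))
    F = sum((X[i] - nu_hat) @ Vinv[i] @ (X[i] - nu_hat) for i in range(k)) / S
    eb = X[0] - min(a0 / F, 1.0) * (X[0] - nu_hat)
    # accumulate the quadratic losses of X_1 and of the EB estimator
    loss_X1 += (X[0] - mu[0]) @ Q @ (X[0] - mu[0]) / sigma2
    loss_EB += (eb - mu[0]) @ Q @ (eb - mu[0]) / sigma2

prial = 100 * (loss_X1 - loss_EB) / loss_X1
```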
Table 1

Values of PRIAL of estimators PT, JS, EB, HB and HEB

| $$({{\varvec{{\mu }}}}_1, \ldots , {{\varvec{{\mu }}}}_5)$$ | PT | JS | EB | HB | HEB |
| --- | --- | --- | --- | --- | --- |
| $$(0,0,0,0,0)\otimes {{\varvec{\text {j}}}}_5$$ | 52.15317 | 53.97469 | 14.66425 | 14.57437 | 26.38606 |
| $$(1, 1, 1, 1, 1)\otimes {{\varvec{\text {j}}}}_5$$ | 52.15317 | 12.89115 | 14.66425 | 14.57437 | 9.891098 |
| $$(2, 2, 2, 2, 2)\otimes {{\varvec{\text {j}}}}_5$$ | 52.15317 | 4.066823 | 14.66425 | 14.57437 | 8.516356 |
| $$(3, 3, 3, 3, 3)\otimes {{\varvec{\text {j}}}}_5$$ | 52.15317 | 2.268442 | 14.66425 | 14.57437 | 8.249028 |
| $$(-0.4,-0.2,0,0.2,0.4)\otimes {{\varvec{\text {j}}}}_5$$ | 37.34717 | 37.20396 | 13.01833 | 12.97352 | 22.64692 |
| $$(2,-0.5,-0.5,-0.5,-0.5)\otimes {{\varvec{\text {j}}}}_5$$ | − 56.8291 | 4.066823 | 3.213333 | 3.213459 | 6.031053 |
| $$(4,-1,-1,-1,-1)\otimes {{\varvec{\text {j}}}}_5$$ | 0.7375904 | 1.620614 | 1.358956 | 1.358821 | 2.098222 |
| $$(1.2,1.4,1.6,1.8,2)\otimes {{\varvec{\text {j}}}}_5$$ | 37.34717 | 9.463467 | 13.01833 | 12.97352 | 8.397694 |
| $$(0.2, 2, 2, 2, 2)\otimes {{\varvec{\text {j}}}}_5$$ | − 98.94453 | 49.73141 | 4.947591 | 4.949466 | 5.183324 |
| $$(0.4, 4,4,4,4)\otimes {{\varvec{\text {j}}}}_5$$ | − 2.492994 | 37.56052 | 1.805795 | 1.80584 | 2.071347 |
| $$(2,0,0,0,0)\otimes {{\varvec{\text {j}}}}_5$$ | − 94.45962 | 4.066823 | 4.439434 | 4.440511 | 4.479298 |

It is revealed from Table 1 that the performance of the preliminary test estimator PT depends strongly on the configuration of the parameters: it is good under the hypothesis of equal means, but performs poorly for parameters close to, but not at, the hypothesis. The James–Stein estimator JS performs well for small $${{\varvec{{\mu }}}}_1$$, but not for large $${{\varvec{{\mu }}}}_1$$. The empirical Bayes estimator EB and the hierarchical Bayes estimator HB perform similarly, and they remain good even for large $${{\varvec{{\mu }}}}_1$$ as long as $${{\varvec{{\mu }}}}_1=\cdots ={{\varvec{{\mu }}}}_5$$. The performance of HEB depends on the parameters and is good for smaller means. For means with $$\sum _{i=1}^5 {{\varvec{{\mu }}}}_i=\mathbf{{\varvec{0}}}$$, HEB is better than EB and HB. Thus, EB, HB and HEB can be used as alternatives to PT.

## 6 Concluding remarks

An interesting open problem is to find an admissible and minimax estimator of $${{\varvec{{\mu }}}}_1$$. In the framework of simultaneous estimation of $$({{\varvec{{\mu }}}}_1, \ldots , {{\varvec{{\mu }}}}_k)$$, Imai et al. (2017) demonstrated that every estimator within the class (6) is improved on by an estimator belonging to the class (8), which means that the hierarchical Bayes estimator against the uniform prior of $${{\varvec{{\nu }}}}$$ is inadmissible. However, we could not establish the analogous result for the single estimation of $${{\varvec{{\mu }}}}_1$$. Because the hierarchical Bayes estimator $${\widehat{{\varvec{{\mu }}}}}_1^{HB}$$ is derived under the improper uniform prior for $${{\varvec{{\nu }}}}$$, it is presumably inadmissible; whether this is the case remains an open question.

One approach to admissible and minimax estimation is to use the proper prior distribution
\begin{aligned} \pi (\tau ^2 \mid {\sigma }^2)\propto & {} \Big ({{\sigma }^2\over \tau ^2+{\sigma }^2}\Big )^{a +1},\nonumber \\ \pi ({\gamma }^2 \mid \tau ^2, {\sigma }^2)\propto & {} \Big ({{\sigma }^2\over {\gamma }^2 + \tau ^2+{\sigma }^2}\Big )^{b +1},\nonumber \\ \pi ({\sigma }^2)\propto & {} ({\sigma }^2)^{c -3},\quad \text {for}\ {\sigma }^2\le 1/L, \end{aligned}
(45)
where a, b and c are constants and L is a positive constant. As seen from Imai et al. (2017), the resulting Bayes estimator has the form
\begin{aligned} {\widehat{{\varvec{{\mu }}}}}_1^{FB} = {\varvec{\text {X}}}_1 - {\phi ^{FB}(F,G,S) \over F}({\varvec{\text {X}}}_1 - {\widehat{{{\varvec{{\nu }}}}}}) - {\psi ^{FB}(F,G, S) \over G}{\widehat{{{\varvec{{\nu }}}}}}, \end{aligned}
(46)
where $$\phi ^{FB}(F,G,S)$$ and $$\psi ^{FB}(F,G,S)$$ are functions of F, G and S. Unfortunately, we could not establish the minimaxity of this type of estimator, which is another interesting question.
In this paper, we have investigated the minimaxity of Bayesian alternatives to the preliminary test estimator. Beyond this framework, we may consider the estimation of $${{\varvec{{\mu }}}}_1$$ based on $${\varvec{\text {X}}}_1$$ and S alone, without $${\varvec{\text {X}}}_2, \ldots , {\varvec{\text {X}}}_k$$. Using the argument of Strawderman (1973), we can derive an admissible and minimax estimator of $${{\varvec{{\mu }}}}_1$$. Taking the hypothesis $$H_0 : {{\varvec{{\mu }}}}_1=\cdots ={{\varvec{{\mu }}}}_k$$ into account, we can apply the same argument as in Strawderman (1973) under the assumption that $${\varvec{\text {X}}}_2, \ldots , {\varvec{\text {X}}}_k$$ are given and fixed. Namely, we consider the prior distributions (26) and (30) with $${{\varvec{{\nu }}}}$$ replaced by $${{\varvec{{\nu }}}}^*=(\sum _{i=2}^k {{\varvec{\text {V}}}}_i^{-1})^{-1}\sum _{i=2}^k{{\varvec{\text {V}}}}_i^{-1}{\varvec{\text {X}}}_i$$. Then, the Bayes estimator is
\begin{aligned} {\varvec{\text {X}}}_1 - {\phi ^S(\Vert {\varvec{\text {X}}}_1-{{\varvec{{\nu }}}}^*\Vert ^2, S)\over \Vert {\varvec{\text {X}}}_1-{{\varvec{{\nu }}}}^*\Vert ^2/ S}({\varvec{\text {X}}}_1-{{\varvec{{\nu }}}}^*), \end{aligned}
where $$\phi ^S$$ is a function derived from the Bayes estimator against the Strawderman-type prior. This estimator shrinks $${\varvec{\text {X}}}_1$$ towards $${{\varvec{{\nu }}}}^*$$ and is minimax under some condition on $$\phi ^S$$. Thus, it is admissible and minimax in the framework in which $${\varvec{\text {X}}}_2, \ldots , {\varvec{\text {X}}}_k$$ are given and fixed.
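To make the shrinkage toward $${{\varvec{{\nu }}}}^*$$ concrete, the following is a minimal numerical sketch in Python. It assumes, for illustration only, identity covariance matrices $${{\varvec{\text {V}}}}_i = {\varvec{\text {I}}}_p$$ (so that $${{\varvec{{\nu }}}}^*$$ reduces to a simple average) and substitutes the classical James–Stein constant $$(p-2)/(n+2)$$ for the more involved Strawderman-type $$\phi ^S$$; the parameter values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
p, k, n, sigma2 = 5, 5, 10, 1.0

# Identity covariances V_i = I_p (an assumption for this sketch).
mu = np.zeros(p)                                    # common mean under H_0
X = rng.normal(mu, np.sqrt(sigma2), size=(k, p))    # rows are X_1, ..., X_k
S = sigma2 * rng.chisquare(n)                       # S / sigma^2 ~ chi^2_n

# Precision-weighted pooled mean of X_2, ..., X_k; with V_i = I_p it is
# the simple average of those sample mean vectors.
nu_star = X[1:].mean(axis=0)

# Shrink X_1 toward nu_star. The constant (p - 2) / (n + 2) is the
# classical minimax James-Stein choice with estimated variance S, used
# here as a stand-in for phi^S; the factor is capped at 1 so the
# estimator never overshoots past nu_star.
d = X[0] - nu_star
shrink = min(1.0, (p - 2) / (n + 2) * S / (d @ d))
mu1_hat = X[0] - shrink * d
```

The cap at 1 corresponds to the usual positive-part modification; without it, a small $$\Vert {\varvec{\text {X}}}_1-{{\varvec{{\nu }}}}^*\Vert ^2$$ could push the estimate beyond $${{\varvec{{\nu }}}}^*$$.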

## Notes

### Acknowledgements

We would like to thank the Editor, the Associate Editor and the reviewer for valuable comments and helpful suggestions which led to an improved version of this paper. Research of the second author was supported in part by Grant-in-Aid for Scientific Research (15H01943 and 26330036) from Japan Society for the Promotion of Science.

## References

1. Berger, J. O. (1985). Statistical decision theory and Bayesian analysis (2nd ed.). New York: Springer.
2. Bilodeau, M., & Kariya, T. (1989). Minimax estimators in the normal MANOVA model. Journal of Multivariate Analysis, 28, 260–270.
3. Brown, L. D., George, E. I., & Xu, X. (2008). Admissible predictive density estimation. Annals of Statistics, 36, 1156–1170.
4. Efron, B., & Morris, C. N. (1973). Stein's estimation rule and its competitors: An empirical Bayes approach. Journal of the American Statistical Association, 68, 117–130.
5. Efron, B., & Morris, C. (1976). Families of minimax estimators of the mean of a multivariate normal distribution. Annals of Statistics, 4, 11–21.
6. Ghosh, M., & Sinha, B. K. (1988). Empirical and hierarchical Bayes competitors of preliminary test estimators in two sample problems. Journal of Multivariate Analysis, 27, 206–227.
7. Imai, R., Kubokawa, T., & Ghosh, M. (2017). Bayesian simultaneous estimation for means in $$k$$ sample problems. arXiv:1711.10822.
8. James, W., & Stein, C. (1961). Estimation with quadratic loss. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Vol. I (pp. 361–379). Berkeley: University of California Press.
9. Komaki, F. (2001). A shrinkage predictive distribution for multivariate normal observables. Biometrika, 88, 859–864.
10. Sclove, S. L., Morris, C., & Radhakrishnan, R. (1972). Nonoptimality of preliminary test estimators for the multinormal mean. The Annals of Mathematical Statistics, 43, 1481–1490.
11. Smith, A. F. M. (1973). Bayes estimates in one-way and two-way models. Biometrika, 60, 319–329.
12. Stein, C. (1956). Inadmissibility of the usual estimator for the mean of a multivariate normal distribution. In Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1 (pp. 197–206). Berkeley: University of California Press.
13. Stein, C. (1981). Estimation of the mean of a multivariate normal distribution. Annals of Statistics, 9, 1135–1151.
14. Strawderman, W. E. (1971). Proper Bayes minimax estimators of the multivariate normal mean. The Annals of Mathematical Statistics, 42, 385–388.
15. Strawderman, W. E. (1973). Proper Bayes minimax estimators of the multivariate normal mean vector for the case of common unknown variances. Annals of Statistics, 1, 1189–1194.
16. Sun, L. (1996). Shrinkage estimation in the two-way multivariate normal model. Annals of Statistics, 24, 825–840.
17. Tsukuma, H., & Kubokawa, T. (2015). A unified approach to estimating a normal mean matrix in high and low dimensions. Journal of Multivariate Analysis, 139, 312–328.