
Exact finite-sample bias and MSE reduction in a simple linear regression model with measurement error

Hisayuki Tsukuma

Original Paper

Abstract

This paper deals with the problem of estimating a slope parameter in a simple linear regression model, where the independent variables have functional measurement errors. Measurement errors in independent variables, as is well known, cause bias in the ordinary least squares estimator. A general procedure for bias reduction is presented in a finite-sample situation, and some exact bias-reduced estimators are proposed. Also, it is shown that certain truncation procedures improve the mean square errors of the ordinary least squares and the bias-reduced estimators.

Keywords

Bias correction · Errors-in-variables model · Functional relationship · Mean square error · Multivariate calibration problem · Repeated measurement · Shrinkage estimator · Statistical control problem · Statistical decision theory · Structural relationship

Mathematics Subject Classifications

Primary 62F10; Secondary 62J07

1 Introduction

The linear regression model with measurement errors in independent variables is of practical importance, and it has been studied extensively, both theoretically and experimentally, for a long time. Adcock (1877, 1878) first treated estimation of the slope in a simple linear measurement error model and derived the maximum likelihood (ML) estimator, which is nowadays known as the orthogonal regression estimator (see Anderson 1984). Reiersøl (1950) investigated identifiability, which is related to the possibility of constructing a consistent estimator. For efficient estimation, see Bickel and Ritov (1987), and for consistent estimation based on shrinkage estimators, see Whittemore (1989) and Guo and Ghosh (2012). A multivariate generalization of the univariate linear measurement error model was considered by Gleser (1981). See Anderson (1984), Fuller (1987) and Cheng and Van Ness (1999) for a systematic overview of theoretical developments in the estimation of linear measurement error models.

Even though many estimation procedures for the slope have been developed and proposed, each procedure generally has both theoretical merits and demerits. The ML estimator possesses consistency and asymptotic normality. However, the first moment of the ML estimator does not exist, and it is hard to investigate finite-sample properties of the ML procedure theoretically. Besides the ML procedure, the best-known procedure may be the least squares (LS) procedure. The ordinary LS estimator has finite moments up to some order, but it is not asymptotically unbiased. This asymptotic bias of the LS estimator is called attenuation bias in the literature (see Fuller 1987).

This paper addresses a simple linear measurement error model in a finite sample setup and discusses the problem of reducing the bias and the mean square error (MSE) for slope estimators. Let r be the number of groups and let n be the sample size of each group. Consider a linear measurement error model of the form:
$$\begin{aligned} \begin{aligned} Y_i&={\alpha }_0+{\beta }{\gamma }_i+{\delta }_i, \\ X_{ij}&={\gamma }_i+{\varepsilon }_{ij}, \end{aligned} \end{aligned}$$
(1.1)
for \(i=1,\ldots ,n\) and \(j=1,\ldots ,r\), where \(Y_i\) and \(X_{ij}\) are observable variables, \({\alpha }_0\) and \({\beta }\) are, respectively, unknown intercept and slope parameters, \({\gamma }_i\) is the true but unobservable variable, \({\delta }_i\) is the residual term and \({\varepsilon }_{ij}\) is the measurement error. Assume that all the residual terms \({\delta }_i\)’s and the measurement errors \({\varepsilon }_{ij}\)’s are mutually independent and also assume that \({\delta }_i\sim \mathcal{N}(0,\tau ^2)\) for all i and \({\varepsilon }_{ij}\sim \mathcal{N}(0,{\sigma }_x^2)\) for all i and j, where \(\tau ^2\) and \({\sigma }_x^2\) are unknown. It is important to note that the error variance in independent variables, \({\sigma }_x^2\), can be estimated.
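For concreteness, the following minimal Python sketch simulates data from model (1.1); the sample sizes and parameter values below are arbitrary choices made only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical settings chosen only for illustration; model (1.1) leaves them free.
n, r = 25, 2                      # i = 1,...,n and j = 1,...,r as in (1.1)
alpha0, beta = 60.0, 0.5          # unknown intercept and slope
tau, sigma_x = 3.0, 5.0           # s.d. of the residual and of the measurement error
gamma = rng.uniform(20.0, 80.0, size=n)                    # true but unobservable gamma_i

Y = alpha0 + beta * gamma + rng.normal(0.0, tau, size=n)   # Y_i = alpha_0 + beta gamma_i + delta_i
X = gamma[:, None] + rng.normal(0.0, sigma_x, size=(n, r)) # X_ij = gamma_i + eps_ij
```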

For the unobservable variable \({\gamma }_i\) in model (1.1), there are two different points of view, namely, \({\gamma }_i\) is considered as unknown fixed value or as random variable. In the former case, (1.1) is referred to as a functional model and, in the latter case, is called a structural model (Kendall and Stuart 1979; Anderson 1984 and Cheng and Van Ness 1999). In this paper, we consider the functional model, namely, all the \(Y_i\)’s and the \(X_{ij}\)’s are mutually independent, and we shall develop a finite-sample theory of estimating the slope \({\beta }\).

The remainder of this paper is organized as follows. In Sect. 2, we simplify the estimation problem in model (1.1) and define a broad class of slope estimators including the LS estimator, the method of moments estimator, and an estimator of Stefanski (1985). Also, Sect. 2 provides some technical lemmas used for evaluating moments. Section 3 presents a unified method of reducing the bias of estimators in the broad class as well as that of the LS estimator. In Sect. 4, we handle the problem of reducing the MSEs of slope estimators. It is revealed that slope estimation under the MSE criterion is closely related to the statistical control problem (see Zellner 1971 and Aoki 1989) and also to the multivariate calibration problem (see Osborne 1991; Brown 1993 and Sundberg 1999). Our approach to the MSE reduction is carried out in a similar way to Kubokawa and Robert (1994), and a general method is established for improving several estimators such as the LS estimator and Guo and Ghosh's (2012) estimator. Section 5 illustrates the numerical performance of alternative estimators in terms of bias and MSE. In Sect. 6, we give some remarks on our results and related topics.

2 Simplification of the estimation problem

2.1 Reparametrized model

Define \(\overline{X}_i=(1/r)\sum _{j=1}^r X_{ij}\) for \(i=1,2,\ldots ,n\). Consider the regression of the \(Y_i\)s on the \(\overline{X}_i\)s. The LS estimator of \(({\beta },{\alpha }_0)\) is defined as the unique solution of
$$\begin{aligned} \min _{\begin{array}{c} -{\infty }<{\beta }<{\infty }\\ -{\infty }<{\alpha }_0<{\infty } \end{array}}\ \sum _{i=1}^{n}(Y_i-{\alpha }_0-{\beta }\overline{X}_i)^2. \end{aligned}$$
Denote by \((\hat{{\beta }}^{LS},\hat{{\alpha }}_0^{LS})\) the resulting ordinary LS estimator of \(({\beta },{\alpha }_0)\). Then, \(\hat{{\beta }}^{LS}\) and \(\hat{{\alpha }}_0^{LS}\) are given, respectively, by
$$\begin{aligned} \hat{{\beta }}^{LS}=\frac{\sum _{i=1}^{n}(\overline{X}_i-\overline{X}) (Y_i-\overline{Y})}{\sum _{i=1}^{n}(\overline{X}_i-\overline{X})^2},\quad \hat{{\alpha }}_0^{LS}=\overline{Y}-\hat{{\beta }}^{LS}\overline{X}, \end{aligned}$$
where \(\overline{X}=(1/n)\sum _{i=1}^{n}\overline{X}_i\) and \(\overline{Y}=(1/n)\sum _{i=1}^{n}Y_i\).
Let \({{\varvec{{\gamma }}}}=({\gamma }_1,\ldots ,{\gamma }_{n})^t\), \({{{\varvec{Y}}}}=(Y_1,\ldots ,Y_{n})^t\) and \({{{\varvec{X}}}}=(\overline{X}_1,\ldots ,\overline{X}_{n})^t\). Define
$$\begin{aligned} S=\frac{1}{r}\sum _{i=1}^{n}\sum _{j=1}^r(X_{ij}-\overline{X}_i)^2. \end{aligned}$$
Denote by \(I_{n}\) the identity matrix of order n and by \(1_{n}\) the n-dimensional vector consisting of ones. It is then observed that
$$\begin{aligned} \begin{aligned} {{{\varvec{Y}}}}&\sim \mathcal{N}_{n}({\alpha }_01_{n}+{\beta }{{\varvec{{\gamma }}}},\tau ^2I_{n}), \\ {{{\varvec{X}}}}&\sim \mathcal{N}_{n}({{\varvec{{\gamma }}}},{\sigma }^2I_{n}), \end{aligned} \quad \begin{aligned}&\\ S&\sim {\sigma }^2\chi _m^2, \end{aligned} \end{aligned}$$
(2.1)
for \(m=n(r-1)\) and \({\sigma }^2={\sigma }_x^2/r\). Note that \({{{\varvec{Y}}}}\), \({{{\varvec{X}}}}\) and S are mutually independent.
Furthermore, let \({{{\varvec{Q}}}}\) be an \(n\times n\) orthogonal matrix whose first row is \(1_{n}^t/\sqrt{n}\). Denote \(p=n-1\) and \({\alpha }={\alpha }_0\sqrt{n}\). Define \({{{\varvec{Q}}}}{{{\varvec{Y}}}}=(Z_0,{{{\varvec{Z}}}}^t)^t\), \({{{\varvec{Q}}}}{{{\varvec{X}}}}=(U_0,{{{\varvec{U}}}}^t)^t\) and \({{{\varvec{Q}}}}{{\varvec{{\gamma }}}}=({\theta },{{\varvec{\xi }}}^t)^t\), where \({{{\varvec{Z}}}}\), \({{{\varvec{U}}}}\) and \({{\varvec{\xi }}}\) are p-dimensional vectors. Then model (2.1) can be replaced with
$$\begin{aligned} \begin{aligned} Z_0&\sim \mathcal{N}({\alpha }+{\beta }{\theta },\tau ^2),\\ U_0&\sim \mathcal{N}({\theta },{\sigma }^2), \end{aligned} \quad \begin{aligned} {{{\varvec{Z}}}}&\sim \mathcal{N}_p({\beta }{{\varvec{\xi }}},\tau ^2I_p),\\ {{{\varvec{U}}}}&\sim \mathcal{N}_p({{\varvec{\xi }}},{\sigma }^2I_p), \end{aligned} \quad \begin{aligned}&\\ S&\sim {\sigma }^2\chi _m^2. \end{aligned} \end{aligned}$$
(2.2)
These five statistics, \(Z_0, {{{\varvec{Z}}}}, U_0, {{{\varvec{U}}}}\) and S, are mutually independent, and \({\alpha }\), \({\beta }\), \({\theta }\), \({{\varvec{\xi }}}\), \({\sigma }^2\) and \(\tau ^2\) are unknown parameters. Throughout this paper, we suppose that \({{\varvec{\xi }}}\ne 0_p\).
From reparametrized model (2.2), the ordinary LS estimators \(\hat{{\beta }}^{LS}\) and \(\hat{{\alpha }}^{LS}=\hat{{\alpha }}_0^{LS}\sqrt{n}\) can be rewritten, respectively, as
$$\begin{aligned} \hat{{\beta }}^{LS}=\frac{{{{\varvec{U}}}}^t{{{\varvec{Z}}}}}{\Vert {{{\varvec{U}}}}\Vert ^2},\quad \hat{{\alpha }}^{LS}=Z_0-\hat{{\beta }}^{LS}U_0. \end{aligned}$$
(2.3)
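As an illustration, here is a minimal sketch of the reparametrization (2.1)–(2.2) and of the LS estimator (2.3), assuming NumPy/SciPy; the function names are ours, and scipy.linalg.helmert is used merely as one convenient orthogonal matrix whose first row is \(1_{n}^t/\sqrt{n}\).

```python
import numpy as np
from scipy.linalg import helmert

def reparametrize(Y, X):
    """Map data from model (1.1) to the statistics (Z0, Z, U0, U, S) of model (2.2)."""
    n, r = X.shape
    Xbar = X.mean(axis=1)                      # group means \bar{X}_i
    S = ((X - Xbar[:, None]) ** 2).sum() / r   # S ~ sigma^2 chi^2_m with m = n(r-1)
    Q = helmert(n, full=True)                  # orthogonal matrix, first row 1_n^t / sqrt(n)
    QY, QX = Q @ Y, Q @ Xbar
    return QY[0], QY[1:], QX[0], QX[1:], S     # Z0, Z, U0, U, S

def ls_estimates(Z0, Z, U0, U):
    """Ordinary LS estimators (2.3): hat{beta}^LS and hat{alpha}^LS = Z0 - hat{beta}^LS U0."""
    beta_ls = (U @ Z) / (U @ U)
    return beta_ls, Z0 - beta_ls * U0
```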
Hereafter, we mainly deal with the problem of estimating \({\beta }\) in reparametrized model (2.2) and consider improvement on the bias and MSE of \(\hat{{\beta }}^{LS}\). Denote the bias and the MSE of an estimator \(\hat{{\beta }}\), respectively, by
$$\begin{aligned} \mathrm{Bias}(\hat{{\beta }};{\beta })&=E[\hat{{\beta }}]-{\beta }, \\ \mathrm{MSE}(\hat{{\beta }};{\beta })&=E[(\hat{{\beta }}-{\beta })^2], \end{aligned}$$
where the expectation E is taken with respect to (2.2). The bias of \(\hat{{\beta }}\) is smaller than that of another estimator \(\hat{{\beta }}_*\) if \(|\mathrm{Bias}(\hat{{\beta }};{\beta })|\le |\mathrm{Bias}(\hat{{\beta }}_*;{\beta })|\) for any \({\beta }\). Similarly, if \(\mathrm{MSE}(\hat{{\beta }};{\beta })\le \mathrm{MSE}(\hat{{\beta }}_*;{\beta })\) for any \({\beta }\), then the MSE of \(\hat{{\beta }}\) is said to be better than that of \(\hat{{\beta }}_*\), or \(\hat{{\beta }}\) is said to dominate \(\hat{{\beta }}_*\).

2.2 Estimators studied in the literature

If \(\lim _{n\rightarrow {\infty }} \Vert {{\varvec{\xi }}}\Vert ^2/p={\sigma }_\xi ^2\) where \({\sigma }_\xi ^2\) is a positive value, it follows that \({{{\varvec{U}}}}^t{{{\varvec{Z}}}}/p\rightarrow {\beta }{\sigma }_\xi ^2\) and \(\Vert {{{\varvec{U}}}}\Vert ^2/p\rightarrow {\sigma }_\xi ^2+{\sigma }^2\) in probability as n tends to infinity, and hence
$$\begin{aligned} \hat{{\beta }}^{LS}\mathop {\rightarrow } \frac{{\sigma }_\xi ^2}{{\sigma }_\xi ^2+{\sigma }^2}{\beta }\quad \text {in probability} \quad (n\rightarrow {\infty }). \end{aligned}$$
(2.4)
This implies that the ordinary LS estimator \(\hat{{\beta }}^{LS}\) is inconsistent and, more precisely, it is asymptotically biased toward zero. This phenomenon is called attenuation bias (see Fuller 1987).
For reducing the influence of attenuation bias, various alternatives to \(\hat{{\beta }}^{LS}\) have been proposed in the literature. For example, a typical alternative is the method of moments estimator
$$\begin{aligned} \hat{{\beta }}^{MM}=\frac{{{{\varvec{U}}}}^t{{{\varvec{Z}}}}/p}{\Vert {{{\varvec{U}}}}\Vert ^2/p-S/m}. \end{aligned}$$
(2.5)
The method of moments estimator \(\hat{{\beta }}^{MM}\) converges to \({\beta }\) in probability as n goes to infinity, but \(\hat{{\beta }}^{MM}\) does not have finite moments. Noting that \(\hat{{\beta }}^{MM} =\{1-(p/m)S/\Vert {{{\varvec{U}}}}\Vert ^2\}^{-1}\hat{{\beta }}^{LS}\) and also using the Maclaurin expansion \((1-x)^{-1}=\sum _{j=0}^{\infty }x^j\), we obtain the \(\ell\)-th order corrected estimator of the form
$$\begin{aligned} \hat{{\beta }}_\ell ^{ST}=\bigg \{1+\frac{p}{m}\frac{S}{\Vert {{{\varvec{U}}}}\Vert ^2} +\cdots +\bigg (\frac{p}{m}\frac{S}{\Vert {{{\varvec{U}}}}\Vert ^2}\bigg )^\ell \, \bigg \}\hat{{\beta }}^{LS}. \end{aligned}$$
(2.6)
The above estimator can also be derived from using the same arguments as in Stefanski (1985), who approached the bias correction from Huber’s (1981) M estimation. However, it is still not known whether or not the bias of \(\hat{{\beta }}_\ell ^{ST}\) is smaller than that of \(\hat{{\beta }}^{LS}\) in a finite sample situation.
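For illustration, the method of moments estimator (2.5) and the \(\ell\)-th order corrected estimator (2.6) can be written as plain functions of the statistics in (2.2); this is only a sketch, and the function names are ours.

```python
import numpy as np

def beta_MM(Z, U, S, m):
    """Method of moments estimator (2.5); consistent but without finite moments."""
    p = len(U)
    return (U @ Z / p) / (U @ U / p - S / m)

def beta_ST(Z, U, S, m, ell):
    """Stefanski-type ell-th order corrected estimator (2.6)."""
    p = len(U)
    x = (p / m) * S / (U @ U)                            # expansion variable (p/m) S / ||U||^2
    return sum(x ** j for j in range(ell + 1)) * (U @ Z) / (U @ U)
```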
In estimation of a normal mean vector \({{\varvec{\xi }}}\) with a quadratic loss, where \({{{\varvec{U}}}}\) and S are independently distributed as \({{{\varvec{U}}}}\sim \mathcal{N}_{p}({{\varvec{\xi }}},{\sigma }^2I_p)\) and \(S\sim {\sigma }^2\chi ^2_m\), it is well known that the ML estimator, \(\widehat{{{\varvec{\xi }}}}{}^{ML}={{{\varvec{U}}}}\), is uniformly dominated by the James and Stein (1961) shrinkage estimator \(\widehat{{{\varvec{\xi }}}}{}^{JS}=(1-G^{JS}){{{\varvec{U}}}}\) with \(G^{JS}=(p-2)S/\{(m+2)\Vert {{{\varvec{U}}}}\Vert ^2\}\). Moreover, from the integral expression of risk difference (IERD) method by Kubokawa and Robert (1994), we can show that \(\widehat{{{\varvec{\xi }}}}{}^{JS}\) is improved by a truncated shrinkage estimator \(\widehat{{{\varvec{\xi }}}}{}^{K}=(1-G^K){{{\varvec{U}}}}\) with \(G^K=\min \{(p-2)/p, G^{JS}\}\). Whittemore (1989) and Guo and Ghosh (2012) employed the above shrinkage estimators to find out better slope estimators for a linear measurement error model with a structural relationship. Their ideas can be applied to our slope estimation in the functional model (2.2). For the ordinary LS estimator \(\hat{{\beta }}^{LS}={{{\varvec{U}}}}^t{{{\varvec{Z}}}}/\Vert {{{\varvec{U}}}}\Vert ^2\), substituting \({{{\varvec{U}}}}\) with \(\widehat{{{\varvec{\xi }}}}{}^{JS}\) yields Whittemore (1989)-type estimator
$$\begin{aligned} \hat{{\beta }}^W =\frac{(\widehat{{{\varvec{\xi }}}}{}^{JS})^t{{{\varvec{Z}}}}}{\Vert \widehat{{{\varvec{\xi }}}}{}^{JS}\Vert ^2} =\frac{{{{\varvec{U}}}}^t{{{\varvec{Z}}}}}{(1-G^{JS})\Vert {{{\varvec{U}}}}\Vert ^2}. \end{aligned}$$
Similarly, by replacing \({{{\varvec{U}}}}\) with \(\widehat{{{\varvec{\xi }}}}{}^{K}\), we obtain Guo and Ghosh (2012)-type estimator
$$\begin{aligned} \hat{{\beta }}^{GG}=\frac{(\widehat{{{\varvec{\xi }}}}{}^{K})^t{{{\varvec{Z}}}}}{\Vert \widehat{{{\varvec{\xi }}}}{}^{K}\Vert ^2} =\frac{{{{\varvec{U}}}}^t{{{\varvec{Z}}}}}{(1-G^{K})\Vert {{{\varvec{U}}}}\Vert ^2}. \end{aligned}$$
(2.7)
The Whittemore estimator \(\hat{{\beta }}^W\) is asymptotically analogous to the method of moments estimator \(\hat{{\beta }}^{MM}\), and the bias and the MSE of \(\hat{{\beta }}^W\) do not exist. Meanwhile, the Guo and Ghosh estimator \(\hat{{\beta }}^{GG}\) has a finite MSE.
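The shrinkage-based estimators of this paragraph admit the following rough sketch (the function names are ours):

```python
import numpy as np

def beta_W(Z, U, S, m):
    """Whittemore (1989)-type estimator; its bias and MSE do not exist."""
    p = len(U)
    G_JS = (p - 2) * S / ((m + 2) * (U @ U))   # James-Stein shrinkage factor
    return (U @ Z) / ((1.0 - G_JS) * (U @ U))

def beta_GG(Z, U, S, m):
    """Guo and Ghosh (2012)-type estimator (2.7); it has a finite MSE."""
    p = len(U)
    G_JS = (p - 2) * S / ((m + 2) * (U @ U))
    G_K = min((p - 2) / p, G_JS)               # truncated shrinkage factor
    return (U @ Z) / ((1.0 - G_K) * (U @ U))
```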
The maximum likelihood procedure is also typical for estimating the slope. For simplicity, we suppose \(\tau ^2={\sigma }_x^2\ (=r{\sigma }^2)\) in model (1.1). Then, the maximum likelihood estimator of \({\beta }\) has the form
$$\begin{aligned} \hat{{\beta }}^{ML}=\frac{\Vert {{{\varvec{Z}}}}\Vert ^2-r\Vert {{{\varvec{U}}}}\Vert ^2+\sqrt{(\Vert {{{\varvec{Z}}}}\Vert ^2-r\Vert {{{\varvec{U}}}}\Vert ^2)^2+4r({{{\varvec{U}}}}^t{{{\varvec{Z}}}})^2}}{2{{{\varvec{U}}}}^t{{{\varvec{Z}}}}}, \end{aligned}$$
(2.8)
which can be constructed by minimizing
$$\begin{aligned} \frac{1}{\tau ^2}\Vert {{{\varvec{Z}}}}-{\beta }{{\varvec{\xi }}}\Vert ^2+\frac{1}{{\sigma }^2}\Vert {{{\varvec{U}}}}-{{\varvec{\xi }}}\Vert ^2 =\frac{1}{\tau ^2}\{\Vert {{{\varvec{Z}}}}-{\beta }{{\varvec{\xi }}}\Vert ^2+r\Vert {{{\varvec{U}}}}-{{\varvec{\xi }}}\Vert ^2\}, \end{aligned}$$
subject to \(-{\infty }<{\beta }<{\infty }\) and \({{\varvec{\xi }}}\in \mathbb {R}^p\). Under a suitable convergence condition, \(\hat{{\beta }}^{ML}\) is a consistent estimator of \({\beta }\).
As stated in the beginning of Sect. 2.1, \(\hat{{\beta }}^{LS}\) is derived from the regression of the \(Y_i\)s on the \(\overline{X}_i\)s. Let us now consider the inverse regression, namely the \(\overline{X}_i\)s are regressed on the \(Y_i\)s. Through the use of statistics in (2.2), the least squares estimator for a slope of the inverse regression equals to \({{{\varvec{U}}}}^t{{{\varvec{Z}}}}/\Vert {{{\varvec{Z}}}}\Vert ^2\). Since the slope of the inverse regression is equivalent to \({\beta }^{-1}\) (the reciprocal of the slope in the usual regression), the resulting estimator of \({\beta }\) can be expressed as
$$\begin{aligned} \hat{{\beta }}^{IR}=\frac{\Vert {{{\varvec{Z}}}}\Vert ^2}{{{{\varvec{U}}}}^t{{{\varvec{Z}}}}}. \end{aligned}$$
(2.9)
Note that \(\hat{{\beta }}^{ML}\) and \(\hat{{\beta }}^{IR}\) have no finite moments, and hence their biases and MSEs do not exist. If \({{{\varvec{U}}}}^t{{{\varvec{Z}}}}>0\), then it can easily be shown that \((\hat{{\beta }}^{ML})^{-1}<(\hat{{\beta }}^{LS})^{-1}\) and \(\hat{{\beta }}^{ML}<\hat{{\beta }}^{IR}\); namely, \(0<\hat{{\beta }}^{LS}<\hat{{\beta }}^{ML}<\hat{{\beta }}^{IR}\). In a similar fashion, we obtain \(\hat{{\beta }}^{IR}<\hat{{\beta }}^{ML}<\hat{{\beta }}^{LS}<0\) if \({{{\varvec{U}}}}^t{{{\varvec{Z}}}}<0\). See Anderson (1976).
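For completeness, minimal sketches of the ML estimator (2.8), which is derived under the simplifying assumption \(\tau ^2={\sigma }_x^2\), and of the inverse regression estimator (2.9) are given below; the function names are ours.

```python
import numpy as np

def beta_ML(Z, U, r):
    """ML estimator (2.8), derived under the assumption tau^2 = sigma_x^2 = r sigma^2."""
    a = Z @ Z - r * (U @ U)
    b = U @ Z
    return (a + np.sqrt(a ** 2 + 4.0 * r * b ** 2)) / (2.0 * b)

def beta_IR(Z, U):
    """Inverse regression estimator (2.9); no finite moments."""
    return (Z @ Z) / (U @ Z)
```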

2.3 A class of estimators

Convergence (2.4) is equivalent to the statement that \(\bar{{\beta }}=(1+{\sigma }^2/{\sigma }_\xi ^2)\hat{{\beta }}^{LS}\) converges to \({\beta }\) in probability as n goes to infinity. Replacing \({\sigma }^2/{\sigma }_\xi ^2\) in \(\bar{{\beta }}\) with a suitable function \(\phi\) of \(\Vert {{{\varvec{U}}}}\Vert ^2/S\) yields a general class of estimators,
$$\begin{aligned} \hat{{\beta }}_\phi =\bigg \{1+\phi \bigg (\frac{\Vert {{{\varvec{U}}}}\Vert ^2}{S}\bigg )\bigg \}\hat{{\beta }}^{LS}. \end{aligned}$$
(2.10)
Note that the class (2.10) includes \(\hat{{\beta }}^{MM}\), \(\hat{{\beta }}_\ell ^{ST}\), \(\hat{{\beta }}^{W}\) and \(\hat{{\beta }}^{GG}\), but not \(\hat{{\beta }}^{ML}\) and \(\hat{{\beta }}^{IR}\).

In this paper, we search for a bias-reduced or an MSE-reduced estimator within class (2.10) as an alternative to \(\hat{{\beta }}^{LS}\). Some new estimators derived in the following sections are presented briefly below.

Let \(\ell\) be a nonnegative integer. A simple modification of \(\hat{{\beta }}_\ell ^{ST}\), given in (2.6), is defined as
$$\begin{aligned} \hat{{\beta }}_\ell ^{BR}=\bigg \{1+\sum _{j=1}^\ell \frac{a_j}{b_j}\bigg (\frac{S}{\Vert {{{\varvec{U}}}}\Vert ^2}\bigg )^j\bigg \}\hat{{\beta }}^{LS}, \end{aligned}$$
(2.11)
where \(a_j=(p-2)(p-4)\cdots (p-2j)\) and \(b_j=m(m+2)\cdots (m+2j-2)\) for \(j=1,\ldots ,\ell\), and \(\hat{{\beta }}_0^{BR}\equiv \hat{{\beta }}^{LS}\). Also, let
$$\begin{aligned} \hat{{\beta }}_{\ell \cdot 2}^{BR}=\bigg \{1+2\sum _{j=1}^\ell \frac{a_j}{b_j} \bigg (\frac{S}{\Vert {{{\varvec{U}}}}\Vert ^2}\bigg )^j\bigg \}\hat{{\beta }}^{LS}. \end{aligned}$$
(2.12)
Section 3 will prove that \(\hat{{\beta }}_{\ell }^{BR}\) and \(\hat{{\beta }}_{\ell \cdot 2}^{BR}\) have smaller biases than \(\hat{{\beta }}^{LS}\).
As an MSE-reduced estimator, we will consider
$$\begin{aligned} \hat{{\beta }}^{TLS}=\max \bigg [0,\min \bigg \{1,\, \frac{2(p+m-2)\Vert {{{\varvec{U}}}}\Vert ^2}{S+\Vert {{{\varvec{U}}}}\Vert ^2}-1\bigg \}\bigg ]\hat{{\beta }}^{LS}. \end{aligned}$$
(2.13)
In Sect. 4, the MSE of \(\hat{{\beta }}^{TLS}\) is shown to be smaller than that of \(\hat{{\beta }}^{LS}\).
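The new estimators (2.11)–(2.13) can be sketched as follows; the coefficient helper and all function names are ours and are used only for illustration.

```python
import numpy as np

def ab_coefficients(p, m, ell):
    """a_j = (p-2)(p-4)...(p-2j) and b_j = m(m+2)...(m+2j-2) for j = 1,...,ell."""
    a = np.cumprod([p - 2 * j for j in range(1, ell + 1)])
    b = np.cumprod([m + 2 * (j - 1) for j in range(1, ell + 1)])
    return a, b

def beta_BR(Z, U, S, m, ell, factor=1):
    """Bias-reduced estimator (2.11); factor=2 gives the variant (2.12)."""
    p = len(U)
    a, b = ab_coefficients(p, m, ell)
    t = S / (U @ U)
    corr = 1.0 + factor * sum(a[j] / b[j] * t ** (j + 1) for j in range(ell))
    return corr * (U @ Z) / (U @ U)

def beta_TLS(Z, U, S, m):
    """Truncated LS estimator (2.13)."""
    p = len(U)
    V = (U @ U) / (S + U @ U)
    return max(0.0, min(1.0, 2.0 * (p + m - 2) * V - 1.0)) * (U @ Z) / (U @ U)
```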

2.4 Some useful lemmas

Next, we provide some technical lemmas which form the basis for evaluating the bias and MSE of (2.10).

Lemma 2.1

Let \({{{\varvec{U}}}}\sim \mathcal{N}_p({{\varvec{\xi }}},{\sigma }^2I_p)\) and \(S\sim {\sigma }^2\chi _m^2\), and assume that \({{{\varvec{U}}}}\) and S are mutually independent. Let \(\phi\) be a function on the positive real line. Define \({\lambda }=\Vert {{\varvec{\xi }}}\Vert ^2/(2{\sigma }^2)\) and denote by \(P_{\lambda }(k)=e^{-{\lambda }}{\lambda }^k/k!\) the Poisson probabilities for \(k=0,1,2,\ldots\). Let \(g_n(t)\) be the p.d.f. of \(\chi ^2_n\).
  1. (i) If \(E[|\phi (\Vert {{{\varvec{U}}}}\Vert ^2/S){{{\varvec{U}}}}^t{{\varvec{\xi }}}|/\Vert {{{\varvec{U}}}}\Vert ^2]<{\infty }\), then we have
    $$\begin{aligned} E\bigg [\phi \Big (\frac{\Vert {{{\varvec{U}}}}\Vert ^2}{S}\Big )\frac{{{{\varvec{U}}}}^t{{\varvec{\xi }}}}{\Vert {{{\varvec{U}}}}\Vert ^2}\bigg ] =\sum _{k=0}^{\infty }\frac{2{\lambda }}{p+2k}P_{\lambda }(k)I_1(k|\phi ), \end{aligned}$$
    where \(I_1(k|\phi )=\int _0^{\infty }\int _0^{\infty }\phi (w/s)g_{p+2k}(w)\,\mathrm{d}w \,g_m(s)\,\mathrm{d}s\).
  2. (ii) If \(E[|\phi (\Vert {{{\varvec{U}}}}\Vert ^2/S)|({{{\varvec{U}}}}^t{{\varvec{\xi }}})^2/\Vert {{{\varvec{U}}}}\Vert ^4]<{\infty }\), then we have
    $$\begin{aligned} E\bigg [\phi \Big (\frac{\Vert {{{\varvec{U}}}}\Vert ^2}{S}\Big )\frac{({{{\varvec{U}}}}^t{{\varvec{\xi }}})^2}{\Vert {{{\varvec{U}}}}\Vert ^{4}}\bigg ] =\sum _{k=0}^{\infty }\frac{2{\lambda }(1+2k)}{p+2k}P_{\lambda }(k)I_2(k|\phi ), \end{aligned}$$
    where \(I_2(k|\phi )=\int _0^{\infty }\int _0^{\infty }w^{-1}\phi (w/s)g_{p+2k}(w)\,\mathrm{d}w \,g_m(s)\,\mathrm{d}s\).
When \(\phi \equiv 1\), (i) and (ii) of Lemma 2.1 are, respectively,
$$\begin{aligned} E\bigg [\frac{{{{\varvec{U}}}}^t{{\varvec{\xi }}}}{\Vert {{{\varvec{U}}}}\Vert ^2}\bigg ]&=E\bigg [\frac{2{\lambda }}{p+2K}\bigg ]\quad \hbox {for }p\ge 2, \end{aligned}$$
(2.14)
$$\begin{aligned} E\bigg [\frac{({{{\varvec{U}}}}^t{{\varvec{\xi }}})^2}{\Vert {{{\varvec{U}}}}\Vert ^{4}}\bigg ]&=E\bigg [\frac{2{\lambda }(1+2K)}{(p+2K)(p+2K-2)}\bigg ]\quad \hbox {for } p\ge 3, \end{aligned}$$
(2.15)
where K is the Poisson random variable with mean \({\lambda }=\Vert {{\varvec{\xi }}}\Vert ^2/(2{\sigma }^2)\). Identities (2.14) and (2.15) have been given, for example, in Nishii and Krishnaiah (1988, Lemma 3).
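Identities (2.14) and (2.15) are easy to check numerically; the following Monte Carlo sketch uses arbitrary values of p, \({\sigma }\) and \({{\varvec{\xi }}}\) chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
p, sigma = 6, 1.5                          # arbitrary illustrative choices (p >= 3)
xi = rng.normal(size=p)
lam = (xi @ xi) / (2.0 * sigma ** 2)       # lambda = ||xi||^2 / (2 sigma^2)

N = 200_000
U = rng.normal(xi, sigma, size=(N, p))     # N draws of U ~ N_p(xi, sigma^2 I_p)
K = rng.poisson(lam, size=N)               # N draws of K ~ Poisson(lambda)

U2 = (U * U).sum(axis=1)
print(np.mean(U @ xi / U2), np.mean(2 * lam / (p + 2 * K)))                    # identity (2.14)
print(np.mean((U @ xi) ** 2 / U2 ** 2),
      np.mean(2 * lam * (1 + 2 * K) / ((p + 2 * K) * (p + 2 * K - 2))))        # identity (2.15)
```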

Proof of Lemma 2.1

(i) Denote
$$\begin{aligned} E_1=E\bigg [\phi \Big (\frac{\Vert {{{\varvec{U}}}}\Vert ^2}{S}\Big )\frac{{{{\varvec{U}}}}^t{{\varvec{\xi }}}}{\Vert {{{\varvec{U}}}}\Vert ^2}\bigg ]. \end{aligned}$$
Let \({{\varvec{\xi }}}_1={{\varvec{\xi }}}/{\sigma }\). It turns out that
$$\begin{aligned} E_1=(2\pi )^{-p/2}\int _0^{\infty }\int _{\mathbb {R}^p}\phi \Big (\frac{\Vert {{{\varvec{u}}}}\Vert ^2}{s}\Big )\frac{{{{\varvec{u}}}}^t{{\varvec{\xi }}}_1}{\Vert {{{\varvec{u}}}}\Vert ^2}e^{-\Vert {{{\varvec{u}}}}-{{\varvec{\xi }}}_1\Vert ^2/2}\,\mathrm{d}{{{\varvec{u}}}}\,g_m(s)\,\mathrm{d}s. \end{aligned}$$
Denote \(c_0=(2\pi )^{-p/2}e^{-{\lambda }}\). Let \({{\varvec{\varXi }}}\) be a \(p\times p\) orthogonal matrix whose first row is \({{\varvec{\xi }}}_1/\Vert {{\varvec{\xi }}}_1\Vert\). Making the orthogonal transformation \({{{\varvec{u}}}}=(u_1,u_2,\ldots ,u_p)^t\rightarrow {{\varvec{\varXi }}}^t{{{\varvec{u}}}}\) gives that
$$\begin{aligned} E_1=c_0\int _0^{\infty }\int _{\mathbb {R}^p}\phi \Big (\frac{\Vert {{{\varvec{u}}}}\Vert ^2}{s}\Big )\frac{u_1\Vert {{\varvec{\xi }}}_1\Vert }{\Vert {{{\varvec{u}}}}\Vert ^2}e^{-\Vert {{{\varvec{u}}}}\Vert ^2/2+u_1\Vert {{\varvec{\xi }}}_1\Vert }\,\mathrm{d}{{{\varvec{u}}}}\,g_m(s)\,\mathrm{d}s. \end{aligned}$$
(2.16)
Now, for \(p\ge 2\), we make the following polar coordinate transformation
$$\begin{aligned} {{{\varvec{u}}}}=\begin{pmatrix} u_1 \\ u_2 \\ u_3 \\ \vdots \\ u_{p-1} \\ u_p \end{pmatrix} = \rho \left( \begin{array}{l} \cos \varphi \\ \sin \varphi \cos \varphi _2 \\ \sin \varphi \sin \varphi _2 \cos \varphi _3 \\ \vdots \\ \sin \varphi \sin \varphi _2 \sin \varphi _3 \cdots \sin \varphi _{p-2}\cos \varphi _{p-1} \\ \sin \varphi \sin \varphi _2 \sin \varphi _3 \cdots \sin \varphi _{p-2}\sin \varphi _{p-1} \end{array}\right) , \end{aligned}$$
where \(\rho >0\), \(0<\varphi <\pi\), \(0<\varphi _i<\pi\) \((i=2,3,\ldots ,p-2)\) and \(0<\varphi _{p-1}<2\pi\). The Jacobian of the transformation \({{{\varvec{u}}}}\rightarrow (\rho ,\varphi ,\varphi _2,\varphi _3,\ldots ,\varphi _{p-1})\) is given by \(\rho ^{p-1}\sin ^{p-2}\varphi \sin ^{p-3}\varphi _2 \cdots \sin \varphi _{p-2}\), so (2.16) can be rewritten as
$$\begin{aligned} E_1&=c_1\int _0^{\infty }\int _0^{\infty }\int _0^\pi \phi \Big (\frac{\rho ^2}{s}\Big ) \frac{\Vert {{\varvec{\xi }}}_1\Vert \cos \varphi }{\rho }e^{-\rho ^2/2+\rho \Vert {{\varvec{\xi }}}_1\Vert \cos \varphi }{\nonumber }\\&\quad \times \rho ^{p-1}\sin ^{p-2}\varphi \,\mathrm{d}\varphi \,\mathrm{d}\rho \,g_m(s)\,\mathrm{d}s, \end{aligned}$$
with
$$\begin{aligned} c_1 =c_0\int _0^{2\pi }\,\mathrm{d}\varphi _{p-1}\prod _{i=2}^{p-2}\int _0^\pi \sin ^{p-i-1}\varphi _i\,\mathrm{d}\varphi _i. \end{aligned}$$
Note here that, for an even n,
$$\begin{aligned} \int _0^\pi \sin ^m\varphi \cos ^n\varphi \,\mathrm{d}\varphi = \frac{{\varGamma }[(m+1)/2]{\varGamma }[(n+1)/2]}{{\varGamma }[(m+n+2)/2]} \end{aligned}$$
and, for an odd n, the above definite integral is zero. Thus, it is seen that
$$\begin{aligned} c_1 =\frac{2^{1-p/2}\pi ^{-1/2}e^{-{\lambda }}}{{\varGamma }[(p-1)/2]} \end{aligned}$$
and
$$\begin{aligned} \int _0^\pi e^{\rho \Vert {{\varvec{\xi }}}_1\Vert \cos \varphi } \cos \varphi \sin ^{p-2}\varphi \,\mathrm{d}\varphi&=\sum _{j=0}^{\infty }\frac{\rho ^j\Vert {{\varvec{\xi }}}_1\Vert ^j}{j!} \int _0^\pi \cos ^{j+1}\varphi \sin ^{p-2}\varphi \,\mathrm{d}\varphi \\&=\sum _{k=0}^{\infty }\frac{\rho ^{2k+1}}{k!}{\lambda }^k\frac{\pi ^{1/2}\Vert {{\varvec{\xi }}}_1\Vert {\varGamma }[(p-1)/2]}{2^k(p+2k){\varGamma }[(p+2k)/2]}, \end{aligned}$$
so that
$$\begin{aligned} E_1=\sum _{k=0}^{\infty }\frac{2{\lambda }}{p+2k}P_{\lambda }(k) \int _0^{\infty }\int _0^{\infty }\phi \Big (\frac{\rho ^2}{s}\Big ) \frac{\rho ^{p+2k-1}e^{-\rho ^2/2}}{{\varGamma }[(p+2k)/2]2^{p/2+k-1}}\,\mathrm{d}\rho \,g_m(s)\,\mathrm{d}s. \end{aligned}$$
The change of variables \(w=\rho ^2\) completes the proof of (i).
(ii) Denote
$$\begin{aligned} E_2=E\bigg [\phi \Big (\frac{\Vert {{{\varvec{U}}}}\Vert ^2}{S}\Big )\frac{({{{\varvec{U}}}}^t{{\varvec{\xi }}})^2}{\Vert {{{\varvec{U}}}}\Vert ^{4}}\bigg ]. \end{aligned}$$
Using the same arguments as in the proof of (i), we obtain
$$\begin{aligned} E_2&=c_1\int _0^{\infty }\int _0^{\infty }\int _0^\pi \phi \Big (\frac{\rho ^2}{s}\Big ) \frac{\Vert {{\varvec{\xi }}}_1\Vert ^2\cos ^2\varphi }{\rho ^2}e^{-\rho ^2/2+\rho \Vert {{\varvec{\xi }}}_1\Vert \cos \varphi } \\&\quad \times \rho ^{p-1}\sin ^{p-2}\varphi \,\mathrm{d}\varphi \,\mathrm{d}\rho \,g_m(s)\,\mathrm{d}s. \end{aligned}$$
Since
$$\begin{aligned} \int _0^\pi e^{\rho \Vert {{\varvec{\xi }}}_1\Vert \cos \varphi }\cos ^2\varphi \sin ^{p-2}\varphi \,\mathrm{d}\varphi =\sum _{k=0}^{\infty }\frac{\rho ^{2k}}{k!}{\lambda }^k\frac{\pi ^{1/2}(1+2k){\varGamma }[(p-1)/2]}{2^k(p+2k){\varGamma }[(p+2k)/2]}, \end{aligned}$$
it is observed that
$$\begin{aligned} E_2=\sum _{k=0}^{\infty }P_{\lambda }(k) \frac{2{\lambda }(1+2k)}{p+2k} I_2(k|\phi ), \end{aligned}$$
where
$$\begin{aligned} I_2(k|\phi )&=\int _0^{\infty }\int _0^{\infty }\frac{1}{\rho ^2}\phi \Big (\frac{\rho ^2}{s}\Big )\frac{\rho ^{p+2k-1}e^{-\rho ^2/2}}{{\varGamma }[(p+2k)/2]2^{p/2+k-1}}\,\mathrm{d}\rho \,g_m(s)\,\mathrm{d}s \\&=\int _0^{\infty }\int _0^{\infty }\frac{1}{w}\phi \Big (\frac{w}{s}\Big )g_{p+2k}(w)\,\mathrm{d}w \,g_m(s)\,\mathrm{d}s. \end{aligned}$$
Hence the proof of (ii) is complete. \(\square\)

Lemma 2.2

Let \({{{\varvec{U}}}}\sim \mathcal{N}_p({{\varvec{\xi }}},{\sigma }^2I_p)\). Let i be a natural number such that \(i<p/2\). Denote by K the Poisson random variable with mean \({\lambda }=\Vert {{\varvec{\xi }}}\Vert ^2/(2{\sigma }^2)\). Then we have
$$\begin{aligned} E\left[ \frac{{\sigma }^{2i}}{\Vert {{{\varvec{U}}}}\Vert ^{2i}}\right] ={\left\{ \begin{array}{ll} \prod \nolimits _{j=1}^i(p-2j)^{-1} &{} \quad \hbox {if }\; {{\varvec{\xi }}}=0_p, \\ E\left[ \prod \nolimits _{j=1}^i(p+2K-2j)^{-1}\right] &{} \quad \hbox {otherwise}. \end{array}\right. } \end{aligned}$$

Proof

We employ the same notation as in Lemma 2.1. Note that, when \({{\varvec{\xi }}}\ne 0_p\), \(\Vert {{{\varvec{U}}}}\Vert ^2/{\sigma }^2\) follows the noncentral chi-square distribution with p degrees of freedom and noncentrality parameter \(\Vert {{\varvec{\xi }}}\Vert ^2/{\sigma }^2\). Since the p.d.f. of the noncentral chi-square distribution is given by \(\sum _{k=0}^{\infty }P_{\lambda }(k)g_{p+2k}(w)\), it is seen that
$$\begin{aligned} E\bigg [\frac{{\sigma }^{2i}}{\Vert {{{\varvec{U}}}}\Vert ^{2i}}\bigg ]&=\sum _{k=0}^{\infty }P_{\lambda }(k)\int _0^{\infty }w^{-i}g_{p+2k}(w)\,\mathrm{d}w \\&=\sum _{k=0}^{\infty }P_{\lambda }(k)\prod _{j=1}^i(p+2k-2j)^{-1} =E\bigg [\prod _{j=1}^i(p+2K-2j)^{-1}\bigg ] \end{aligned}$$
for \(p-2i>0\). If \({{\varvec{\xi }}}=0_p\), then \(\Vert {{{\varvec{U}}}}\Vert ^2/{\sigma }^2\sim \chi _p^2\), so that \(E[{\sigma }^{2i}/\Vert {{{\varvec{U}}}}\Vert ^{2i}]=\prod _{j=1}^i(p-2j)^{-1}\) for \(p-2i>0\). Thus, the proof is complete. \(\square\)

The following lemma is given in Hudson (1978).

Lemma 2.3

Let K be a Poisson random variable with mean \({\lambda }\). Let g be a function satisfying \(|g(-1)| < \infty\) and \(E[|g(K)|]<{\infty }\). Then we have \({\lambda }E[g(K)]=E[Kg(K-1)]\).
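Hudson's identity is also easy to verify by simulation; a tiny sketch with an arbitrary \({\lambda }\) and g chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 2.7                                     # arbitrary Poisson mean
K = rng.poisson(lam, size=500_000)
g = lambda k: 1.0 / (5.0 + 2.0 * k)           # any g with |g(-1)| and E|g(K)| finite
print(lam * np.mean(g(K)), np.mean(K * g(K - 1)))   # both sides agree up to Monte Carlo error
```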

3 Bias reduction

In this section, some results are presented for the bias reduction in slope estimation. First, we give an alternative expression for the bias of the LS estimator \(\hat{{\beta }}^{LS}\).

Lemma 3.1

Let K be a Poisson random variable with mean \({\lambda }=\Vert {{\varvec{\xi }}}\Vert ^2/(2{\sigma }^2)\). If \(p\ge 2\), then the bias of \(\hat{{\beta }}^{LS}\) is finite. Furthermore, if \(p\ge 3\), the bias of \(\hat{{\beta }}^{LS}\) can be expressed as
$$\begin{aligned} \mathrm{Bias}(\hat{{\beta }}^{LS};{\beta })=-E\bigg [\frac{p-2}{p+2K-2}\bigg ]{\beta }. \end{aligned}$$

Proof

Using identity (2.14) gives that for \(p\ge 2,\)
$$\begin{aligned} \mathrm{Bias}(\hat{{\beta }}^{LS};{\beta })=E\bigg [\frac{{{{\varvec{U}}}}^t{{\varvec{\xi }}}}{\Vert {{{\varvec{U}}}}\Vert ^2}\bigg ]{\beta }-{\beta }=E\bigg [\frac{2{\lambda }}{p+2K}\bigg ]{\beta }-{\beta }. \end{aligned}$$
(3.1)
If \(p\ge 3\), we apply Lemma 2.3 to (3.1) so as to obtain
$$\begin{aligned} \mathrm{Bias}(\hat{{\beta }}^{LS};{\beta }) =E\bigg [\frac{2K}{p+2K-2}\bigg ]{\beta }-{\beta }=-E\bigg [\frac{p-2}{p+2K-2}\bigg ]{\beta }. \end{aligned}$$
Hence, the proof is complete. \(\square\)

We next derive a simple form for the bias of \(\hat{{\beta }}_\ell ^{BR}\), given in (2.11).

Proposition 3.1

Let K be a Poisson random variable with mean \({\lambda }=\Vert {{\varvec{\xi }}}\Vert ^2/(2{\sigma }^2)\). Assume that \(p\ge 5\). If \(\ell <(p-2)/2\), then \(\mathrm{Bias}(\hat{{\beta }}_\ell ^{BR};{\beta })\) can be expressed as
$$\begin{aligned} \mathrm{Bias}(\hat{{\beta }}_\ell ^{BR};{\beta })=-E\bigg [\prod _{j=1}^{\ell +1}\frac{p-2j}{p+2K-2j}\bigg ]{\beta }. \end{aligned}$$

Proof

We prove a case when \(\ell \ge 1\) because the \(\ell =0\) case is equivalent to Lemma 3.1. Note that
$$\begin{aligned} E[\hat{{\beta }}_\ell ^{BR}]=E[\hat{{\beta }}^{LS}]+E\bigg [\sum _{j=1}^\ell \frac{a_j}{b_j}\bigg (\frac{S}{\Vert {{{\varvec{U}}}}\Vert ^2}\bigg )^j\frac{{{{\varvec{U}}}}^t{{\varvec{\xi }}}}{\Vert {{{\varvec{U}}}}\Vert ^2}\bigg ]{\beta }, \end{aligned}$$
which implies from Lemma 3.1 that
$$\begin{aligned} \mathrm{Bias}(\hat{{\beta }}_\ell ^{BR};{\beta })=-E\bigg [\frac{p-2}{p+2K-2}\bigg ]{\beta }+E\bigg [\sum _{j=1}^\ell \frac{a_j}{b_j}\bigg (\frac{S}{\Vert {{{\varvec{U}}}}\Vert ^2}\bigg )^j\frac{{{{\varvec{U}}}}^t{{\varvec{\xi }}}}{\Vert {{{\varvec{U}}}}\Vert ^2}\bigg ]{\beta }. \end{aligned}$$
(3.2)
Since \(E[X^j]=b_j\) for \(j=1,\ldots ,\ell\) when \(X\sim \chi ^2_m\), using (i) of Lemma 2.1 and Lemma 2.2 gives
$$\begin{aligned} E\bigg [\frac{a_j}{b_j}\bigg (\frac{S}{\Vert {{{\varvec{U}}}}\Vert ^2}\bigg )^j\frac{{{{\varvec{U}}}}^t{{\varvec{\xi }}}}{\Vert {{{\varvec{U}}}}\Vert ^2}\bigg ]{\nonumber }&=\frac{a_j}{b_j}\sum _{k=0}^{\infty }\frac{2{\lambda }}{p+2k}P_{\lambda }(k)\int _0^{\infty }\int _0^{\infty }\frac{s^j}{w^j}g_{p+2k}(w)\,\mathrm{d}w \,g_m(s)\,\mathrm{d}s {\nonumber }\\&=a_j\sum _{k=0}^{\infty }\frac{2{\lambda }}{p+2k}P_{\lambda }(k)\int _0^{\infty }\frac{1}{w^j}g_{p+2k}(w)\,\mathrm{d}w {\nonumber }\\&=E\bigg [\frac{2{\lambda }}{p+2K} \prod _{i=1}^{j}\frac{p-2i}{p+2K-2i}\bigg ] \end{aligned}$$
(3.3)
for \(p-2j>0\). Applying Lemma 2.3 to (3.3) gives that for \(p-2-2j>0,\)
$$\begin{aligned} E\bigg [\frac{a_j}{b_j}\bigg (\frac{S}{\Vert {{{\varvec{U}}}}\Vert ^2}\bigg )^j\frac{{{{\varvec{U}}}}^t{{\varvec{\xi }}}}{\Vert {{{\varvec{U}}}}\Vert ^2}\bigg ] =E\bigg [\frac{2K}{p+2K-2} \prod _{i=1}^{j}\frac{p-2i}{p+2K-2i-2}\bigg ], \end{aligned}$$
which is substituted into (3.2) to obtain
$$\begin{aligned} \mathrm{Bias}(\hat{{\beta }}_\ell ^{BR};{\beta })=-E\bigg [\frac{p-2}{p+2K-2}-\frac{2K}{p+2K-2}\sum _{j=1}^\ell \prod _{i=1}^{j}\frac{p-2i}{p+2K-2i-2}\bigg ]{\beta }. \end{aligned}$$
It is here observed that
$$\begin{aligned}&\frac{p-2}{p+2K-2}-\frac{2K}{p+2K-2}\sum _{j=1}^\ell \prod _{i=1}^{j}\frac{p-2i}{p+2K-2i-2}\\&\quad =\prod _{j=1}^{2}\frac{p-2j}{p+2K-2j}-\frac{2K}{p+2K-2}\sum _{j=2}^\ell \prod _{i=1}^{j}\frac{p-2i}{p+2K-2i-2}\\&\quad =\cdots =\prod _{j=1}^{\ell +1}\frac{p-2j}{p+2K-2j}, \end{aligned}$$
which yields that, for \(p-2\ell -2>0\),
$$\begin{aligned} \mathrm{Bias}(\hat{{\beta }}_\ell ^{BR};{\beta })=-E\bigg [\prod _{j=1}^{\ell +1}\frac{p-2j}{p+2K-2j}\bigg ]{\beta }. \end{aligned}$$
Hence, the proof is complete. \(\square\)
If k is a nonnegative integer and \(\ell \ge 1\), it follows that
$$\begin{aligned} 0< \prod _{j=1}^{\ell +1}\frac{p-2j}{p+2k-2j} \le \prod _{j=1}^\ell \frac{p-2j}{p+2k-2j} \le \cdots \le \frac{p-2}{p+2k-2}. \end{aligned}$$
Combining Lemma 3.1 and Proposition 3.1 immediately yields the following proposition.

Proposition 3.2

If \(1\le \ell <(p-2)/2\), then for any \({\beta }\)
$$\begin{aligned} |\mathrm{Bias}(\hat{{\beta }}_\ell ^{BR};{\beta })| \le |\mathrm{Bias}(\hat{{\beta }}_{\ell -1}^{BR};{\beta })| \le \cdots \le |\mathrm{Bias}(\hat{{\beta }}_1^{BR};{\beta })| \le |\mathrm{Bias}(\hat{{\beta }}^{LS};{\beta })|. \end{aligned}$$

The following theorem specifies a general condition that \(\hat{{\beta }}_\phi\), given in (2.10), reduces the bias of \(\hat{{\beta }}^{LS}\) in a finite sample setup.

Theorem 3.1

Assume that \(p\ge 5\). Let the \(a_j\)s and the \(b_j\)s be defined as in (2.11). Assume that \(\phi (t)\) is bounded as \(0\le \phi (t)\le 2 \sum _{j=1}^\ell (a_j/b_j)t^{-j}\) for any \(t>0\) and a fixed natural number \(\ell\). If \(\ell <(p-2)/2\), then the bias of \(\hat{{\beta }}_\phi\) is smaller than that of \(\hat{{\beta }}^{LS}\) for any \({\beta }\).

Proof

Using the same arguments as in (3.2), we can express \(|\mathrm{Bias}(\hat{{\beta }}_\phi ;{\beta })|\) as \(|\mathrm{Bias}(\hat{{\beta }}_\phi ;{\beta })|=|-E_0+E_\phi |\cdot |{\beta }|\), where
$$\begin{aligned} E_0=E\bigg [\frac{p-2}{p+2K-2}\bigg ],\quad E_\phi =E\bigg [\phi \bigg (\frac{\Vert {{{\varvec{U}}}}\Vert ^2}{S}\bigg )\frac{{{{\varvec{U}}}}^t{{\varvec{\xi }}}}{\Vert {{{\varvec{U}}}}\Vert ^2}\bigg ]. \end{aligned}$$
From Lemma 3.1, it suffices to show that \(|-E_0+E_\phi |\le E_0\) or, equivalently, that
$$\begin{aligned} -2E_0\le -2E_0+E_\phi \le 0. \end{aligned}$$
(3.4)
Since \(\phi (t)\ge 0\) for any t, it follows from (i) of Lemma 2.1 that \(E_\phi \ge 0\). Thus, the first inequality of (3.4) is valid.
Combining (i) of Lemma 2.1 and the given boundedness assumption on \(\phi\) yields that
$$\begin{aligned} E_\phi&=\sum _{k=0}^{\infty }\frac{2{\lambda }}{p+2k}P_{\lambda }(k)\int _0^{\infty }\int _0^{\infty }\phi \Big (\frac{w}{s}\Big )g_{p+2k}(w)\,\mathrm{d}w \,g_m(s)\,\mathrm{d}s \\&\le \sum _{k=0}^{\infty }\frac{2{\lambda }}{p+2k}P_{\lambda }(k)\int _0^{\infty }\int _0^{\infty }2\bigg \{\sum _{j=1}^\ell \frac{a_j}{b_j}\Big (\frac{s}{w}\Big )^j\bigg \}g_{p+2k}(w)\,\mathrm{d}w \,g_m(s)\,\mathrm{d}s \\&=2 E\bigg [\sum _{j=1}^\ell \frac{a_j}{b_j}\bigg (\frac{S}{\Vert {{{\varvec{U}}}}\Vert ^2}\bigg )^j\frac{{{{\varvec{U}}}}^t{{\varvec{\xi }}}}{\Vert {{{\varvec{U}}}}\Vert ^2}\bigg ]. \end{aligned}$$
Hence, by the same arguments as in the proof of Proposition 3.1, it is seen that
$$\begin{aligned} -2E_0+E_\phi&\le -2E_0+2E\bigg [\sum _{j=1}^\ell \frac{a_j}{b_j}\bigg (\frac{S}{\Vert {{{\varvec{U}}}}\Vert ^2}\bigg )^j\frac{{{{\varvec{U}}}}^t{{\varvec{\xi }}}}{\Vert {{{\varvec{U}}}}\Vert ^2}\bigg ]\\&=-2E\bigg [\prod _{j=1}^{\ell +1}\frac{p-2j}{p+2K-2j}\bigg ]\le 0, \end{aligned}$$
which implies that the second inequality of (3.4) is valid. \(\square\)

The following proposition clarifies a condition that the bias of \(\hat{{\beta }}_\ell ^{ST}\) given in (2.6) is smaller than that of \(\hat{{\beta }}^{LS}\).

Proposition 3.3

Let \(p>4\) and let \(\ell\) be a given natural number such that \(\ell <(p-2)/2\). If \((p/m)^\ell \le 2a_\ell /b_\ell\), then \(\hat{{\beta }}_\ell ^{ST}\) has a smaller bias than \(\hat{{\beta }}^{LS}\).

Proof

We define \(\phi _\ell ^{ST}(t)=\sum _{j=1}^\ell (p/m)^j t^{-j}\) for \(t>0\) and then \(\hat{{\beta }}_\ell ^{ST}\) can be expressed as \(\hat{{\beta }}_\ell ^{ST}=\{1+\phi _\ell ^{ST}(\Vert {{{\varvec{U}}}}\Vert ^2/S)\}\hat{{\beta }}^{LS}\). Since \((p-2j)/p\le 1\) and \(m/(m+2j-2)\le 1\) for \(j=1,\dots ,\ell\), it is observed that
$$\begin{aligned} 2\frac{a_\ell }{b_\ell }-\Big (\frac{p}{m}\Big )^\ell&=\Big \{2\frac{p-2\ell }{p}\frac{m}{m+2\ell -2}\frac{a_{\ell -1}}{b_{\ell -1}}-\Big (\frac{p}{m}\Big )^{\ell -1}\Big \}\frac{p}{m} \\&\le \Big \{2\frac{a_{\ell -1}}{b_{\ell -1}}-\Big (\frac{p}{m}\Big )^{\ell -1}\Big \}\frac{p}{m} \\&\le \cdots \le \Big (2\frac{a_{1}}{b_{1}}-\frac{p}{m}\Big )\Big (\frac{p}{m}\Big )^{\ell -1} =\frac{p-4}{m}\Big (\frac{p}{m}\Big )^{\ell -1}. \end{aligned}$$
The above inequalities imply that, if \((p/m)^\ell \le 2a_\ell /b_\ell\) for a given natural number \(\ell\), it follows that \((p/m)^j\le 2a_j/b_j\) for each \(j\in \{1,\ldots ,\ell -1\}\). Hence, when \((p/m)^\ell \le 2a_\ell /b_\ell\), we obtain
$$\begin{aligned} \phi _\ell ^{ST}(t)=\sum _{j=1}^\ell \Big (\frac{p}{m}\Big )^j t^{-j} \le 2 \sum _{j=1}^\ell \frac{a_j}{b_j}t^{-j}, \end{aligned}$$
and using Theorem 3.1 gives that \(|\mathrm{Bias}(\hat{{\beta }}_\ell ^{ST};{\beta })|\le |\mathrm{Bias}(\hat{{\beta }}^{LS};{\beta })|\). \(\square\)

We provide some other applications of Theorem 3.1 below.

Example 3.1

Theorem 3.1 immediately gives that the bias of \(\hat{{\beta }}_{\ell \cdot 2}^{BR}\) given in (2.12) is smaller than that of \(\hat{{\beta }}^{LS}\). It is, however, noted that the bias of \(\hat{{\beta }}_{\ell \cdot 2}^{BR}\) does not always have the same sign as that of \(\hat{{\beta }}^{LS}\). \(\square\)

Example 3.2

The first moment of \(\hat{{\beta }}^{MM}\) is not finite. Such an estimator not having finite moments can be modified by Theorem 3.1.

Assume that an estimator of \({\beta }\) has the form \(\hat{{\beta }}_{\bar{\phi }}=\{1+\bar{\phi }(\Vert {{{\varvec{U}}}}\Vert ^2/S)\}\hat{{\beta }}^{LS}\), where \(\hat{{\beta }}_{\bar{\phi }}\) does not necessarily have finite bias. Let \(\hat{{\beta }}_{\phi _\ell ^*}=\{1+\phi _\ell ^*(\Vert {{{\varvec{U}}}}\Vert ^2/S)\}\hat{{\beta }}^{LS}\), where \(\ell\) is a natural number and
$$\begin{aligned} \phi _\ell ^*(t)=\max \bigg [0,\min \bigg \{\bar{\phi }(t), \sum _{j=1}^\ell \frac{a_j}{b_j}t^{-j}\bigg \}\bigg ]. \end{aligned}$$
It is clear that \(0\le \phi _\ell ^*(t)\le 2\sum _{j=1}^\ell (a_j/b_j)t^{-j}\). Hence, if \(\ell <(p-2)/2\), then the bias of \(\hat{{\beta }}_{\phi _\ell ^*}\) is finite and smaller than that of \(\hat{{\beta }}^{LS}\) for any \({\beta }\). \(\square\)

Example 3.3

The second moment of \(\hat{{\beta }}_\ell ^{BR}\) is always larger than that of \(\hat{{\beta }}^{LS}\). Thus there is a considerable risk that \(\hat{{\beta }}_\ell ^{BR}\) has larger variance and MSE than \(\hat{{\beta }}^{LS}\). To reduce the risk, we consider, for example, the following truncation rule
$$\begin{aligned} \phi _\ell ^{**}(t)={\left\{ \begin{array}{ll} \sum \nolimits _{j=1}^\ell (a_j/b_j)t^{-j} &{}\quad \hbox {if }\;t>1, \\ (a_1/b_1)t^{-1} &{} \quad \hbox {otherwise}. \end{array}\right. } \end{aligned}$$
Then, the resulting estimator \(\hat{{\beta }}_{\phi _\ell ^{**}}=\{1+\phi _\ell ^{**}(\Vert {{{\varvec{U}}}}\Vert ^2/S)\}\hat{{\beta }}^{LS}\) always has a smaller second moment than \(\hat{{\beta }}_\ell ^{BR}\). \(\square\)

4 MSE reduction

In this section, a unified method is provided for the MSE reduction not only for \(\hat{{\beta }}^{LS}\) and \(\hat{{\beta }}^{GG}\), but also for the bias-reduced estimators \(\hat{{\beta }}_\phi\) given in Sect. 3.

4.1 Preliminaries

Suppose that an estimator of the slope \({\beta }\) in reparametrized model (2.2) depends only on \({{{\varvec{Z}}}}\), \({{{\varvec{U}}}}\) and S,  but not on \(Z_0\) and \(U_0\). Recall that
$$\begin{aligned} {{{\varvec{Z}}}}\sim \mathcal{N}_p({\beta }{{\varvec{\xi }}},\tau ^2I_p),\quad {{{\varvec{U}}}}\sim \mathcal{N}_p({{\varvec{\xi }}},{\sigma }^2I_p),\quad S\sim {\sigma }^2\chi _m^2. \end{aligned}$$
(4.1)
If \(\tau ^2={\sigma }^2\) in partial model (4.1), the problem of estimating \({\beta }\) is just the same as a linear calibration problem. More precisely, the MSE reduction problem for \(\hat{{\beta }}^{LS}\) corresponds to that for what is called a classical estimator in the multivariate linear calibration problem with a single independent variable. For details of the linear calibration problem, see Kubokawa and Robert (1994), who derived an alternative to the classical estimator under the MSE criterion. See also Osborne (1991), Brown (1993) and Sundberg (1999) for a general overview of the calibration problem.
Let \(V=\Vert {{{\varvec{U}}}}\Vert ^2/(S+\Vert {{{\varvec{U}}}}\Vert ^2)\) and let \(\psi (v)\) be a function on the interval (0, 1). In this section, we consider an alternative estimator of the form
$$\begin{aligned} \hat{{\beta }}_\psi =\psi (V)\hat{{\beta }}^{LS}=\psi (V)\frac{{{{\varvec{U}}}}^t{{{\varvec{Z}}}}}{\Vert {{{\varvec{U}}}}\Vert ^2}. \end{aligned}$$
It is clear that
$$\begin{aligned} \mathrm{MSE}(\hat{{\beta }}_\psi ;{\beta })&=E\bigg [\psi ^2(V)\frac{{{{\varvec{U}}}}^t{{{\varvec{Z}}}}{{{\varvec{Z}}}}^t{{{\varvec{U}}}}}{\Vert {{{\varvec{U}}}}\Vert ^4}-2{\beta }\psi (V)\frac{{{{\varvec{U}}}}^t{{{\varvec{Z}}}}}{\Vert {{{\varvec{U}}}}\Vert ^2}-{\beta }^2 \bigg ]. \end{aligned}$$
Taking expectation with respect to \({{{\varvec{Z}}}}\sim \mathcal{N}_p({\beta }{{\varvec{\xi }}},\tau ^2I_p)\) gives that
$$\begin{aligned} \mathrm{MSE}(\hat{{\beta }}_\psi ;{\beta })=\tau ^2E\bigg [\frac{\psi ^2(V)}{\Vert {{{\varvec{U}}}}\Vert ^2}\bigg ]+{\beta }^2E\bigg [\bigg \{\psi (V)\frac{{{{\varvec{U}}}}^t{{\varvec{\xi }}}}{\Vert {{{\varvec{U}}}}\Vert ^2}-1\bigg \}^2\bigg ]. \end{aligned}$$
(4.2)
Hence, if \(\psi ^2(V)\le 1\) and
$$\begin{aligned} E\bigg [\bigg \{\psi (V)\frac{{{{\varvec{U}}}}^t{{\varvec{\xi }}}}{\Vert {{{\varvec{U}}}}\Vert ^2}-1\bigg \}^2\bigg ]\le E\bigg [\bigg \{\frac{{{{\varvec{U}}}}^t{{\varvec{\xi }}}}{\Vert {{{\varvec{U}}}}\Vert ^2}-1\bigg \}^2\bigg ], \end{aligned}$$
(4.3)
then \(\hat{{\beta }}_\psi\) has a smaller MSE than \(\hat{{\beta }}^{LS}\).

As pointed out by Kubokawa and Robert (1994), condition (4.3) is closely related to a statistical control problem. The control problem is formulated as the problem of estimating a normal mean vector \({{\varvec{\xi }}}\), where the accuracy of an estimator \(\hat{{{\varvec{\xi }}}}\) is measured by loss \((\hat{{{\varvec{\xi }}}}{}^t{{\varvec{\xi }}}-1)^2\). For more details of the statistical control problem, the reader is referred to Zellner (1971) and also to Zaman (1981), Berger et al. (1982) and Aoki (1989).

In Kubokawa and Robert (1994), the IERD method (Kubokawa 1994) plays an important role in checking condition (4.3). Here, we do not employ the IERD method and we directly evaluate the expectations in (4.3) with the help of a Poisson variable.

Lemma 4.1

For nonnegative integers k, denote by \(P_{\lambda }(k)\) the Poisson probabilities with mean \({\lambda }=\Vert {{\varvec{\xi }}}\Vert ^2/(2{\sigma }^2)\). Assume that \(\psi (V){{{\varvec{U}}}}^t{{\varvec{\xi }}}/\Vert {{{\varvec{U}}}}\Vert ^2\) has a finite second moment. Then we have
$$\begin{aligned} E\bigg [\bigg \{\psi (V)\frac{{{{\varvec{U}}}}^t{{\varvec{\xi }}}}{\Vert {{{\varvec{U}}}}\Vert ^2}-1\bigg \}^2\bigg ]-1 =\sum _{k=0}^{\infty }\frac{2{\lambda }}{p+2k}P_{\lambda }(k)H_\psi (k), \end{aligned}$$
where
$$\begin{aligned} H_\psi (k)&=\int _0^1\bigg \{\frac{1+2k}{p+2k+m-2}\frac{\psi ^2(v)}{v}-2\psi (v)\bigg \}f_k(v)\,\mathrm{d}v , \\ f_k(v)&=\frac{{\varGamma }[(p+2k+m)/2]}{{\varGamma }[(p+2k)/2]{\varGamma }[m/2]}v^{(p+2k)/2-1}(1-v)^{m/2-1}. \end{aligned}$$

Proof

Note that V can be interpreted as a function of \(\Vert {{{\varvec{U}}}}\Vert ^2/S\). For that reason, Lemma 2.1 can be used to obtain
$$\begin{aligned} E\bigg [\bigg \{\psi (V)\frac{{{{\varvec{U}}}}^t{{\varvec{\xi }}}}{\Vert {{{\varvec{U}}}}\Vert ^2}-1\bigg \}^2\bigg ]-1 =\sum _{k=0}^{\infty }\frac{2{\lambda }}{p+2k}P_{\lambda }(k)H_\psi ^*(k), \end{aligned}$$
where
$$\begin{aligned} H_\psi ^*(k)=\int _0^{\infty }\int _0^{\infty }\bigg \{\frac{1+2k}{w}\psi ^2 \Big (\frac{w}{w+s}\Big )-2\psi \Big (\frac{w}{w+s}\Big )\bigg \}g_m(s)g_{p+2k}(w)\,\mathrm{d}s\,\mathrm{d}w. \end{aligned}$$
For \(H_\psi ^*(k)\), we make the change of variables \(t=s+w\) and \(v=w/(w+s)\) with the Jacobian \(J[(s,w)\rightarrow (t,v)]=t\) and hence
$$\begin{aligned} H_\psi ^*(k)=\int _0^1\int _0^{\infty }\bigg \{\frac{1+2k}{tv}\psi ^2(v) -2\psi (v)\bigg \}g_{p+2k+m}(t)f_k(v)\,\mathrm{d}t\,\mathrm{d}v. \end{aligned}$$
Integrating out with respect to t yields that \(H_\psi ^*(k)=H_\psi (k)\), which completes the proof. \(\square\)

Next, we specify conditions for finiteness of the MSEs of \(\hat{{\beta }}^{LS}\) and \(\hat{{\beta }}_\ell ^{BR}\), where \(\hat{{\beta }}_\ell ^{BR}\) is given in (2.11).

Lemma 4.2

Let K be a Poisson random variable with mean \({\lambda }=\Vert {{\varvec{\xi }}}\Vert ^2/(2{\sigma }^2)\). If \(p\ge 3\), the MSE of \(\hat{{\beta }}^{LS}\) is finite and it can be expressed as
$$\begin{aligned} \mathrm{MSE}(\hat{{\beta }}^{LS};{\beta })&=\frac{\tau ^2}{{\sigma }^2}E\bigg [\frac{1}{p+2K-2}\bigg ]{\nonumber }\\&\quad +{\beta }^2E\bigg [\frac{2{\lambda }(1+2K)}{(p+2K)(p+2K-2)}-\frac{4{\lambda }}{p+2K}+1\bigg ]. \end{aligned}$$
(4.4)

Proof

From (4.2), the MSE of \(\hat{{\beta }}^{LS}\) can be written as
$$\begin{aligned} \mathrm{MSE}(\hat{{\beta }}^{LS};{\beta })=\tau ^2E\bigg [\frac{1}{\Vert {{{\varvec{U}}}}\Vert ^2} \bigg ]+{\beta }^2E\bigg [\frac{({{{\varvec{U}}}}^t{{\varvec{\xi }}})^2}{\Vert {{{\varvec{U}}}}\Vert ^4}-2\frac{{{{\varvec{U}}}}^t{{\varvec{\xi }}}}{\Vert {{{\varvec{U}}}}\Vert ^2}+1\bigg ]. \end{aligned}$$
Using identities (2.14) and (2.15) and Lemma 2.2, we obtain the lemma. \(\square\)

If \({\lambda }\) has the same order as n, the first term of the r.h.s. in (4.4) converges to zero as \(n\rightarrow {\infty }\). Hence, the MSE of \(\hat{{\beta }}^{LS}\) is not much influenced by \(\tau ^2\) when n is sufficiently large or when \(\tau ^2\) is sufficiently smaller than \({\sigma }^2\).

Lemma 4.3

Assume that \(p\ge 7\). If \(1\le \ell <(p-2)/4\), the MSE of \(\hat{{\beta }}_\ell ^{BR}\) is finite.

Proof

From (4.2), it is sufficient to derive a condition that
$$\begin{aligned} E\bigg [\bigg (\frac{S}{\Vert {{{\varvec{U}}}}\Vert ^2}\bigg )^{2\ell }\frac{1}{\Vert {{{\varvec{U}}}}\Vert ^2}\bigg ]<{\infty }\quad \hbox {and}\quad E\bigg [\bigg (\frac{S}{\Vert {{{\varvec{U}}}}\Vert ^2}\bigg )^{2\ell }\frac{({{{\varvec{U}}}}^t{{\varvec{\xi }}})^2}{\Vert {{{\varvec{U}}}}\Vert ^4}\bigg ]<{\infty }. \end{aligned}$$
Lemma 2.2 leads to, for \(p-4\ell -2>0\),
$$\begin{aligned} E\bigg [\bigg (\frac{S}{\Vert {{{\varvec{U}}}}\Vert ^2}\bigg )^{2\ell }\frac{1}{\Vert {{{\varvec{U}}}}\Vert ^2}\bigg ]&=E\bigg [\frac{{\sigma }^{4\ell }}{\Vert {{{\varvec{U}}}}\Vert ^{4\ell +2}}\bigg ] \prod _{j=1}^{2\ell }(m+2j-2)\\&=\frac{1}{{\sigma }^2}E\bigg [\prod _{j=1}^{2\ell +1}\frac{1}{p+2K-2j}\bigg ] \prod _{j=1}^{2\ell }(m+2j-2). \end{aligned}$$
Similarly, using (ii) of Lemma 2.1 and Lemma 2.2 yields that for \(p-4\ell -2>0,\)
$$\begin{aligned} E\bigg [\bigg (\frac{S}{\Vert {{{\varvec{U}}}}\Vert ^2}\bigg )^{2\ell }\frac{({{{\varvec{U}}}}^t{{\varvec{\xi }}})^2}{\Vert {{{\varvec{U}}}}\Vert ^4}\bigg ] =E\bigg [\frac{2{\lambda }(1+2K)}{p+2K}\prod _{j=1}^{2\ell +1}\frac{1}{p+2K-2j}\bigg ]\prod _{j=1}^{2\ell }(m+2j-2). \end{aligned}$$
Hence the finiteness of the MSE of \(\hat{{\beta }}_\ell ^{BR}\) needs \(p-4\ell -2>0\), namely \(\ell <(p-2)/4\). \(\square\)

We can express the MSE of \(\hat{{\beta }}_\ell ^{BR}\) alternatively by using the Poisson random variable as in Lemma 4.2, but it is omitted.

4.2 Main analytical result and some examples

Consider a slope estimator of the form \(\hat{{\beta }}_{\bar{\psi }}=\bar{\psi }(V)\hat{{\beta }}^{LS}\), where \(\bar{\psi }(v)\) is a function of v on the interval (0, 1). Assume that the second moment of \(\hat{{\beta }}_{\bar{\psi }}\) is finite. Suppose that we want to find an estimator \(\hat{{\beta }}_{\psi }=\psi (V)\hat{{\beta }}^{LS}\) having a smaller MSE than \(\hat{{\beta }}_{\bar{\psi }}\), where \(\psi (v)\) is a function on (0, 1). To this end, \(\psi\) is required to satisfy the conditions in the following theorem.

Theorem 4.1

If \(\psi ^2(v)\le \bar{\psi }^2(v)\) and
$$\begin{aligned} {\varDelta }(v|\psi ,\bar{\psi })=\psi ^2(v)-\bar{\psi }^2(v)-2(p+m-2)v\{\psi (v) -\bar{\psi }(v)\}\le 0 \end{aligned}$$
for any \(v\in (0,1)\), then \(\mathrm{MSE}(\hat{{\beta }}_{\psi };{\beta })\le \mathrm{MSE}(\hat{{\beta }}_{\bar{\psi }};{\beta })\).

Proof

Since \(\psi ^2(v)\le \bar{\psi }^2(v)\) for any v, \(\hat{{\beta }}_\psi\) inherits the finiteness of the second moment from \(\hat{{\beta }}_{\bar{\psi }}\). By virtue of Lemma 4.1, the difference between the MSEs of \(\hat{{\beta }}_{\psi }\) and \(\hat{{\beta }}_{\bar{\psi }}\) is expressed as
$$\begin{aligned} \mathrm{MSE}(\hat{{\beta }}_{\psi };{\beta })-\mathrm{MSE}(\hat{{\beta }}_{\bar{\psi }};{\beta })&=\tau ^2E\bigg [\frac{\psi ^2(V)-\bar{\psi }^2(V)}{\Vert {{{\varvec{U}}}}\Vert ^2}\bigg ] \\&\quad +{\beta }^2\sum _{k=0}^{\infty }\frac{2{\lambda }}{p+2k}P_{\lambda }(k)\int _0^1{\varDelta }_k(v|\psi ,\bar{\psi })\frac{f_k(v)}{v}\,\mathrm{d}v, \end{aligned}$$
where
$$\begin{aligned} {\varDelta }_k(v|\psi ,\bar{\psi })=\frac{1+2k}{p+2k+m-2}\{\psi ^2(v)-\bar{\psi }^2(v)\}-2v\{\psi (v)-\bar{\psi }(v)\}. \end{aligned}$$
It follows that for any \(k\ge 0,\)
$$\begin{aligned} \frac{1+2k}{p+2k+m-2}\ge \frac{1}{p+m-2}, \end{aligned}$$
which implies that \({\varDelta }_k(v|\psi ,\bar{\psi })\le {\varDelta }_0(v|\psi ,\bar{\psi })={\varDelta }(v|\psi ,\bar{\psi })/(p+m-2)\). Hence, the proof is complete. \(\square\)

Theorem 4.1 is the key to constructing a better estimator under the MSE criterion. In fact, the following proposition provides an estimator improving on \(\hat{{\beta }}^{LS}\).

Proposition 4.1

For a given \(\bar{\psi }\), let \(\psi _0(v)=\max [0,\min \{\bar{\psi }(v), 2(p+m-2)v-\bar{\psi }(v)\}]\). If \(\Pr (\psi _0(V)=\bar{\psi }(V))<1\), then \(\hat{{\beta }}_{\psi _0}=\psi _0(V)\hat{{\beta }}^{LS}\) is better than \(\hat{{\beta }}_{\bar{\psi }}\) under the MSE criterion. Particularly, \(\hat{{\beta }}^{TLS}\), given in (2.13), has a smaller MSE than \(\hat{{\beta }}^{LS}\) for \(p\ge 3\).

Proof

Note that \(0\le \psi _0(v)\le |\bar{\psi }(v)|\) for any \(v\in (0,1)\). It also turns out that
$$\begin{aligned} \psi _0(v)=\left\{ \begin{array}{ll} 2(p+m-2)v-\bar{\psi }(v) &{} \quad \hbox {if }\; 0\le 2(p+m-2)v-\bar{\psi }(v)\le \bar{\psi }(v), \\ \bar{\psi }(v) &{} \quad \hbox {if }\; 0\le \bar{\psi }(v)< 2(p+m-2)v-\bar{\psi }(v), \\ 0 &{} \quad \hbox {if }\; 2(p+m-2)v-\bar{\psi }(v)< 0\le \bar{\psi }(v) \hbox { or if }\; \bar{\psi }(v)<0\le 2(p+m-2)v-\bar{\psi }(v), \end{array} \right. \end{aligned}$$
which implies that \({\varDelta }(v|\psi _0,\bar{\psi })=0\) if \(\min \{\bar{\psi }(v), 2(p+m-2)v-\bar{\psi }(v)\}\ge 0\) and \({\varDelta }(v|\psi _0,\bar{\psi })=\bar{\psi }(v)\{2(p+m-2)v-\bar{\psi }(v)\}\le 0,\) otherwise. Hence, if \(\Pr (\psi _0(V)=\bar{\psi }(V))<1\), then \(\hat{{\beta }}_{\psi _0}=\psi _0(V)\hat{{\beta }}^{LS}\) is better than \(\hat{{\beta }}_{\bar{\psi }}\) under the MSE criterion.

If \(\bar{\psi }(v)\equiv 1\), then \(\hat{{\beta }}_{\bar{\psi }}=\hat{{\beta }}^{LS}\) and \(\hat{{\beta }}_{\psi _0}=\hat{{\beta }}^{TLS}\), so that the MSE of \(\hat{{\beta }}^{TLS}\) is smaller than that of \(\hat{{\beta }}^{LS}\). \(\square\)

In the following, we show some new estimators.

Example 4.1

Assume additionally that \(\bar{\psi }(v)\ge 1\) for \(0<v<1\). Let
$$\begin{aligned} \psi _1(v)=\max [1,\min \{\bar{\psi }(v), 2(p+m-2)v-\bar{\psi }(v)\}]. \end{aligned}$$
Using the same arguments as in Proposition 4.1, we can prove that if \(\Pr (\psi _1(V)=\bar{\psi }(V))<1,\) then \(\hat{{\beta }}_{\psi _1}\) has a smaller MSE than \(\hat{{\beta }}_{\bar{\psi }}\). Since \(\psi _1(v)\ge 1\) for any \(v\in (0,1)\), it holds true that \(|E[\hat{{\beta }}_{\psi _1}]|\ge |E[\hat{{\beta }}^{LS}]|\), which implies that \(\hat{{\beta }}_{\psi _1}\) not only improves on the MSE of \(\hat{{\beta }}_{\bar{\psi }}\), but also may correct the bias of \(\hat{{\beta }}^{LS}\). \(\square\)

Example 4.2

Assume that \(p\ge 7\). Let
$$\begin{aligned} \bar{\psi }_\ell ^{BR}(v)=1+\sum _{j=1}^\ell \frac{a_j}{b_j}\bigg (\frac{1-v}{v}\bigg )^j. \end{aligned}$$
The MSE of the bias-reduced estimator \(\hat{{\beta }}_\ell ^{BR}=\bar{\psi }_\ell ^{BR}(V)\hat{{\beta }}^{LS}\) is improved by
$$\begin{aligned} \begin{aligned} \hat{{\beta }}_\ell ^{TBR}&=\psi _\ell ^{TBR}(V)\hat{{\beta }}^{LS},\\ \psi _\ell ^{TBR}(v)&=\max [1,\min \{\bar{\psi }_\ell ^{BR}(v),2(p+m-2)v-\bar{\psi }_\ell ^{BR}(v)\}] \end{aligned} \end{aligned}$$
(4.5)
for \(1\le \ell <(p-2)/4\). Note from Theorem 3.1 that \(\hat{{\beta }}_\ell ^{TBR}\) has a smaller bias than \(\hat{{\beta }}^{LS}\).
Guo and Ghosh’s (2012) estimator can be written as \(\hat{{\beta }}^{GG}=\bar{\psi }^{GG}(V)\hat{{\beta }}^{LS}\) with
$$\begin{aligned} \bar{\psi }^{GG}(v)=\bigg [1-\min \bigg \{\frac{p-2}{p},\frac{p-2}{m+2}\frac{1-v}{v}\bigg \}\bigg ]^{-1}. \end{aligned}$$
Define
$$\begin{aligned} \begin{aligned} \hat{{\beta }}^{TGG}&=\psi ^{TGG}(V)\hat{{\beta }}^{LS}, \\ \psi ^{TGG}(v)&=\max [1,\min \{\bar{\psi }^{GG}(v),2(p+m-2)v-\bar{\psi }^{GG}(v)\}]. \end{aligned} \end{aligned}$$
(4.6)
Since \(\Pr (\psi ^{TGG}(V)=\bar{\psi }^{GG}(V))< 1\), \(\hat{{\beta }}^{TGG}\) dominates \(\hat{{\beta }}^{GG}\) under the MSE criterion. \(\square\)
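The truncations (4.5) and (4.6) admit the following minimal sketch; the generic truncation helper and the function names are ours, and \(\bar{\psi }^{GG}\) is implemented inline as defined in this example.

```python
def truncate_psi(psi_bar, v, p, m, floor=1.0):
    """max[floor, min{bar_psi(v), 2(p+m-2)v - bar_psi(v)}], as used in (4.5) and (4.6)."""
    return max(floor, min(psi_bar(v), 2.0 * (p + m - 2) * v - psi_bar(v)))

def beta_TGG(Z, U, S, m):
    """Truncated Guo-Ghosh estimator (4.6); it dominates hat{beta}^GG under MSE."""
    p = len(U)
    V = (U @ U) / (S + U @ U)
    psi_bar_GG = lambda v: 1.0 / (1.0 - min((p - 2) / p, (p - 2) / (m + 2) * (1 - v) / v))
    return truncate_psi(psi_bar_GG, V, p, m) * (U @ Z) / (U @ U)
```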

Example 4.3

An estimator improving on \(\hat{{\beta }}^{LS}\) can be obtained by means of Equation (2.4) of Kubokawa and Robert (1994).

Assume that \(\bar{\psi }(v)\ge 0\) for \(0<v<1\). Let \(\psi ^{KR}(v)=\min \{\bar{\psi }(v), (p+m-2)v\}\). Then it is easy to show from Theorem 4.1 that \(\hat{{\beta }}^{KR}=\psi ^{KR}(V)\hat{{\beta }}^{LS}\) has a smaller MSE than \(\hat{{\beta }}_{\bar{\psi }}\) when \(\Pr (\psi ^{KR}(V)=\bar{\psi }(V))<1\). From the above, it is obvious that
$$\begin{aligned} \hat{{\beta }}^{TLS2} =\min \{1,(p+m-2)V\}\hat{{\beta }}^{LS} =\min \bigg \{\frac{1}{\Vert {{{\varvec{U}}}}\Vert ^2},\,\frac{p+m-2}{\Vert {{{\varvec{U}}}}\Vert ^2+S}\bigg \}{{{\varvec{U}}}}^t{{{\varvec{Z}}}}\end{aligned}$$
has a smaller MSE than \(\hat{{\beta }}^{LS}\) for \(p\ge 3\). The estimator \(\hat{{\beta }}^{TLS2}\) is quite similar to an estimator given in Corollary 2.2 of Kubokawa and Robert (1994). \(\square\)
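A one-line sketch of \(\hat{{\beta }}^{TLS2}\) (the function name is ours):

```python
def beta_TLS2(Z, U, S, m):
    """Kubokawa-Robert-type truncation of Example 4.3: min{1, (p+m-2)V} * hat{beta}^LS."""
    p = len(U)
    return min(1.0 / (U @ U), (p + m - 2) / (U @ U + S)) * (U @ Z)
```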

5 Numerical studies

5.1 Numerical examples with corn yield data

In this subsection, numerical examples with real data sets illustrate how regression lines are drawn with some estimates.
Table 1

Estimates of the slope and the intercept parameters for Fuller's (1987) corn-yield data, where \(n=25\) and \(r=2\)

Procedure   \({\beta }\)   \({\alpha }\)
\(\mathrm{LS}^\mathrm{a}\)   0.47693   65.219
\(\mathrm{BR}_1{}^\mathrm{b}\)   0.52183   62.185
\(\mathrm{BR}_2{}^\mathrm{b}\)   0.52587   61.912
\(\mathrm{BR}_3{}^\mathrm{b}\)   0.52623   61.888
\(\mathrm{BR}_4{}^\mathrm{b}\)   0.52626   61.886
\(\mathrm{BR}_5{}^\mathrm{b}\)   0.52627   61.885
\(\mathrm{BR}_6{}^\mathrm{b}\)   0.52627   61.885
\(\mathrm{GG}^\mathrm{c}\)   0.52247   62.142
\(\mathrm{TLS}^\mathrm{d}\)   0.47693   65.219
\(\mathrm{TBR}_1{}^\mathrm{e}\)   0.47693   65.219
\(\mathrm{TBR}_2{}^\mathrm{e}\)   0.47693   65.219
\(\mathrm{TBR}_3{}^\mathrm{e}\)   0.47693   65.219
\(\mathrm{TBR}_4{}^\mathrm{e}\)   0.47693   65.219
\(\mathrm{TBR}_5{}^\mathrm{e}\)   0.47693   65.219
\(\mathrm{TBR}_6{}^\mathrm{e}\)   0.47693   65.219
\(\mathrm{TGG}^\mathrm{f}\)   0.52247   62.142
\(\mathrm{ML}^\mathrm{g}\)   0.52860   61.728
\(\mathrm{IR}^\mathrm{h}\)   0.93854   34.032
\(\mathrm{MM}{}^\mathrm{i}\)   0.53151   61.531

aLS: \(\hat{{\beta }}^{LS}\), given in (2.3);
b\(\mathrm{BR}_\ell\): \(\hat{{\beta }}_\ell ^{BR}\), given in (2.11);
cGG: \(\hat{{\beta }}^{GG}\), given in (2.7);
dTLS: \(\hat{{\beta }}^{TLS}\), given in (2.13);
e\(\mathrm{TBR}_\ell\): \(\hat{{\beta }}_\ell ^{TBR}\), given in (4.5);
fTGG: \(\hat{{\beta }}^{TGG}\), given in (4.6);
gML: \(\hat{{\beta }}^{ML}\), given in (2.8);
hIR: \(\hat{{\beta }}^{IR}\), given in (2.9);
iMM: \(\hat{{\beta }}^{MM}\), given in (2.5).

Table 2
Estimates of the slope and the intercept parameters for DeGracie and Fuller's (1972) corn-yield data, where \(n=11\) and \(r=2\)

Procedure   \({\beta }\)   \({\alpha }\)
\(\mathrm{LS}\)   0.23972   75.031
\(\mathrm{BR}_1\)   0.59055   52.259
\(\mathrm{BR}_2\)   1.03857   23.179
\(\mathrm{BR}_3\)   1.55946   \(-\)10.632
\(\mathrm{GG}\)   1.19860   12.791
\(\mathrm{TLS}\)   0.23972   75.031
\(\mathrm{TBR}_1\)   0.35083   67.819
\(\mathrm{TBR}_2\)   0.79884   38.739
\(\mathrm{TBR}_3\)   1.31974   4.928
\(\mathrm{TGG}\)   1.19860   12.791
\(\mathrm{ML}\)   0.26902   73.129
\(\mathrm{IR}\)   1.17756   14.157
\(\mathrm{MM}\)   \(-\)0.28904   109.353

We present two numerical examples for the corn-yield data sets given in Fuller (1987, Table 3.1.1) and in DeGracie and Fuller (1972, p. 934). The data sets consist of yields of corn together with two measurements of soil nitrogen content. The yield and the soil nitrogen content are taken as the dependent (Y) and independent (X) variables, respectively; the data set of DeGracie and Fuller (1972) has duplicate observations of the yield, so the average of the two yields was treated as a single dependent variable. Tables 1 and 2 report slope estimates for the two data sets together with the corresponding estimates of the intercept \({\alpha }\), where, for every procedure, the intercept is estimated by \(Z_0-\hat{{\beta }} U_0\).

Figures 1 and 2 show scatter plots of the two data sets. In each figure, regression lines are drawn using the ordinary LS estimate \((\hat{{\alpha }}^{LS},\hat{{\beta }}^{LS})\), the bias-reduced (\(\mathrm{BR}_1\)) estimate \((\hat{{\alpha }}_1^{BR},\hat{{\beta }}_1^{BR})\), the ML estimate \((\hat{{\alpha }}^{ML},\hat{{\beta }}^{ML})\), the inverse regression (IR) estimate \((\hat{{\alpha }}^{IR},\hat{{\beta }}^{IR})\) and the method of moments (MM) estimate \((\hat{{\alpha }}^{MM},\hat{{\beta }}^{MM})\).

Table 1 and Fig. 1 indicate that \(\hat{{\beta }}_\ell ^{BR}\), \(\hat{{\beta }}^{ML}\) and \(\hat{{\beta }}^{MM}\) take similar values, while Table 2 and Fig. 2 show that they differ considerably. Although it holds theoretically that \(0<\hat{{\beta }}^{LS}<\hat{{\beta }}^{ML}\) whenever \({{{\varvec{U}}}}^t{{{\varvec{Z}}}}>0\), \(\mathrm{ML}\) is only slightly larger than \(\mathrm{LS}\) for these two data sets.

For the two data sets, \(\mathrm{LS}\) and \(\mathrm{TLS}\) take the same value, and \(\mathrm{TGG}\) is the same as \(\mathrm{GG}\). Table 2 shows that \(\mathrm{TBR}_\ell\) takes a smaller value than \(\mathrm{BR}_\ell\) for each \(\ell\), whereas in Table 1 all \(\mathrm{TBR}_\ell\) (\(\ell =1,\ldots ,6\)) coincide with \(\mathrm{LS}\).

From the data set of DeGracie and Fuller (1972), the value of \(S/\Vert {{{\varvec{U}}}}\Vert ^2\) is approximately \(1421.5/706.41\approx 2\) and, as in Table 2, \(\hat{{\beta }}_1^{BR}\) is calculated as
$$\begin{aligned} \hat{{\beta }}_1^{BR}&=\Big (1+\frac{p-2}{m}\frac{S}{\Vert {{{\varvec{U}}}}\Vert ^2}\Big )\hat{{\beta }}^{LS} \\&= \Big (1+\frac{10-2}{11}\times \frac{1421.5}{706.41}\Big )\times 0.23972=0.59055. \end{aligned}$$
When \(S/\Vert {{{\varvec{U}}}}\Vert ^2\) is large, the value of \(\mathrm{BR}_\ell\) changes progressively as \(\ell\) increases; in Table 2, only the method of moments estimate of the slope has a sign different from that of the other estimates. Moreover, the slope estimate directly affects the intercept estimate as long as \(Z_0-\hat{{\beta }} U_0\) is used as the estimate of the intercept.
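The arithmetic above is easy to reproduce; the following Python snippet (with the values quoted in the text and Table 2, and \(p=n-1=10\), \(m=11\) read off the display) is merely a check.

```python
# Check of the BR_1 calculation for the DeGracie and Fuller (1972) data.
p, m = 10, 11                  # p = n - 1 with n = 11; m as used in the display above
S, u2 = 1421.5, 706.41         # S and ||U||^2 quoted in the text
beta_ls = 0.23972              # LS slope estimate (Table 2)

beta_br1 = (1.0 + (p - 2) / m * S / u2) * beta_ls
print(round(beta_br1, 5))      # 0.59055, matching the BR_1 entry in Table 2
```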
Fig. 1 Some regression lines for Fuller's (1987) corn-yield data, where \(n=25\) and \(r=2\)

Fig. 2 Some regression lines for DeGracie and Fuller's (1972) corn-yield data, where \(n=11\) and \(r=2\)

5.2 Monte Carlo studies for bias and MSE comparison

Next, some results of Monte Carlo simulations are provided to compare the biases and MSEs of slope estimators.

For three different sample sizes \(n=10\), 30 and 100 with \(r=2\), each simulated bias and MSE is based on 500,000 independent replications of \(({{{\varvec{Z}}}},{{{\varvec{U}}}},S)\). It was assumed that \({\beta }=-5\), \(\tau ^2=10\), and \({\sigma }^2=1\) or 10. For the latent variable \({{\varvec{\xi }}}\), all the elements of \({{\varvec{\xi }}}\) were set to \(1/\sqrt{10}\) or \(\sqrt{5}\), namely \(\Vert {{\varvec{\xi }}}\Vert ^2=p/10\) or \(5p\), which implies that \({\sigma }_\xi ^2\equiv \lim _{n\rightarrow {\infty }} \Vert {{\varvec{\xi }}}\Vert ^2/p=1/10\) or 5.
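The Monte Carlo setup can be sketched in Python as follows. This is a simplified reconstruction rather than the authors' code: it generates \({{{\varvec{Z}}}}\sim \mathcal{N}_p({\beta }{{\varvec{\xi }}},\tau ^2I_p)\), \({{{\varvec{U}}}}\sim \mathcal{N}_p({{\varvec{\xi }}},{\sigma }^2I_p)\) and \(S\sim {\sigma }^2\chi _m^2\) (cf. Sect. 6), assumes \(p=n-1\) and \(m=n(r-1)\), and uses \(\hat{{\beta }}^{LS}={{{\varvec{U}}}}^t{{{\varvec{Z}}}}/\Vert {{{\varvec{U}}}}\Vert ^2\) together with the first-order bias-reduced estimator from Sect. 5.1; the remaining estimators are omitted for brevity.

```python
import numpy as np

def simulate_bias_mse(n, r=2, beta=-5.0, tau2=10.0, sigma2=1.0,
                      sigma_xi2=0.1, n_rep=100_000, seed=2018):
    """Simplified sketch of the Sect. 5.2 experiment for the LS and BR_1 slope
    estimators; returns {name: (bias, MSE)}."""
    rng = np.random.default_rng(seed)
    p, m = n - 1, n * (r - 1)                 # m = n(r-1) is an assumption here
    xi = np.full(p, np.sqrt(sigma_xi2))       # equal elements, so ||xi||^2 = p*sigma_xi^2
    Z = beta * xi + np.sqrt(tau2) * rng.standard_normal((n_rep, p))
    U = xi + np.sqrt(sigma2) * rng.standard_normal((n_rep, p))
    S = sigma2 * rng.chisquare(m, size=n_rep)

    u2 = np.einsum('ij,ij->i', U, U)
    beta_ls = np.einsum('ij,ij->i', U, Z) / u2          # LS slope, U'Z/||U||^2
    beta_br1 = (1.0 + (p - 2) / m * S / u2) * beta_ls   # first-order bias reduction

    return {name: (est.mean() - beta, np.mean((est - beta) ** 2))
            for name, est in (('LS', beta_ls), ('BR1', beta_br1))}

# Should roughly reproduce the (sigma_xi^2, sigma^2) = (1/10, 1), n = 30 rows of Table 4.
print(simulate_bias_mse(n=30))
```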

Table 3 shows some values of \({\lambda }=\Vert {{\varvec{\xi }}}\Vert ^2/(2{\sigma }^2)\) which were assumed for our simulation. For example, the smallest value of \({\lambda }\) is 0.045 when \(n=10\), \({\sigma }_\xi ^2=1/10\) and \({\sigma }^2=10\), and the largest value of \({\lambda }\) is 247.5 when \(n=100\), \({\sigma }_\xi ^2=5\) and \({\sigma }^2=1\).
Table 3
Mean \({\lambda }=\Vert {{\varvec{\xi }}}\Vert ^2/(2{\sigma }^2)\) of the Poisson distribution (\(p=n-1\))

\({\sigma }_\xi ^2\)   \({\sigma }^2\)   \(n=10\)   \(n=30\)   \(n=100\)
1/10   1    0.45    1.45    4.95
1/10   10   0.045   0.145   0.495
5      1    22.5    72.5    247.5
5      10   2.25    7.25    24.75

The slope estimators investigated in our simulation are the same as those in the preceding subsection. The simulated biases and MSEs are summarized in Table 4, where \(\mathrm{LS}\), \(\mathrm{TLS}\), \(\mathrm{BR}_\ell\) \((\ell =1,\,5)\), \(\mathrm{TBR}_\ell\) \((\ell =1,\,5)\), \(\mathrm{GG}\) and \(\mathrm{TGG}\) denote, respectively, \(\hat{{\beta }}^{LS}\), \(\hat{{\beta }}^{TLS}\), \(\hat{{\beta }}_\ell ^{BR}\) \((\ell =1,\,5)\), \(\hat{{\beta }}_\ell ^{TBR}\) \((\ell =1,\,5)\), \(\hat{{\beta }}^{GG}\) and \(\hat{{\beta }}^{TGG}\). Since \(\mathrm{BR}_5\) and \(\mathrm{TBR}_5\) have no finite moments for \(n=10\), they were omitted from the simulation in that case. For the same reason, \(\hat{{\beta }}^{ML}\), \(\hat{{\beta }}^{IR}\) and \(\hat{{\beta }}^{MM}\) were omitted.
Table 4
Simulated bias and MSE in slope estimation for \({\beta }=-5\)

\({\sigma }_\xi ^2\)   \({\sigma }^2\)   Estimator   Bias (\(n=10\))   MSE (\(n=10\))   Bias (\(n=30\))   MSE (\(n=30\))   Bias (\(n=100\))   MSE (\(n=100\))
1/10   1    \(\mathrm{LS}\)      4.54   22.17   4.54   21.03   4.54   20.76
1/10   1    \(\mathrm{TLS}\)     4.54   22.16   4.54   21.03   4.54   20.76
1/10   1    \(\mathrm{BR}_1\)    4.12   27.65   4.12   18.67   4.13   17.48
1/10   1    \(\mathrm{TBR}_1\)   4.16   24.00   4.12   18.67   4.13   17.48
1/10   1    \(\mathrm{BR}_5\)    –      –       2.77   108.32  2.81   12.85
1/10   1    \(\mathrm{TBR}_5\)   –      –       3.08   24.68   2.81   12.84
1/10   1    \(\mathrm{GG}\)      3.66   33.88   1.52   49.46   \(-\)4.10   197.64
1/10   1    \(\mathrm{TGG}\)     3.72   31.06   1.52   49.38   \(-\)4.10   197.64
1/10   10   \(\mathrm{LS}\)      4.95   24.69   4.95   24.55   4.95   24.52
1/10   10   \(\mathrm{TLS}\)     4.95   24.69   4.95   24.55   4.95   24.52
1/10   10   \(\mathrm{BR}_1\)    4.90   25.41   4.90   24.22   4.90   24.07
1/10   10   \(\mathrm{TBR}_1\)   4.91   24.87   4.90   24.22   4.90   24.07
1/10   10   \(\mathrm{BR}_5\)    –      –       4.71   49.35   4.71   22.86
1/10   10   \(\mathrm{TBR}_5\)   –      –       4.76   24.59   4.71   22.85
1/10   10   \(\mathrm{GG}\)      4.85   25.94   4.57   26.39   3.64   30.47
1/10   10   \(\mathrm{TGG}\)     4.86   25.57   4.57   26.37   3.64   30.47
5      1    \(\mathrm{LS}\)      0.70   1.02    0.79   0.78    0.82   0.72
5      1    \(\mathrm{TLS}\)     0.70   1.02    0.79   0.78    0.82   0.72
5      1    \(\mathrm{BR}_1\)    0.08   1.09    0.12   0.33    0.13   0.11
5      1    \(\mathrm{TBR}_1\)   0.08   1.09    0.12   0.33    0.13   0.11
5      1    \(\mathrm{BR}_5\)    –      –       0.00   0.39    0.00   0.12
5      1    \(\mathrm{TBR}_5\)   –      –       0.00   0.39    0.00   0.12
5      1    \(\mathrm{GG}\)      0.08   1.29    0.04   0.39    0.01   0.11
5      1    \(\mathrm{TGG}\)     0.08   1.29    0.04   0.39    0.01   0.11
5      10   \(\mathrm{LS}\)      3.25   11.22   3.31   11.09   3.33   11.10
5      10   \(\mathrm{TLS}\)     3.25   11.22   3.31   11.09   3.33   11.10
5      10   \(\mathrm{BR}_1\)    2.06   8.75    2.17   5.46    2.21   5.06
5      10   \(\mathrm{TBR}_1\)   2.11   7.75    2.17   5.46    2.21   5.06
5      10   \(\mathrm{BR}_5\)    –      –       0.37   20.98   0.42   2.81
5      10   \(\mathrm{TBR}_5\)   –      –       0.44   11.20   0.42   2.81
5      10   \(\mathrm{GG}\)      0.80   13.57   \(-\)2.05   62.49   \(-\)2.15   132.24
5      10   \(\mathrm{TGG}\)     0.89   12.45   \(-\)2.05   62.48   \(-\)2.15   132.24

Proposition 3.1 suggests that the bias of \(\hat{{\beta }}_\ell ^{BR}\) is small for large \({\lambda }\). This is confirmed by our simulations. In particular, when \({\lambda }\) is large, \(\mathrm{BR}_1\) and \(\mathrm{BR}_5\) substantially improve not only the bias of \(\mathrm{LS}\) but also its MSE. When \({\lambda }\) is very small (\(n=10\), \({\sigma }_\xi ^2=1/10\) and \({\sigma }^2=10\)), \(\mathrm{BR}_1\) slightly improves the bias of \(\mathrm{LS}\), while the MSE of \(\mathrm{BR}_1\) is larger than that of \(\mathrm{LS}\). Also, as \(n\) increases, the MSEs of \(\mathrm{BR}_1\) and \(\mathrm{BR}_5\) decrease while the absolute values of their biases gradually increase, which implies that the variances of \(\mathrm{BR}_1\) and \(\mathrm{BR}_5\) decrease with increasing \(n\).

\(\mathrm{TLS}\) yields only a very slight decrease in the MSE of \(\mathrm{LS}\). In contrast, \(\mathrm{TBR}_5\) achieves a successful reduction in the MSE of \(\mathrm{BR}_5\); in particular, the reduction is substantial when \(n=30\). This suggests that the truncation rule (4.5) is especially effective for higher-order bias-reduced estimators.

When \(n=10\), \(\mathrm{TGG}\) improves the MSE of \(\mathrm{GG}\) at the cost of bias. Only \(\mathrm{GG}\) and \(\mathrm{TGG}\) underestimate \({\beta }\) in some cases. Although \(\mathrm{GG}\) converges to \({\beta }\) in MSE under a structural model, the convergence appears to be rather slow.

6 Remarks

This paper considered a simple linear regression model with measurement error and discussed the bias and MSE reduction for slope estimation in a finite sample situation. Some new estimators having theoretically smaller bias or MSE than \(\hat{{\beta }}^{LS}\) were derived in Sects. 3 and 4. In Sect. 5, through numerical studies, we compared the new estimators with some estimators studied in the literature.

Although every estimator has its own merits and demerits from theoretical and empirical viewpoints, we can recommend \(\hat{{\beta }}^{BR}_\ell\) or \(\hat{{\beta }}^{TBR}_\ell\) based on our theoretical findings and numerical studies. In particular, \(\hat{{\beta }}^{BR}_1\) or \(\hat{{\beta }}^{TBR}_1\) should probably be employed in the small-sample case (\(n\) smaller than about 30). However, \(\hat{{\beta }}^{BR}_\ell\) and \(\hat{\beta }_\ell ^{TBR}\) are not consistent estimators. When \(n\) is sufficiently large, \(\hat{{\beta }}^{ML}\) and \(\hat{{\beta }}^{MM}\) can be recommended since they are consistent, although they do not have finite moments.

We conclude this paper with some other remarks:
  1.
    For the simple linear regression model (1.1), suppose that \({\sigma }_x^2\) is known. Then it may be assumed without loss of generality that \({\sigma }^2={\sigma }_x^2/r=1\), and model (1.1) can be reduced to
    $$\begin{aligned} \begin{aligned} Z_0&\sim \mathcal{N}({\alpha }+{\beta }{\theta },\tau ^2),\\ U_0&\sim \mathcal{N}({\theta },1), \end{aligned} \quad \begin{aligned} {{{\varvec{Z}}}}&\sim \mathcal{N}_p({\beta }{{\varvec{\xi }}},\tau ^2I_p),\\ {{{\varvec{U}}}}&\sim \mathcal{N}_p({{\varvec{\xi }}},I_p), \end{aligned} \end{aligned}$$
    (6.1)
    where \(Z_0\), \({{{\varvec{Z}}}}\), \(U_0\) and \({{{\varvec{U}}}}\) are mutually independent, and \({\alpha }\), \({\beta }\), \({\theta }\), \(\tau ^2\) and \({{\varvec{\xi }}}\) are unknown parameters. For such a known-\({\sigma }_x^2\) case, we can use the same arguments as in Sects. 3 and 4 to improve on the bias or the MSE of the ordinary LS estimator even if \(r=1\). For further details, see Tsukuma (2018).
     
  2.
    Consider here a simple structural model, where the latent variables \({\theta }\) and \({{\varvec{\xi }}}\) follow certain specified probability distributions. Then the reparametrized model (2.2) is replaced with the conditional model:
    $$\begin{aligned} \begin{aligned} Z_0|{\theta }&\sim \mathcal{N}({\alpha }+{\beta }{\theta },\tau ^2),\\ U_0|{\theta }&\sim \mathcal{N}({\theta },{\sigma }^2), \end{aligned} \quad \begin{aligned} {{{\varvec{Z}}}}|{{\varvec{\xi }}}&\sim \mathcal{N}_p({\beta }{{\varvec{\xi }}},\tau ^2I_p),\\ {{{\varvec{U}}}}|{{\varvec{\xi }}}&\sim \mathcal{N}_p({{\varvec{\xi }}},{\sigma }^2I_p), \end{aligned} \quad \begin{aligned}&\\ S&\sim {\sigma }^2\chi _m^2. \end{aligned} \end{aligned}$$
    They are conditionally independent given \({\theta }\) and \({{\varvec{\xi }}}\). Let \(\hat{{\beta }}_s^{LS}={{{\varvec{U}}}}^t{{{\varvec{Z}}}}/\Vert {{{\varvec{U}}}}\Vert ^2\), which is the ordinary LS estimator of \({\beta }\). Let \(\hat{{\beta }}_\phi =\{1+\phi (\Vert {{{\varvec{U}}}}\Vert ^2/S)\}\hat{{\beta }}_s^{LS}\), where \(\phi\) is a suitable function. Denote by \(E[\cdot |{\theta },{{\varvec{\xi }}}]\) a conditional expectation with respect to \((Z_0,{{{\varvec{Z}}}},U_0,{{{\varvec{U}}}},S)\) given \({\theta }\) and \({{\varvec{\xi }}}\) and by \(E^{{\theta },{{\varvec{\xi }}}}[\cdot ]\) an expectation with respect to \({\theta }\) and \({{\varvec{\xi }}}\). The bias and MSE of \(\hat{{\beta }}_\phi\) can be written, respectively, as
    $$\begin{aligned} \mathrm{Bias}(\hat{{\beta }}_\phi ;{\beta })&=E^{{\theta },{{\varvec{\xi }}}}[E[\hat{{\beta }}_\phi -{\beta }|{\theta },{{\varvec{\xi }}}]],\\ \mathrm{MSE}(\hat{{\beta }}_\phi ;{\beta })&=E^{{\theta },{{\varvec{\xi }}}}[E[(\hat{{\beta }}_\phi -{\beta })^2|{\theta },{{\varvec{\xi }}}]]. \end{aligned}$$
    Hence, even in the above structural model, it is possible to analytically improve the bias or the MSE of the ordinary LS estimator \(\hat{{\beta }}_s^{LS}\) by means of the reduction methods considered in Sects. 3 and 4.
     
  3.

    In this paper, the MSE reduction of the LS estimator is based on shrinking the LS estimator toward zero, while the bias reduction is achieved by expanding the LS estimator. However, a theoretically exact result on simultaneous reduction of both bias and MSE in a finite sample situation is still unknown.

     
  4.

    Estimation of the intercept \({\alpha }\) in the reparametrized model (2.2) is an interesting problem. Using the same arguments as in Sects. 3 and 4, we can easily reduce the bias and MSE of the LS estimator.

    Define a class of estimators for \({\alpha }\) as \(\hat{{\alpha }}_\phi =Z_0-\hat{{\beta }}_\phi U_0\), where \(\hat{{\beta }}_\phi\) is given in (2.10). Note that \(\hat{{\beta }}_\phi\) is independent of \(Z_0\) and \(U_0\). The bias of \(\hat{{\alpha }}_\phi\) is written as
    $$\begin{aligned} \mathrm{Bias}(\hat{{\alpha }}_\phi ;{\alpha })&=E[Z_0-\hat{{\beta }}_\phi U_0]-{\alpha }\\&={\alpha }+{\beta }{\theta }-E[\hat{{\beta }}_\phi ]{\theta }-{\alpha }\\&=-\mathrm{Bias}(\hat{{\beta }}_\phi ;{\beta }){\theta }. \end{aligned}$$
    Thus, as long as we consider the class \(\hat{{\alpha }}_\phi\) as an intercept estimator, the bias reduction in intercept estimation is directly linked to that in slope estimation. More precisely, if \(\hat{{\beta }}_\phi\) satisfies \(|\mathrm{Bias}(\hat{{\beta }}_\phi ;{\beta })|\le |\mathrm{Bias}(\hat{{\beta }}^{LS};{\beta })|\), then we have \(|\mathrm{Bias}(\hat{{\alpha }}_\phi ;{\alpha })|\le |\mathrm{Bias}(\hat{{\alpha }}^{LS};{\alpha })|\).
    Furthermore, it is observed that
    $$\begin{aligned} \mathrm{MSE}(\hat{{\alpha }}_\phi ;{\alpha })&=E[\{Z_0-{\alpha }-{\beta }{\theta }-(\hat{{\beta }}_\phi U_0-{\beta }{\theta })\}^2] \\&=\tau ^2+E[(\hat{{\beta }}_\phi U_0-{\beta }{\theta })^2] \\&=\tau ^2+E[\{\hat{{\beta }}_\phi (U_0-{\theta })+(\hat{{\beta }}_\phi -{\beta }){\theta }\}^2] \\&=\tau ^2+E[\hat{{\beta }}_\phi ^2]{\sigma }^2+\mathrm{MSE}(\hat{{\beta }}_\phi ;{\beta }){\theta }^2, \end{aligned}$$
    which implies that \(\hat{{\alpha }}_\phi\) has a smaller MSE than \(\hat{{\alpha }}^{LS}\) if \(E[\hat{{\beta }}_\phi ^2]\le E[(\hat{{\beta }}^{LS})^2]\) and \(\mathrm{MSE}(\hat{{\beta }}_\phi ;{\beta })\le \mathrm{MSE}(\hat{{\beta }}^{LS};{\beta })\). Hence, alternative intercept estimators to \(\hat{{\alpha }}^{LS}\) can be constructed from several MSE-reduced slope estimators obtained in Sect. 4. A numerical check of the above MSE decomposition is sketched after these remarks.
     
  5.

    If there is prior information that the slope \({\beta }\) in (2.2) lies near zero, this information should be exploited. Indeed, using it can yield a good estimator, such as an admissible one. See Tsukuma (2018), who discusses admissible estimation of the slope \({\beta }\) and the intercept \({\alpha }\) under the MSE criterion.
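Returning to Remark 4, the MSE decomposition for \(\hat{{\alpha }}_\phi\) is easy to verify numerically. The Python sketch below uses hypothetical parameter values, with the LS slope estimator standing in for \(\hat{{\beta }}_\phi\), and simply checks that the simulated MSE of \(\hat{{\alpha }}_\phi\) matches \(\tau ^2+E[\hat{{\beta }}_\phi ^2]{\sigma }^2+\mathrm{MSE}(\hat{{\beta }}_\phi ;{\beta }){\theta }^2\) up to simulation error.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical parameter values for a numerical check of the decomposition in Remark 4.
p = 9
alpha, beta, theta = 2.0, -5.0, 1.5
tau2, sigma2 = 10.0, 1.0
xi = np.full(p, 0.5)
n_rep = 200_000

Z0 = rng.normal(alpha + beta * theta, np.sqrt(tau2), n_rep)
U0 = rng.normal(theta, np.sqrt(sigma2), n_rep)
Z = beta * xi + np.sqrt(tau2) * rng.standard_normal((n_rep, p))
U = xi + np.sqrt(sigma2) * rng.standard_normal((n_rep, p))

beta_hat = np.einsum('ij,ij->i', U, Z) / np.einsum('ij,ij->i', U, U)   # LS slope
alpha_hat = Z0 - beta_hat * U0                                         # intercept estimator

lhs = np.mean((alpha_hat - alpha) ** 2)
rhs = tau2 + np.mean(beta_hat ** 2) * sigma2 + np.mean((beta_hat - beta) ** 2) * theta ** 2
print(lhs, rhs)   # the two values agree up to Monte Carlo error
```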

     


Acknowledgements

The author would like to thank the two reviewers for their careful review and for helpful comments and suggestions. This work was supported by Grant-in-Aid for Scientific Research (18K11201) from Japan Society for the Promotion of Science.

References

  1. Adcock, R. J. (1877). Note on the method of least squares. Analyst, 4, 183–184.
  2. Adcock, R. J. (1878). A problem in least squares. Analyst, 5, 53–54.
  3. Anderson, T. W. (1976). Estimation of linear functional relationships: Approximate distributions and connections with simultaneous equations in econometrics. Journal of the Royal Statistical Society Series B, 38, 1–36.
  4. Anderson, T. W. (1984). Estimating linear statistical relationships. The Annals of Statistics, 12, 1–45.
  5. Aoki, M. (1989). Optimization of stochastic systems (2nd ed.). New York: Academic Press.
  6. Berger, J. O., Berliner, L. M., & Zaman, A. (1982). General admissibility and inadmissibility results for estimation in a control problem. The Annals of Statistics, 10, 838–856.
  7. Bickel, P. J., & Ritov, Y. (1987). Efficient estimation in the errors in variables model. The Annals of Statistics, 15, 513–540.
  8. Brown, P. J. (1993). Measurement, regression, and calibration. Oxford: Oxford University Press.
  9. Cheng, C.-L., & Van Ness, J. W. (1999). Statistical regression with measurement error. New York: Oxford University Press.
  10. DeGracie, J. S., & Fuller, W. A. (1972). Estimation of the slope and analysis of covariance when the concomitant variable is measured with error. Journal of the American Statistical Association, 67, 930–937.
  11. Fuller, W. A. (1987). Measurement error models. New York: Wiley.
  12. Gleser, L. J. (1981). Estimation in a multivariate "errors in variables" regression model: Large sample results. The Annals of Statistics, 9, 24–44.
  13. Guo, M., & Ghosh, M. (2012). Mean squared error of James-Stein estimators for measurement error models. Statistics & Probability Letters, 82, 2033–2043.
  14. Huber, P. J. (1981). Robust statistics. New York: Wiley.
  15. Hudson, H. M. (1978). A natural identity for exponential families with applications in multiparameter estimation. The Annals of Statistics, 6, 473–484.
  16. James, W., & Stein, C. (1961). Estimation with quadratic loss. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability (Vol. 1, pp. 361–379). Berkeley: University of California Press.
  17. Kendall, M. G., & Stuart, A. (1979). The advanced theory of statistics (4th ed., Vol. 2). London: Griffin.
  18. Kubokawa, T. (1994). A unified approach to improving equivariant estimators. The Annals of Statistics, 22, 290–299.
  19. Kubokawa, T., & Robert, C. P. (1994). New perspectives on linear calibration. Journal of Multivariate Analysis, 51, 178–200.
  20. Nishii, R., & Krishnaiah, P. R. (1988). On the moments of classical estimates of explanatory variables under a multivariate calibration model. Sankhyā Series A, 50, 137–148.
  21. Osborne, C. (1991). Statistical calibration: A review. International Statistical Review, 59, 309–336.
  22. Reiersøl, O. (1950). Identifiability of a linear relation between variables which are subject to error. Econometrica, 18, 375–389.
  23. Stefanski, L. A. (1985). The effects of measurement error on parameter estimation. Biometrika, 72, 583–592.
  24. Sundberg, R. (1999). Multivariate calibration—Direct and indirect regression methodology (with discussion). Scandinavian Journal of Statistics, 26, 161–207.
  25. Tsukuma, H. (2018). Estimation in a simple linear regression model with measurement error. arXiv:1804.03029.
  26. Whittemore, A. S. (1989). Errors-in-variables regression using Stein estimates. The American Statistician, 43, 226–228.
  27. Zaman, A. (1981). A complete class theorem for the control problem and further results on admissibility and inadmissibility. The Annals of Statistics, 9, 812–821.
  28. Zellner, A. (1971). An introduction to Bayesian inference in econometrics. New York: Wiley.

Copyright information

© Japanese Federation of Statistical Science Associations 2018

Authors and Affiliations

  1. Faculty of Medicine, Toho University, Tokyo, Japan
