1 Introduction

In this article, we present a multivariate linear model which incorporates repeated measurement profiles (growth curves), covariate effects, random effects and effects due to latent variables. The model can be used to analyse quite complex data structures, and we are not aware of any mathematical treatment of such a model. The focus is on obtaining explicit estimators, since explicit estimators are easier to understand than estimators derived by some algorithm. Explicit estimators also make it easier to study the properties of the estimators and to perform model validation, which is part of the statistical paradigm.

A base model here is a bilinear regression model which is often referred to as the growth curve model and was introduced by Potthoff and Roy (1964). It will be assumed that there exists covariate information (background information) which is modeled via fixed linear effects. It is also assumed that, due to, for example, the sampling procedure, there are random effects which have an impact on the data, i.e., increase the variation. Further, we exploit the idea of adding latent process information to the model. When many background variables are measured, it is often the case that fewer latent processes govern these variables. For example, in field trials unobserved soil characteristics can be important, and it seems reasonable to think of the soil characteristics as latent variables. Another example is the measurement of EEG signals at many places on the scalp, where the response can be thought to be governed by a few latent variables. The latent variables in this article are taken into account by supposing rank restrictions on parameters; in our presentation, a rank restriction on the mean parameters is applied. Note that in the literature, latent variables sometimes motivate the use of random effects, which is a different implementation of the concept of a latent variable.

Before defining the model, some notation is introduced. Bold upper-case letters denote matrices: \({\mathcal {C}}(\varvec{A})\) is the column vector space generated by the columns of \(\varvec{A}\) and \({\mathcal {C}}(\varvec{A})^\perp\) denotes its orthogonal complement. The orthogonal projector onto \({\mathcal {C}}(\varvec{A})\) is denoted \(\varvec{P}_{A}\) and equals \(\varvec{P}_{A}=\varvec{A}(\varvec{A}'\varvec{A})^-\varvec{A}'\), where “-” denotes an arbitrary generalized inverse (g-inverse). Note that \(\varvec{I}-\varvec{P}_A\) is a projector onto \({\mathcal {C}}(\varvec{A})^\perp\). The rank of \(\varvec{A}\) is denoted \(r(\varvec{A})\). Moreover, we will often write \((\varvec{M})()'\) instead of \((\varvec{M})(\varvec{M})'\), where \(\varvec{M}\) represents any matrix expression. The matrix normal distribution with mean \(\varvec{\mu }\): \(p\times n\) and dispersion \(\varvec{\varPsi }\otimes \varvec{\varSigma }\) (the symbol \(\otimes\) stands for the Kronecker product) is denoted \(N_{p,n}(\varvec{\mu },\varvec{\varSigma },\varvec{\varPsi })\) for matrices of size \(p\times n\) and positive semi-definite matrices \(\varvec{\varSigma }\): \(p\times p\) and \(\varvec{\varPsi }\): \(n\times n\) (see Ohlson et al. 2013).
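Although the article is purely mathematical, the projector notation is easy to experiment with numerically. The following is a minimal Python/NumPy sketch (our own illustration; the matrix \(\varvec{A}\) is arbitrary, and the Moore-Penrose inverse is used as one particular g-inverse):

```python
import numpy as np

# Orthogonal projector P_A = A (A'A)^- A' onto the column space C(A).
# np.linalg.pinv returns the Moore-Penrose inverse, one valid g-inverse.
def projector(A):
    return A @ np.linalg.pinv(A.T @ A) @ A.T

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 2))          # arbitrary 5 x 2 matrix, full column rank
P = projector(A)

# P is idempotent and acts as the identity on C(A); I - P projects onto C(A)-perp.
assert np.allclose(P @ P, P)
assert np.allclose(P @ A, A)
assert np.allclose((np.eye(5) - P) @ A, 0.0)
```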

Now the model which will be considered is presented in detail.

Definition 1

Let

$$\begin{aligned} \varvec{X}=\varvec{A}\varvec{B}_1\varvec{C}_1+\varvec{B}_2\varvec{C}_2+\varvec{\varTheta }\varvec{F}+\varvec{U}\varvec{Z}+\varvec{E}, \end{aligned}$$

where \(\varvec{A}\): \(p\times q_1\), \(\varvec{C}_1\): \(k_1\times n\), \(\varvec{C}_2\): \(k_2\times n\) and \(\varvec{F}\): \(k_3\times n\) are all known matrices, \(\varvec{B}_1\): \(q_1\times k_1\), \(\varvec{B}_2\): \(p\times k_2\) and \(\varvec{\varTheta }\): \(p\times k_3\) are unknown parameter matrices, \({\mathcal {C}}(\varvec{F}')\subseteq {\mathcal {C}}(\varvec{C}_1')\), \(r(\varvec{\varTheta })=q_2<\min (p,k_3)\), \(\varvec{Z}\): \(k_4\times n\) is known, \(p\le k_4\), \(\varvec{Z}\varvec{Z}'=\varvec{I}_{k_4}\), \(\varvec{U}\sim N_{p,k_4}(\varvec{0},\varvec{\varSigma }_u,\varvec{I}_{k_4})\), \(\varvec{E}\sim N_{p,n}(\varvec{0},\varvec{\varSigma }_e,\varvec{I}_n)\), and \(\varvec{U}\) and \(\varvec{E}\) are independently distributed.

Note the important standardization constraint \(\varvec{Z}\varvec{Z}'=\varvec{I}\), which will be utilized later. The parameters to be estimated are \(\varvec{B}_1\), \(\varvec{B}_2\), \(\varvec{\varTheta }\), \(\varvec{\varSigma }_u\) and \(\varvec{\varSigma }_e\). Instead of the condition \({\mathcal {C}}(\varvec{F}')\subseteq {\mathcal {C}}(\varvec{C}_1')\) given in Definition 1, some other condition can be used, as follows from the derivation of the estimators in the next section. Since only estimation is considered in this article, the random effect \(\varvec{U}\varvec{Z}\) will not be predicted.
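To make the ingredients of Definition 1 concrete, the following Python/NumPy sketch simulates one data matrix \(\varvec{X}\) from the model. All dimensions, design matrices and parameter values are invented for illustration; note in particular how \(\varvec{Z}\) is scaled so that \(\varvec{Z}\varvec{Z}'=\varvec{I}_{k_4}\) and how the rank restriction on \(\varvec{\varTheta }\) is imposed through a factorization:

```python
import numpy as np

rng = np.random.default_rng(1)
p, q1, k1, k2, k3, k4, n = 4, 2, 3, 2, 3, 5, 30   # note p <= k4

# Known design matrices (illustrative choices only).
t = np.arange(1, p + 1)
A  = np.vstack([np.ones(p), t]).T                  # p x q1, linear growth
C1 = np.kron(np.eye(k1), np.ones(n // k1))         # k1 x n, three groups
C2 = rng.standard_normal((k2, n))                  # k2 x n, covariates
F  = rng.standard_normal((k3, k1)) @ C1            # k3 x n, so C(F') in C(C1')

# Z: k4 x n with Z Z' = I_{k4}; blocks of equal size scaled by 1/sqrt(block size).
Z = np.kron(np.eye(k4), np.ones(n // k4)) / np.sqrt(n // k4)
assert np.allclose(Z @ Z.T, np.eye(k4))

# Parameters; Theta of rank q2 < min(p, k3) via a factorization.
q2 = 1
B1    = rng.standard_normal((q1, k1))
B2    = rng.standard_normal((p, k2))
Theta = rng.standard_normal((p, q2)) @ rng.standard_normal((q2, k3))

Sigma_u = 0.5 * np.eye(p)
Sigma_e = np.eye(p)

# U ~ N_{p,k4}(0, Sigma_u, I), E ~ N_{p,n}(0, Sigma_e, I), independent.
U = np.linalg.cholesky(Sigma_u) @ rng.standard_normal((p, k4))
E = np.linalg.cholesky(Sigma_e) @ rng.standard_normal((p, n))

X = A @ B1 @ C1 + B2 @ C2 + Theta @ F + U @ Z + E  # p x n data matrix
```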

When in Definition 1 \(\varvec{B}_2\varvec{C}_2=\varvec{0}\), \(\varvec{\varTheta }\varvec{F}=\varvec{0}\) and \(\varvec{U}\varvec{Z}=\varvec{0}\), the classical growth curve model appears (see Potthoff and Roy 1964; von Rosen 2018). When \(\varvec{\varTheta }\varvec{F}=\varvec{0}\) and \(\varvec{U}\varvec{Z}=\varvec{0}\), then we have the growth curve model with background information, i.e., a mixture of GMANOVA and MANOVA models; some references to this model, as well as more general models, are Chinchilli and Elswick (1985), Verbyla and Venables (1988), von Rosen (1989) and Bai and Shi (2007). If \(\varvec{B}_2\varvec{C}_2=\varvec{0}\) and \(\varvec{\varTheta }\varvec{F}=\varvec{0}\), then we have the growth curve model with random effects (see Ip et al. 2007), whereas if only \(\varvec{\varTheta }\varvec{F}=\varvec{0}\) holds we refer to Yokoyama and Fujikoshi (1992) and Yokoyama (1995), where similar models are considered and where references to earlier works can be found. In these works, structures are put on the covariance matrix, which leads to somewhat different models than the one treated in this article.

2 Two examples

This section presents two examples where the proposed model in Definition 1 can be used.

Example 1

In the first example, a field experiment was conducted where plant growth was studied under different treatment conditions. A couple of weeks after sowing, plant heights were recorded weekly, and in total \(p\) weeks were studied. In the study, treatments were randomly assigned to plots, and each week plants were randomly chosen from the plots and their growth was measured. The experimental units were the plots, which were modeled with respect to plant growth over time.

It was assumed that plant growth is linear, and the basic model was:

$$\begin{aligned} \varvec{X}=\varvec{A}\varvec{B}_1\varvec{C}_1+\varvec{E}, \end{aligned}$$
(1)

where the matrix \(\varvec{A}\) models the linear plant growth over \(p\) weeks. For example, when \(p=4\),

$$\begin{aligned} \varvec{A}'=\left( \begin{array}{cccc} 1&1&1&1\\ t_1&t_2&t_3&t_4 \end{array}\right) . \end{aligned}$$

The matrix \(\varvec{C}_1\) in (1) is the design matrix connected to the treatments. Had the response been univariate, or had \(\varvec{A}=\varvec{I}\), then, when comparing treatments, \(\varvec{C}_1\) would be the same design matrix as in ANOVA or MANOVA, respectively.
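As a hypothetical illustration (not the design of the actual experiment), with three treatments and \(m\) plots per treatment, \(\varvec{A}\) and \(\varvec{C}_1\) could be built as follows:

```python
import numpy as np

p, m = 4, 5                                    # p weeks, m plots per treatment
t = np.array([1.0, 2.0, 3.0, 4.0])             # measurement times t_1, ..., t_4

# Within-individual design: intercept and slope of the linear growth curve.
A = np.vstack([np.ones(p), t]).T               # p x 2, rows (1, t_j)

# Between-individual design: as in one-way ANOVA, row i of C1 indicates
# the plots receiving treatment i.
treatments = 3
C1 = np.kron(np.eye(treatments), np.ones(m))   # 3 x 15 block-indicator matrix
```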

In particular, one wanted to take into account that fields are inhomogeneous with respect to soil characteristics, since plant growth usually depends on these characteristics. The plots were distributed over a relatively large area and, therefore, the soil characteristics varied among the plots. Thus, it was important to include the soil characteristics in (1) as covariates:

$$\begin{aligned} \varvec{X}=\varvec{A}\varvec{B}_1\varvec{C}_1+\varvec{B}_2\varvec{C}_2+\varvec{E}. \end{aligned}$$
(2)

Moreover, when measuring plant growth, it was not possible to measure all plants within a plot. Instead, each week a number of randomly selected plants were measured. Thus, it was natural to include a random effect in (2), leading to the model:

$$\begin{aligned} \varvec{X}=\varvec{A}\varvec{B}_1\varvec{C}_1+\varvec{B}_2\varvec{C}_2+\varvec{U}\varvec{Z}+\varvec{E}. \end{aligned}$$
(3)

Note that the condition \(\varvec{Z}\varvec{Z}'=\varvec{I}_{k_4}\) standardizes the random effects according to the number of plots used to study a specific treatment.
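To make the standardization concrete, suppose (hypothetically) that treatment \(i\) is studied on \(n_i\) plots; letting row \(i\) of \(\varvec{Z}\) indicate those plots, scaled by \(n_i^{-1/2}\), gives \(\varvec{Z}\varvec{Z}'=\varvec{I}\):

```python
import numpy as np

# Hypothetical plot-to-treatment assignment: treatments 0, 1, 2 with
# unequal numbers of plots, n_i = 4, 6, 5.
labels = np.repeat([0, 1, 2], [4, 6, 5])
k4, n = 3, labels.size

# Row i indicates the plots under treatment i, scaled by 1/sqrt(n_i)
# so that Z Z' = I_{k4}.
Z = np.zeros((k4, n))
for i in range(k4):
    idx = labels == i
    Z[i, idx] = 1.0 / np.sqrt(idx.sum())

assert np.allclose(Z @ Z.T, np.eye(k4))
```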

Weather variables, which are important for plant growth, were also considered. However, there were many weather-related variables which could have an influence on plant growth; for example, temperature, precipitation, wind speed, etc., were recorded every hour during the day/night cycle. All these variables are more or less interrelated, and it was difficult to measure their influence on plant growth directly. Therefore, the concept of a latent variable was of interest, and latent variables were implemented in our model via rank restrictions on a parameter matrix. Hence, we ended up with the model presented in Definition 1, i.e.,

$$\begin{aligned} \varvec{X}=\varvec{A}\varvec{B}_1\varvec{C}_1+\varvec{B}_2\varvec{C}_2+\varvec{U}\varvec{Z}+\varvec{\varTheta }\varvec{F}+\varvec{E}. \end{aligned}$$

When the rank of \(\varvec{\varTheta }\) equals \(r\), the interpretation is that there exist \(r\) latent processes affecting plant growth. The condition \({\mathcal {C}}(\varvec{F}')\subseteq {\mathcal {C}}(\varvec{C}_1')\) in Definition 1 means that the weather observations were taken at the same plots where plant growth was measured.

Example 2

In the second example, it will be shown that the model in Definition 1 can be used to analyze “small area” (small domain) problems. Small area estimation is an active research area in statistics with many applications; an introduction to the subject is given, for example, in the book by Rao and Molina (2015). The common thread in small area estimation problems is that a survey study (finite population case) has taken place and, based on the survey, the goal is to extract information about small domains by adding local information which was not accounted for in the comprehensive survey.

Sometimes survey samples are investigated several times. For example, suppose that there exists a national survey which collects information about some specific production from a certain type of company. The survey samples are followed up once per year, and this is ongoing for, say, 5 years. Moreover, suppose that there are 20 regions and each region consists of 10 subregions. The survey can then consist of four companies from each subregion, so the whole survey comprises 800 companies.

The survey produces, for each subregion and year, an estimate of the production of interest, together with the variance of its estimator. However, in some sense these estimates are biased, since they do not take local information (covariates) into account. Moreover, the sample sizes are usually too small to draw firm conclusions about the subareas. Therefore, the idea is to borrow strength across subareas via statistical models and background information. Let \(\varvec{x}\) be a vector where all survey estimates of a specific subarea variable are gathered (a specific production in our example). Moreover, if a linear model is assumed for these variables, we can write:

$$\begin{aligned} \varvec{x}=\varvec{\beta }'\varvec{C}_1+\varvec{\epsilon }, \end{aligned}$$

where \(\varvec{\beta }'\varvec{C}_1\) models the true value of the variable of concern and the error term \(\varvec{\epsilon }\) is normally distributed with zero mean and known dispersion \(\varvec{V}\), the dispersion being determined solely by the survey design. Since \(\varvec{V}\) is known, one can equivalently study \(\varvec{V}^{-1/2}\varvec{x}\) and, therefore, without loss of generality it can be assumed that \(\varvec{V}=\varvec{I}\). Since the observations in the survey are repeatedly measured, and if a second-order polynomial trend over the years, of the same form for all subregions, is supposed to exist, the following model can be set up:

$$\begin{aligned} \varvec{X}=\varvec{A}\varvec{B}_1\varvec{C}_1+\varvec{E}, \end{aligned}$$
(4)

which models what happens with the companies over time.
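The reduction from a known dispersion \(\varvec{V}\) to \(\varvec{V}=\varvec{I}\) mentioned above is an ordinary whitening step; a sketch with an arbitrary positive definite \(\varvec{V}\) chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

# Known survey dispersion V (illustrative positive definite matrix).
M = rng.standard_normal((5, 5))
V = M @ M.T + 5 * np.eye(5)

# Symmetric inverse square root V^{-1/2} via the eigendecomposition of V.
w, Q = np.linalg.eigh(V)
V_inv_sqrt = Q @ np.diag(w ** -0.5) @ Q.T

x = rng.multivariate_normal(np.zeros(5), V)   # survey estimates with D[x] = V
x_tilde = V_inv_sqrt @ x                      # whitened: D[x_tilde] = I

assert np.allclose(V_inv_sqrt @ V @ V_inv_sqrt, np.eye(5))
```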

Under the assumption of 20 regions, 10 subregions within each region, four companies within each subregion, repeated data collection for each company for 5 years and \(\varvec{V}=\varvec{I}\), we have \(\varvec{E}\sim N_{5,800}(\varvec{0},\varvec{\varSigma },\varvec{I})\). In this case, for example,

$$\begin{aligned} \varvec{A}'=\left( \begin{array}{ccccc} 1&1&1&1&1\\ t_1&t_2&t_3&t_4&t_5\\ t_1^2&t_2^2&t_3^2&t_4^2&t_5^2\end{array}\right) . \end{aligned}$$

However, we cannot claim that the survey estimators are always exactly described by \(\varvec{A}\) and \(\varvec{B}_1\). Therefore, a random error \(\varvec{U}\sim N_{p,l}(\varvec{0},\varvec{\varSigma }_u,\varvec{I}_l)\) is introduced in (4), and the following model emerges:

$$\begin{aligned} \varvec{X}=\varvec{A}\varvec{B}_1\varvec{C}_1+\varvec{U}\varvec{Z}+\varvec{E}, \end{aligned}$$
(5)

where \(\varvec{Z}\) is related to \(\varvec{C}_1\), i.e., \({\mathcal {C}}(\varvec{Z}')\subseteq {\mathcal {C}}(\varvec{C}_1')\), and where additionally the effects are standardized so that \(\varvec{Z}\varvec{Z}'=\varvec{I}\).

There usually exist two types of covariates. One type is accounted for in the survey study; the other type consists of variables obtained, for example, from registers about the companies in the survey, or of baseline data which were available when the survey started. The effect of the second type of covariates can be modeled by \(\varvec{B}_2\varvec{C}_2\), which included in (5) yields the model:

$$\begin{aligned} \varvec{X}=\varvec{A}\varvec{B}_1\varvec{C}_1+\varvec{B}_2\varvec{C}_2+\varvec{U}\varvec{Z}+\varvec{E}. \end{aligned}$$

Moreover, suppose that a large cluster of socio-economic variables exists. It is difficult to express a functional relationship between the survey estimators and the socio-economic variables. Instead, the idea of latent variables is employed, leading to rank restrictions on an unknown parameter which is supposed to model the effect of these variables. Thus, we arrive again at the model presented in Definition 1:

$$\begin{aligned} \varvec{X}=\varvec{A}\varvec{B}_1\varvec{C}_1+\varvec{B}_2\varvec{C}_2+\varvec{\varTheta }\varvec{F}+\varvec{U}\varvec{Z}+\varvec{E}. \end{aligned}$$

3 Estimation

Let \(\varvec{Q}_1\) and \(\varvec{Q}_2\) be matrices of basis vectors such that

$$\begin{aligned} {\mathcal {C}}(\varvec{Q}_1)={\mathcal {C}}(\varvec{C}_1':\varvec{C}_2':\varvec{Z}'),\qquad {\mathcal {C}}(\varvec{Q}_2)={\mathcal {C}}(\varvec{C}_1':\varvec{C}_2':\varvec{Z}')^\perp \end{aligned}$$

and assume \(\varvec{Q}_i'\varvec{Q}_i=\varvec{I}\), \(i=1,2\), \(\varvec{Q}_1'\varvec{Q}_2=\varvec{0}\), \(v=r(\varvec{C}_1':\varvec{C}_2':\varvec{Z}')\), \(v>k_4\).
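Numerically, \(\varvec{Q}_1\) and \(\varvec{Q}_2\) can be obtained from a full singular value decomposition of \((\varvec{C}_1':\varvec{C}_2':\varvec{Z}')\); a sketch with invented dimensions (any method producing orthonormal bases of the column space and its orthogonal complement would do):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k1, k2, k4 = 30, 3, 2, 5

C1 = rng.standard_normal((k1, n))
C2 = rng.standard_normal((k2, n))
Z, _ = np.linalg.qr(rng.standard_normal((n, k4)))
Z = Z.T                                        # k4 x n with Z Z' = I

# Stack the transposed design matrices: n x (k1 + k2 + k4).
S = np.hstack([C1.T, C2.T, Z.T])
U_full, s, _ = np.linalg.svd(S)                # full SVD gives an o.n. basis of R^n
v = np.sum(s > 1e-10)                          # v = r(C1' : C2' : Z')

Q1 = U_full[:, :v]                             # o.n. basis of C(C1' : C2' : Z')
Q2 = U_full[:, v:]                             # o.n. basis of the orthogonal complement

assert np.allclose(Q1.T @ Q1, np.eye(v))
assert np.allclose(Q1.T @ Q2, 0.0)
```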

A one-to-one transformation of the model in Definition 1, using \(\varvec{Q}_1\) and \(\varvec{Q}_2\), yields:

$$\begin{aligned} \varvec{X}\varvec{Q}_1&=\varvec{A}\varvec{B}_1\varvec{C}_1\varvec{Q}_1+\varvec{B}_2\varvec{C}_2\varvec{Q}_1+\varvec{\varTheta }\varvec{F}\varvec{Q}_1+\varvec{U}\varvec{Z}\varvec{Q}_1+ \varvec{E}\varvec{Q}_1,\end{aligned}$$
(6)
$$\begin{aligned} \varvec{X}\varvec{Q}_2&=\varvec{E}\varvec{Q}_2. \end{aligned}$$
(7)

Lemma 1

Let \(\varvec{V}=\varvec{Q}_1'\varvec{Z}'\varvec{Z}\varvec{Q}_1\): \(v \times v\). The matrix \(\varvec{V}\) is idempotent.

Proof

Since \({\mathcal {C}}(\varvec{Z}')\subseteq {\mathcal {C}}(\varvec{Q}_1)\), we have \(\varvec{Q}_1\varvec{Q}_1'\varvec{Z}'=\varvec{Z}'\), and together with \(\varvec{Z}\varvec{Z}'=\varvec{I}\) this yields \(\varvec{V}\varvec{V}=\varvec{Q}_1'\varvec{Z}'\varvec{Z}\varvec{Q}_1\varvec{Q}_1'\varvec{Z}'\varvec{Z}\varvec{Q}_1 =\varvec{Q}_1'\varvec{Z}'\varvec{Z}\varvec{Z}'\varvec{Z}\varvec{Q}_1 =\varvec{Q}_1'\varvec{Z}'\varvec{Z}\varvec{Q}_1=\varvec{V}\). \(\square\)

From Lemma 1, it follows that:

$$\begin{aligned} \varvec{V}=\varvec{\varGamma }\left( \begin{array}{cc}\varvec{I}_{k_4}&\varvec{0}\\ \varvec{0}&\varvec{0}\end{array}\right) \varvec{\varGamma }', \end{aligned}$$

where \(\varvec{\varGamma }\): \(v\times v\) is an orthogonal matrix. Post-multiplying the identity in (6) by \(\varvec{\varGamma }\) leads to the model:

$$\begin{aligned} \varvec{X}\varvec{Q}_1\varvec{\varGamma }=\varvec{A}\varvec{B}_1\varvec{C}_1\varvec{Q}_1\varvec{\varGamma }+\varvec{B}_2\varvec{C}_2 \varvec{Q}_1\varvec{\varGamma }+\varvec{\varTheta }\varvec{F}\varvec{Q}_1\varvec{\varGamma }+\varvec{U}\varvec{Z}\varvec{Q}_1\varvec{\varGamma }+ \varvec{E}\varvec{Q}_1\varvec{\varGamma }. \end{aligned}$$
(8)

However, since the dispersion

$$\begin{aligned} D[\varvec{U}\varvec{Z}\varvec{Q}_1\varvec{\varGamma }]=\left( \begin{array}{cc}\varvec{I}_{k_4}&\varvec{0}\\ \varvec{0}&\varvec{0}\end{array}\right) \otimes \varvec{\varSigma }_u, \end{aligned}$$

the model in (8) is split into two models. Let \(\varvec{\varGamma }=(\varvec{\varGamma }_1:\varvec{\varGamma }_2)\), where \(\varvec{\varGamma }_1\): \(v\times k_4\) and \(\varvec{\varGamma }_2\): \(v\times (v-k_4)\). Then, we have three models which will be used when finding estimators:

$$\begin{aligned}&\varvec{X}\varvec{Q}_1\varvec{\varGamma }_1=\varvec{A}\varvec{B}_1\varvec{C}_1\varvec{Q}_1\varvec{\varGamma }_1+\varvec{B}_2\varvec{C}_2\varvec{Q}_1 \varvec{\varGamma }_1+\varvec{\varTheta }\varvec{F}\varvec{Q}_1\varvec{\varGamma }_1+\varvec{U}\varvec{Z}\varvec{Q}_1\varvec{\varGamma }_1+ \varvec{E}\varvec{Q}_1\varvec{\varGamma }_1,\nonumber \\&\quad \varvec{U}\varvec{Z}\varvec{Q}_1\varvec{\varGamma }_1\sim N_{p,k_4}(\varvec{0},\varvec{\varSigma }_u,\varvec{I}_{k_4}),\quad \varvec{E}\varvec{Q}_1\varvec{\varGamma }_1\sim N_{p,k_4}(\varvec{0},\varvec{\varSigma }_e,\varvec{I}_{k_4}), \end{aligned}$$
(9)
$$\begin{aligned}&\varvec{X}\varvec{Q}_1\varvec{\varGamma }_2=\varvec{A}\varvec{B}_1\varvec{C}_1\varvec{Q}_1\varvec{\varGamma }_2+\varvec{B}_2\varvec{C}_2\varvec{Q}_1\varvec{\varGamma }_2+\varvec{\varTheta }\varvec{F}\varvec{Q}_1\varvec{\varGamma }_2+ \varvec{E}\varvec{Q}_1\varvec{\varGamma }_2,\nonumber \\&\quad \varvec{E}\varvec{Q}_1\varvec{\varGamma }_2\sim N_{p,v-k_4}(\varvec{0},\varvec{\varSigma }_e,\varvec{I}_{v-k_4}), \end{aligned}$$
(10)
$$\begin{aligned}&\varvec{X}\varvec{Q}_2=\varvec{E}\varvec{Q}_2, \qquad \varvec{E}\varvec{Q}_2\sim N_{p,n-v}(\varvec{0},\varvec{\varSigma }_e,\varvec{I}_{n-v}). \end{aligned}$$
(11)
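Numerically, the split of \(\varvec{\varGamma }\) into \((\varvec{\varGamma }_1:\varvec{\varGamma }_2)\) follows from an eigendecomposition of the idempotent matrix \(\varvec{V}\) of Lemma 1; a sketch, reusing Q1 and Z from the previous sketch:

```python
import numpy as np

def split_gamma(Q1, Z):
    """Eigendecompose V = Q1' Z' Z Q1 (idempotent by Lemma 1) and return
    Gamma1 (eigenvalue 1, dimension k4) and Gamma2 (eigenvalue 0)."""
    V = Q1.T @ Z.T @ Z @ Q1
    w, Gamma = np.linalg.eigh(V)               # eigenvalues of an idempotent V are 0 or 1
    order = np.argsort(w)[::-1]                # put the unit eigenvalues first
    w, Gamma = w[order], Gamma[:, order]
    k4 = int(round(w.sum()))                   # rank = trace for idempotent matrices
    return Gamma[:, :k4], Gamma[:, k4:]        # Gamma1: v x k4, Gamma2: v x (v - k4)
```

With \(\varvec{\varGamma }_1\) and \(\varvec{\varGamma }_2\) at hand, the three working data sets are \(\varvec{X}\varvec{Q}_1\varvec{\varGamma }_1\), \(\varvec{X}\varvec{Q}_1\varvec{\varGamma }_2\) and \(\varvec{X}\varvec{Q}_2\), corresponding to (9)-(11).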

The idea is to utilize (10) and (11) to estimate \(\varvec{B}_1\), \(\varvec{B}_2\), \(\varvec{\varTheta }\) and \(\varvec{\varSigma }_e\). Thereafter, these estimators are inserted in (9) which yields simple estimation equations for obtaining \(\varvec{\varSigma }_u\). Suppose that estimators \({{\widehat{\varvec{B}}}}_1\), \({\widehat{\varvec{B}}}_2\), \({\widehat{\varvec{\varTheta }}}\) and \({\widehat{\varvec{\varSigma }}}_e\) have been obtained, and let

$$\begin{aligned} \varvec{Y}_0&=\varvec{X}\varvec{Q}_1\varvec{\varGamma }_1-\varvec{A}{\widehat{\varvec{B}}}_1\varvec{C}_1\varvec{Q}_1 \varvec{\varGamma }_1-{\widehat{\varvec{B}}}_2\varvec{C}_2\varvec{Q}_1\varvec{\varGamma }_1-{\widehat{\varvec{\varTheta }}}\varvec{F}\varvec{Q}_1\varvec{\varGamma }_1,\\ \varvec{\varPsi }&=\varvec{\varSigma }_u+\varvec{\varSigma }_e, \end{aligned}$$

i.e., under the assumption of no randomness in \({\widehat{\varvec{B}}}_1\), \({\widehat{\varvec{B}}}_2\) and \({\widehat{\varvec{\varTheta }}}\) we have a model:

$$\begin{aligned} \varvec{Y}_0={{\widetilde{\varvec{E}}}},\qquad {{\widetilde{\varvec{E}}}}\sim N_{p,k_4}(\varvec{0},\varvec{\varPsi },\varvec{I}_{k_4}). \end{aligned}$$

Based on this model, the maximum likelihood estimator of \(\varvec{\varPsi }\), under the assumption that \(p\le k_4\), equals \({\widehat{\varvec{\varPsi }}}=k_4^{-1}\varvec{Y}_0\varvec{Y}_0'\), which means that a natural estimator of \(\varvec{\varSigma }_u\) is given by

$$\begin{aligned} {\widehat{\varvec{\varSigma }}}_u=\frac{1}{k_4}\varvec{Y}_0\varvec{Y}_0'-{\widehat{\varvec{\varSigma }}}_e. \end{aligned}$$

This estimator can only be used if \({\widehat{\varvec{\varSigma }}}_u\) is positive definite. However, if the estimator is not positive definite, it can be modified via its eigenvalues.
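One possible modification (stated here only as an illustration, not as the article's prescription) truncates the negative eigenvalues at zero:

```python
import numpy as np

def make_psd(Sigma_hat):
    """Replace negative eigenvalues by zero, keeping the eigenvectors,
    so that the modified estimate is positive semi-definite."""
    w, Q = np.linalg.eigh((Sigma_hat + Sigma_hat.T) / 2)  # symmetrize first
    return Q @ np.diag(np.clip(w, 0.0, None)) @ Q.T
```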

Now we return to (10) and (11) to derive \({\widehat{\varvec{B}}}_1\), \({\widehat{\varvec{B}}}_2\), \({\widehat{\varvec{\varTheta }}}\) and \({\widehat{\varvec{\varSigma }}}_e\), and it is convenient to merge the two models, i.e.,

$$\begin{aligned} \varvec{X}(\varvec{Q}_1\varvec{\varGamma }_2:\varvec{Q}_2)&=\varvec{A}\varvec{B}_1(\varvec{C}_1\varvec{Q}_1\varvec{\varGamma }_2:\varvec{0})+\varvec{B}_2(\varvec{C}_2\varvec{Q}_1\varvec{\varGamma }_2:\varvec{0})\nonumber \\&\quad +\varvec{\varTheta }(\varvec{F}\varvec{Q}_1\varvec{\varGamma }_2:\varvec{0})+{{\widetilde{\varvec{E}}}},\quad {{\widetilde{\varvec{E}}}}\sim N_{p,n-k_4}(\varvec{0},\varvec{\varSigma }_e,\varvec{I}_{n-k_4}). \end{aligned}$$
(12)

Let

$$\begin{aligned} \varvec{Y}&=\varvec{X}(\varvec{Q}_1\varvec{\varGamma }_2:\varvec{Q}_2): p\times (n-k_4),\qquad \varvec{D}_1=(\varvec{C}_1\varvec{Q}_1\varvec{\varGamma }_2:\varvec{0}): k_1\times (n-k_4), \end{aligned}$$
(13)
$$\begin{aligned} \varvec{D}_2&=(\varvec{C}_2\varvec{Q}_1\varvec{\varGamma }_2:\varvec{0}): k_2\times (n-k_4),\qquad \varvec{D}_3=(\varvec{F}\varvec{Q}_1\varvec{\varGamma }_2:\varvec{0}): k_3\times (n-k_4) \end{aligned}$$
(14)

and then (12) is identical to

$$\begin{aligned} \varvec{Y}=\varvec{A}\varvec{B}_1\varvec{D}_1+\varvec{B}_2\varvec{D}_2+\varvec{\varTheta }\varvec{D}_3+{{\widetilde{\varvec{E}}}}. \end{aligned}$$
(15)

Furthermore, the likelihood function corresponding to the model in (15) equals

$$\begin{aligned} L(\varvec{B}_1,\varvec{B}_2,\varvec{\varTheta },\varvec{\varSigma }_e)&=(2\pi )^{-\tfrac{1}{2}p(n-k_4)}|\varvec{\varSigma }_e|^{-\tfrac{1}{2}(n-k_4)}\nonumber \\&\quad \times e^{-\tfrac{1}{2}\mathrm{tr}\{\varvec{\varSigma }_e^{-1}(\varvec{Y}-\varvec{A}\varvec{B}_1\varvec{D}_1-\varvec{B}_2 \varvec{D}_2-\varvec{\varTheta }\varvec{D}_3)()'\}}\nonumber \\&\le (2\pi )^{-\tfrac{1}{2}p(n-k_4)}|\varvec{\varSigma }_e|^{-\tfrac{1}{2}(n-k_4)}\nonumber \\&\quad \times e^{-\tfrac{1}{2}\mathrm{tr}\{\varvec{\varSigma }_e^{-1}(\varvec{Y}(\varvec{I}-\varvec{P}_{D_2'}) -\varvec{A}\varvec{B}_1\varvec{D}_1(\varvec{I}-\varvec{P}_{D_2'})-\varvec{\varTheta }\varvec{D}_3(\varvec{I}-\varvec{P}_{D_2'}))()'\}} \end{aligned}$$
(16)

with equality if and only if

$$\begin{aligned} \varvec{B}_2\varvec{D}_2=\varvec{Y}\varvec{P}_{D_2'}-\varvec{A}\varvec{B}_1\varvec{D}_1\varvec{P}_{D_2'}-\varvec{\varTheta }\varvec{D}_3\varvec{P}_{D_2'}, \end{aligned}$$

which, under some full rank conditions on \(\varvec{D}_2\), determines \(\varvec{B}_2\) as a function of \(\varvec{B}_1\) and \(\varvec{\varTheta }\). To show the inequality, we have used

$$\begin{aligned}&(\varvec{Y}-\varvec{A}\varvec{B}_1\varvec{D}_1-\varvec{B}_2\varvec{D}_2-\varvec{\varTheta }\varvec{D}_3)()'\\&\quad =(\varvec{Y}-\varvec{A}\varvec{B}_1\varvec{D}_1-\varvec{B}_2\varvec{D}_2-\varvec{\varTheta }\varvec{D}_3) \varvec{P}_{D_2'} (\varvec{Y}-\varvec{A}\varvec{B}_1\varvec{D}_1-\varvec{B}_2\varvec{D}_2-\varvec{\varTheta }\varvec{D}_3)'\\&\qquad +(\varvec{Y}-\varvec{A}\varvec{B}_1\varvec{D}_1-\varvec{B}_2\varvec{D}_2-\varvec{\varTheta }\varvec{D}_3) (\varvec{I}-\varvec{P}_{D_2'}) (\varvec{Y}-\varvec{A}\varvec{B}_1\varvec{D}_1-\varvec{B}_2\varvec{D}_2-\varvec{\varTheta }\varvec{D}_3)'\\&\quad =(\varvec{Y}-\varvec{A}\varvec{B}_1\varvec{D}_1-\varvec{B}_2\varvec{D}_2-\varvec{\varTheta }\varvec{D}_3)\varvec{P}_{D_2'} (\varvec{Y}-\varvec{A}\varvec{B}_1\varvec{D}_1-\varvec{B}_2\varvec{D}_2-\varvec{\varTheta }\varvec{D}_3)'\\&\qquad +(\varvec{Y}-\varvec{A}\varvec{B}_1\varvec{D}_1-\varvec{\varTheta }\varvec{D}_3)(\varvec{I}-\varvec{P}_{D_2'}) (\varvec{Y}-\varvec{A}\varvec{B}_1\varvec{D}_1-\varvec{\varTheta }\varvec{D}_3)', \end{aligned}$$

where both terms are positive semi-definite, and where in the second equality we used \(\varvec{D}_2(\varvec{I}-\varvec{P}_{D_2'})=\varvec{0}\). The upper bound in (16) corresponds to the model:

$$\begin{aligned} \varvec{Y}(\varvec{I}-\varvec{P}_{D_2'})&=\varvec{A}\varvec{B}_1\varvec{D}_1(\varvec{I}-\varvec{P}_{D_2'})+\varvec{\varTheta }\varvec{D}_3(\varvec{I}-\varvec{P}_{D_2'})+\varvec{E},\nonumber \\&\qquad \varvec{E}\sim N_{p,n-k_4}(\varvec{0},\varvec{\varSigma }_e,\varvec{I}_{n-k_4}). \end{aligned}$$
(17)

Thus, we have a model which was treated by von Rosen and von Rosen (2017), and no further calculations are necessary. For notational convenience, we write the model in (17) as:

$$\begin{aligned} \varvec{Y}_1=\varvec{A}\varvec{B}_1{{\widetilde{\varvec{D}}}}_1+\varvec{\varTheta }{{\widetilde{\varvec{D}}}}_3+\varvec{E},\qquad \varvec{E}\sim N_{p,n-k_4}(\varvec{0},\varvec{\varSigma }_e,\varvec{I}_{n-k_4}), \end{aligned}$$
(18)

where \({{\widetilde{\varvec{D}}}}_1=\varvec{D}_1(\varvec{I}-\varvec{P}_{D_2'})\) and \({{\widetilde{\varvec{D}}}}_3=\varvec{D}_3(\varvec{I}-\varvec{P}_{D_2'})\). Note that \({\mathcal {C}}({{\widetilde{\varvec{D}}}}_3')\subseteq {\mathcal {C}}({{\widetilde{\varvec{D}}}}_1')\) because \({\mathcal {C}}(\varvec{F}')\subseteq {\mathcal {C}}(\varvec{C}_1')\), which is essential for obtaining explicit estimators (see von Rosen 1989).
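The construction of \(\widetilde{\varvec{D}}_1\), \(\widetilde{\varvec{D}}_3\) and a check of the nestedness condition are immediate to code; a sketch, assuming D1, D2, D3 have been built as in (13) and (14):

```python
import numpy as np

def projector(M):
    # Orthogonal projector onto C(M), with the Moore-Penrose g-inverse.
    return M @ np.linalg.pinv(M.T @ M) @ M.T

def tilde(D, D2):
    # D_tilde = D (I - P_{D2'}): rows of D residualized on C(D2').
    n = D.shape[1]
    return D @ (np.eye(n) - projector(D2.T))

def nested(Dt3, Dt1, tol=1e-8):
    # C(Dt3') is contained in C(Dt1')  iff  P_{Dt1'} Dt3' = Dt3'.
    return np.allclose(projector(Dt1.T) @ Dt3.T, Dt3.T, atol=tol)
```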

The following theorem presents the estimators of the parameters in the model given by Definition 1.

Theorem 1

Let the model be as in Definition 1 and put \(v=\dim {\mathcal {C}}(\varvec{C}_1':\varvec{C}_2':\varvec{Z}')\). Define \(\varvec{\varTheta }_1\) and \(\varvec{\varTheta }_2\) via \(\varvec{\varTheta }=\varvec{\varTheta }_1\varvec{\varTheta }_2\), where \(\varvec{\varTheta }_1\): \(p\times r(\varvec{\varTheta })\) and \(\varvec{\varTheta }_2\): \(r(\varvec{\varTheta })\times k_3\). Moreover, the matrices \(\varvec{D}_1\), \(\varvec{D}_2\), and \(\varvec{D}_3\) are given in (13) and (14), and \(\varvec{Y}_1\), \({{\widetilde{\varvec{D}}}}_1\) and \({{\widetilde{\varvec{D}}}}_3\) are identified by comparing (17) and (18). Then,

  (i)

    \({\widehat{\varvec{\varTheta }}}_1=\varvec{H}(\varvec{H}'\varvec{H})^{-1}{{\widetilde{\varvec{F}}}}_1\), where \({{\widetilde{\varvec{F}}}}_1\) consists of the eigenvectors corresponding to the eigenvalues of the positive definite matrix

    $$\begin{aligned} \varvec{I}_{p-r(A)}-\varvec{H}'\varvec{Y}_1\varvec{P}_{{\widetilde{D}}_3'}\varvec{R}^{-1}\varvec{P}_{\widetilde{D}_3'}\varvec{Y}_1'\varvec{H}, \end{aligned}$$

    where

    $$\begin{aligned} \varvec{R}=\varvec{I}_{n-k_4}+\varvec{P}_{{\widetilde{D}}_3'}\varvec{Y}_1'\varvec{H}\varvec{H}'\varvec{Y}_1\varvec{P}_{\widetilde{D}_3'} \end{aligned}$$

    and \(\varvec{H}\) is defined by

    $$\begin{aligned} \varvec{T}_1'\varvec{S}_2^{-1}\varvec{T}_1=\varvec{H}\varvec{H}', \end{aligned}$$

    where

    $$\begin{aligned} \varvec{S}_1&=\varvec{Y}_1(\varvec{I}-\varvec{P}_{{\widetilde{D}}_1'})\varvec{Y}_1',\\ \varvec{T}_1&=\varvec{I}-\varvec{A}(\varvec{A}'\varvec{S}_1^{-1}\varvec{A})^-\varvec{A}'\varvec{S}_1^{-1},\\ \varvec{S}_2&=\varvec{S}_1+\varvec{T}_1\varvec{Y}_1(\varvec{P}_{{\widetilde{D}}_1'}-\varvec{P}_{\widetilde{D}_3'})\varvec{Y}_1'\varvec{T}_1'; \end{aligned}$$
  (ii)

    If \({\mathcal {C}}(\varvec{A})\cap {\mathcal {C}}(\varvec{\varTheta })=\{\varvec{0}\}\),

    $$\begin{aligned} {\widehat{\varvec{\varTheta }}}\widetilde{\varvec{D}}_3={\widehat{\varvec{\varTheta }}}_1{\widehat{\varvec{\varTheta }}}_1'\varvec{T}_1'\varvec{S}_2^{-1}\varvec{Y}_1\varvec{P}_{\widetilde{D}_3'}; \end{aligned}$$

    If additionally \(r({{\widetilde{\varvec{D}}}}_3')=k_3\),

    $$\begin{aligned} {\widehat{\varvec{\varTheta }}}={\widehat{\varvec{\varTheta }}}_1{\widehat{\varvec{\varTheta }}}_1'\varvec{T}_1'\varvec{S}_2^{-1}\varvec{Y}_1\widetilde{\varvec{D}}_3'({{\widetilde{\varvec{D}}}}_3{{\widetilde{\varvec{D}}}}_3')^{-1}; \end{aligned}$$
  (iii)

    If \({\mathcal {C}}(\varvec{A})\cap {\mathcal {C}}(\varvec{\varTheta })=\{\varvec{0}\}\),

    $$\begin{aligned} \varvec{A}{\widehat{\varvec{B}}}_1{{\widetilde{\varvec{D}}}}_1=\varvec{A}(\varvec{A}'\varvec{S}_1^{-1}\varvec{A})^-\varvec{A}' \varvec{S}_1^{-1}(\varvec{Y}_1\varvec{P}_{{{\widetilde{\varvec{D}}}}_1'}- {\widehat{\varvec{\varTheta }}}{{\widetilde{\varvec{D}}}}_3); \end{aligned}$$

    If additionally \(r({{\widetilde{\varvec{D}}}}_1)=k_1\) and \(r(\varvec{A})=q_1\),

    $$\begin{aligned} {\widehat{\varvec{B}}}_1=(\varvec{A}'\varvec{S}_1^{-1}\varvec{A})^{-1}\varvec{A}'\varvec{S}_1^{-1} (\varvec{Y}_1{{\widetilde{\varvec{D}}}}_1'({{\widetilde{\varvec{D}}}}_1 {{\widetilde{\varvec{D}}}}_1')^{-1}- {\widehat{\varvec{\varTheta }}}{{\widetilde{\varvec{D}}}}_3{{\widetilde{\varvec{D}}}}_1'({{\widetilde{\varvec{D}}}}_1 {{\widetilde{\varvec{D}}}}_1')^{-1}); \end{aligned}$$
  (iv)
    $$\begin{aligned} {\widehat{\varvec{B}}}_2\varvec{D}_2=\varvec{Y}\varvec{P}_{D_2'}-\varvec{A}{\widehat{\varvec{B}}}_1\varvec{D}_1 \varvec{P}_{D_2'}-{\widehat{\varvec{\varTheta }}}\varvec{D}_3\varvec{P}_{D_2'}; \end{aligned}$$

    If additionally \(r(\varvec{D}_2)=k_2\),

    $$\begin{aligned} {\widehat{\varvec{B}}}_2=\varvec{Y}\varvec{D}_2'(\varvec{D}_2\varvec{D}_2')^{-1}-\varvec{A}{\widehat{\varvec{B}}}_1\varvec{D}_1 \varvec{D}_2'(\varvec{D}_2\varvec{D}_2')^{-1}- {\widehat{\varvec{\varTheta }}}\varvec{D}_3\varvec{D}_2'(\varvec{D}_2\varvec{D}_2')^{-1}; \end{aligned}$$
  (v)
    $$\begin{aligned} (n-k_4){\widehat{\varvec{\varSigma }}}_e=\varvec{S}_2+{\widehat{\varvec{T}}}_2\varvec{T}_1\varvec{Y}_1\varvec{P}_{\widetilde{D}_3'}\varvec{Y}_1'\varvec{T}_1'{\widehat{\varvec{T}}}_2', \end{aligned}$$

    where

    $$\begin{aligned} {\widehat{\varvec{T}}}_2=\varvec{I}_p-\varvec{T}_1{\widehat{\varvec{\varTheta }}}_1({\widehat{\varvec{\varTheta }}}_1'\varvec{T}_1'\varvec{S}_2^{-1}\varvec{T}_1{\widehat{\varvec{\varTheta }}}_1)^- {\widehat{\varvec{\varTheta }}}_1'\varvec{T}_1'\varvec{S}_2^{-1}; \end{aligned}$$
  (vi)
    $$\begin{aligned} {\widehat{\varvec{\varSigma }}}_u=\frac{1}{k_4}\varvec{Y}_0\varvec{Y}_0'-{\widehat{\varvec{\varSigma }}}_e. \end{aligned}$$

Proof

The proof follows from von Rosen and von Rosen (2017, Theorem 2.1) and the calculations presented in this section; in particular, the model in (18) is utilized. \(\square\)

A few remarks can be made. In (i), \(\varvec{P}_{\widetilde{D}_1'}-\varvec{P}_{{\widetilde{D}}_3'}\) is always positive semi-definite because \({\mathcal {C}}({{\widetilde{\varvec{D}}}}_3')\subseteq {\mathcal {C}}({{\widetilde{\varvec{D}}}}_1')\). The condition \({\mathcal {C}}(\varvec{A})\cap {\mathcal {C}}(\varvec{\varTheta })=\{\varvec{0}\}\) in (ii) and (iii) can always be assumed to hold, because \({\mathcal {C}}(\varvec{\varTheta })={\mathcal {C}}(\varvec{\varTheta }_1)\) and any estimator \({\widehat{\varvec{\varTheta }}}_1\) will not be related to \({\mathcal {C}}(\varvec{A})\). The estimator in (vi) is only appropriate if \({\widehat{\varvec{\varSigma }}}_u\) is positive definite. Finally, the condition \(r({{\widetilde{\varvec{D}}}}_1)=k_1\) in (iii) is fulfilled if \({\mathcal {C}}(\varvec{D}_1')\cap {\mathcal {C}}(\varvec{D}_2')=\{\varvec{0}\}\).
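To indicate how the quantities in Theorem 1 fit together computationally, the following hedged NumPy sketch assembles the building blocks of part (i). Here \(\varvec{Y}_1\), \(\widetilde{\varvec{D}}_1\) and \(\widetilde{\varvec{D}}_3\) are assumed given, generic invertibility is assumed wherever inverses are taken, and von Rosen and von Rosen (2017) should be consulted for the exact eigenvector selection rule yielding \(\widetilde{\varvec{F}}_1\):

```python
import numpy as np

def projector(M):
    # Orthogonal projector onto C(M), with the Moore-Penrose g-inverse.
    return M @ np.linalg.pinv(M.T @ M) @ M.T

def theorem1_blocks(Y1, A, Dt1, Dt3):
    """Assemble S1, T1, S2, H, R and the matrix of Theorem 1 (i).
    Y1: p x (n-k4); A: p x q1; Dt1, Dt3: the tilde-matrices of (18)."""
    p, m = Y1.shape
    P1, P3 = projector(Dt1.T), projector(Dt3.T)

    S1 = Y1 @ (np.eye(m) - P1) @ Y1.T
    S1_inv = np.linalg.inv(S1)                       # assumes S1 nonsingular
    T1 = np.eye(p) - A @ np.linalg.pinv(A.T @ S1_inv @ A) @ A.T @ S1_inv
    S2 = S1 + T1 @ Y1 @ (P1 - P3) @ Y1.T @ T1.T
    S2_inv = np.linalg.inv(S2)

    # Full-rank factorization T1' S2^{-1} T1 = H H', with H: p x (p - r(A)).
    M = T1.T @ S2_inv @ T1
    w, Q = np.linalg.eigh((M + M.T) / 2)             # symmetrize numerically
    pos = w > 1e-10 * w.max()
    H = Q[:, pos] @ np.diag(np.sqrt(w[pos]))

    R = np.eye(m) + P3 @ Y1.T @ H @ H.T @ Y1 @ P3
    # The matrix displayed in (i), whose eigenvectors yield F_tilde_1:
    Mi = np.eye(H.shape[1]) - H.T @ Y1 @ P3 @ np.linalg.inv(R) @ P3 @ Y1.T @ H
    return S1, T1, S2, H, R, Mi
```

Given \(\widetilde{\varvec{F}}_1\), one obtains \({\widehat{\varvec{\varTheta }}}_1=\varvec{H}(\varvec{H}'\varvec{H})^{-1}\widetilde{\varvec{F}}_1\), and items (ii)-(vi) are then direct matrix products of these building blocks.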