# Spherical regression models with general covariates and anisotropic errors


## Abstract

Existing parametric regression models in the literature for response data on the unit sphere assume that the covariates have particularly simple structure, for example that they are either scalar or are themselves on the unit sphere, and/or that the error distribution is isotropic. In many practical situations, such models are too inflexible. Here, we develop richer parametric spherical regression models in which the covariates can have quite general structure (for example, they may be on the unit sphere, in Euclidean space, categorical or some combination of these) and in which the errors are anisotropic. We consider two anisotropic error distributions—the Kent distribution and the elliptically symmetric angular Gaussian distribution—and two parametrisations of each which enable distinct ways to model how the response depends on the covariates. Various hypotheses of interest, such as the significance of particular covariates, or anisotropy of the errors, are easy to test, for example by classical likelihood ratio tests. We also introduce new model-based residuals for evaluating the fitted models. In the examples we consider, the hypothesis tests indicate strong evidence to favour the novel models over simpler existing ones.

## Keywords

Angular Gaussian distribution, Kent distribution, Model selection, Residuals, Spherical data

## 1 Introduction

Spherical data are observations that lie on the unit sphere \({\mathbb {S}}^{p-1} = \left\{ {\mathbf {y}} \in {\mathbb {R}}^p: {\mathbf {y}}^\top {\mathbf {y}} = 1 \right\} \). They arise in many scientific disciplines, including shape analysis, geology and meteorology [e.g. Mardia and Jupp (2000)] and more recently areas as diverse as genome sequence representations and text analysis [e.g. Hamsici and Martinez (2007)]. In this paper, we consider the regression problem in which the data are pairs \(\{\mathbf{x}_i, \mathbf{y}_i \}\), \(i=1, \ldots , n\), involving a \(q \times 1\) covariate vector, \(\mathbf{x}_i\), and a spherical response variable, \(\mathbf{y}_i \in {\mathbb {S}}^{2}\). The aim of regression modelling is to establish how the response variable \({\mathbf {y}}_i\) depends on \({\mathbf {x}}_i\).

Typical parametric regression models currently in use for spherical responses in dimension \(p \ge 3\) are fairly restrictive in the sense that (i) the covariates are assumed to have special structure, e.g. that the covariate is a scalar (such as time) or is itself on the sphere (i.e. a direction); and/or (ii) the models assume isotropic error distributions. Examples of (i) and (ii) in the literature are Chang (1986), Rivest (1989) and Rosenthal et al. (2014), see also Di Marzio et al. (2014) in a nonparametric context. Recent work in regression modelling on general Riemannian manifolds, for which the unit sphere is a special case, includes the nonparametric approach of Lin et al. (2017), who develop local regression models assuming Euclidean covariates, and the semi-parametric approach of Cornea et al. (2017), who use parametric link functions mapping from a general covariate space to the manifold, with a nonparametric error distribution; though in neither is the possibility of anisotropic errors explicitly considered.

The principal contribution of this paper is to introduce parametric regression models for spherical response data that relax both (i) and (ii). The motivation for doing so is that in many applications the covariates do not have the simple structure described in (i), and that there is rarely any basis for assuming a priori that the error distribution is isotropic.

Because they are isotropic, the 3-parameter Fisher and IAG distributions are too restrictive for many applications. Each, however, has a 5-parameter anisotropic generalisation: the Kent (1982) distribution, and the elliptically symmetric angular Gaussian (ESAG) distribution (Paine et al. 2017), respectively. Both the Kent and ESAG distributions have *elliptical* symmetry about the mean direction, that is, they have ellipse-like contours centred on the mean direction. The two extra parameters over their isotropic counterparts control the orientation and eccentricity of the elliptical contours. We describe the Kent and ESAG distributions in more detail in Sect. 2, but here introduce two parametrisations we shall use for each. The first parametrisation we shall consider is in terms of \((\kappa ,\beta ,{\varvec{\Gamma }})\), in which \(\kappa >0\) is a concentration parameter, \(\beta \ge 0\) is an eccentricity parameter, and \({\varvec{\Gamma }}= ({\tilde{{\varvec{\mu }}}} \,\,\, {\varvec{\xi }}_1 \,\, {\varvec{\xi }}_2)\in O(3)\) is an orthogonal matrix (i.e. \({\varvec{\Gamma }}^\top {\varvec{\Gamma }}= {\mathbf {I}}\), where \({\mathbf {I}}\) is the identity matrix), in which \({\tilde{{\varvec{\mu }}}}\) is the mean direction (having 2 degrees of freedom) and \(({\varvec{\xi }}_1, {\varvec{\xi }}_2)\) are the major and minor axes that identify the orientation of the elliptical contours (together having 1 remaining degree of freedom). This parametrisation generalises that of (1).

The second parametrisation we consider, generalising (2), is in terms of a pair of vectors, \({\varvec{\mu }}\in {\mathbb {R}}^3\) and \({\varvec{\gamma }}\in {\mathbb {R}}^2\), in which, as in (2), \({\varvec{\mu }}\) controls the mean direction and concentration; then \({\varvec{\gamma }}\in {\mathbb {R}}^2\) controls eccentricity and orientation of the elliptical contours.

Before giving more details about the parametrisations and models, we briefly discuss some earlier papers on spherical regression. Rivest (1989) considered the case with covariates themselves on the sphere, \({\mathbf {x}}_i \in {\mathbb {S}}^2\), and a Fisher error distribution with the mean direction modelled as \({\tilde{{\varvec{\mu }}}}({\mathbf {x}}_i) = {\mathbf {R}} {\mathbf {x}}_i\), where \({\mathbf {R}}\in \text {SO}(3)\) is a rotation. Rosenthal et al. (2014) replaced the rotation with the “projective linear transformation” (PLT), \({\tilde{{\varvec{\mu }}}}({\mathbf {x}}_i) = {\mathbf {A}} {\mathbf {x}}_i /\Vert {\mathbf {A}} {\mathbf {x}}_i \Vert \), with \({\mathbf {A}} \in \text {SL}(3)\) where \( \text {SL}(3) = \{ \mathbf{A} \in {\mathbb {R}}^{3\times 3}: \det (\mathbf{A}) = 1 \}\) is the special linear group. This is a generalisation of Rivest’s model since \(\text {SL}(3)\) contains \(\text {SO}(3)\). We consider the PLT later, using it to benchmark performance of the new models we introduce.
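As a minimal illustration of the two mean-direction models just described, the following sketch evaluates the rotation model \({\mathbf {R}} {\mathbf {x}}\) and the PLT \({\mathbf {A}} {\mathbf {x}} /\Vert {\mathbf {A}} {\mathbf {x}} \Vert \) for one covariate direction. The function names and the particular values of `x`, `R` and `A` are hypothetical, chosen only for illustration; no fitting is performed.

```python
import numpy as np

def rotation_mean_direction(R, x):
    """Rotation model: mean direction R x, with R in SO(3) and x on the sphere."""
    return R @ x

def plt_mean_direction(A, x):
    """Projective linear transformation: A x / ||A x||, with det(A) = 1."""
    v = A @ x
    return v / np.linalg.norm(v)

# Hypothetical covariate direction on the unit sphere.
x = np.array([0.0, 0.6, 0.8])

# Hypothetical rotation by 30 degrees about the z-axis (an element of SO(3)).
theta = np.pi / 6
R = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta),  np.cos(theta), 0.0],
    [0.0,            0.0,           1.0],
])

# Hypothetical element of SL(3) that is not a rotation (determinant 1).
A = np.diag([2.0, 1.0, 0.5])

mu_rot = rotation_mean_direction(R, x)
mu_plt = plt_mean_direction(A, x)

# Choosing A = R recovers the rotation model, since ||R x|| = ||x|| = 1.
mu_plt_with_rotation = plt_mean_direction(R, x)
```

The last line reflects the nesting noted above: because \(\text {SO}(3)\) is contained in \(\text {SL}(3)\), the PLT with a rotation matrix gives the same mean direction as the rotation model.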

Besides regression models on the unit sphere, \({\mathbb {S}}^2\), there are several models for regression on the unit circle, \({\mathbb {S}}^1\). Presnell et al. (1998) considers regression on \({\mathbb {S}}^1\) for a general covariate \({\mathbf {x}}_i\), assuming IAG errors. We mention this model in particular because it is a close analogue on \({\mathbb {S}}^1\) of our ESAG2 model on \({\mathbb {S}}^2\) in the isotropic case (which corresponds to \({\varvec{\gamma }}= 0\)), as discussed later. Related work includes the \({\mathbb {S}}^1\) regression model of Fisher and Lee (1992), but this is less relevant to the present paper because it does not generalise conveniently to \({\mathbb {S}}^2\) or higher dimensional spheres; see Mardia and Jupp (2000) for a discussion of this and of the wider context of regression on \({\mathbb {S}}^1\). We also mention a regression model for data on the simplex introduced by Scealy and Welsh (2011). Their approach is to use a “square-root transformation” to map the data from the simplex to the positive orthant of the sphere, then to develop regression models for the transformed data using the Kent distribution. On the sphere, as opposed to the simplex, however, we believe it is especially important to allow what Scealy and Welsh (2011) refer to as \(\mathbf{K}^*\) to depend on regression variables, something that they do not consider due to the fact they focus on transformed compositional data; see the discussion in the concluding section of their paper.

The main goals of this paper are: to explore and compare the modelling Structures 1 and 2; to investigate in the regression context the advantages and disadvantages of the Kent and ESAG distributions as error distributions; and to develop hypothesis tests for the significance of particular covariates, and of anisotropy.

In the following section, we introduce the Kent and ESAG distributions in each of the two parametrisations, then in Sect. 3 we develop the two modelling structures and hypothesis testing procedures. In Sect. 4, we introduce some novel residuals for model fitting diagnostics; then in Sect. 5, we implement the models and methods on various examples involving both synthetic and real data. Code for fitting the models in this paper is available on the second author’s web page.

## 2 Elliptically symmetric distributions on \({\mathbb {S}}^2\)

Here, we give details of the \(({\varvec{\mu }}, {\varvec{\gamma }})\) and \((\kappa , \beta , {\varvec{\Gamma }})\) parametrisations of the Kent and ESAG distributions.

### 2.1 Kent distribution

### Lemma 1

The proof of Lemma 1 is in the “Appendix”.

### 2.2 Elliptically symmetric angular Gaussian (ESAG) distribution

### Lemma 2

Lemma 2 follows directly from substituting (10) and \({\varvec{\mu }}= \kappa {\tilde{{\varvec{\mu }}}}\), with \(\kappa \ge 0\) and \({\tilde{{\varvec{\mu }}}} \in {\mathbb {S}}^2\), into (8). Note that \(\beta =0\) in (10) implies isotropy.

### 2.3 Practical differences between Kent and ESAG distributions

Both the Kent and ESAG distributions have similar characteristics from a modelling perspective: each typically has ellipse-like contours of constant probability density centred on the mean direction in the unimodal case, and for different parameter values each has unimodal and bimodal cases. On practical grounds, the two distributions have different advantages and disadvantages. The Kent distribution belongs to the exponential family, and hence, its density, (5), has a simple mathematical form. In comparison, the ESAG density, (8), is rather cumbersome. On the other hand, the ESAG density and likelihood can be computed exactly, whereas the Kent density and likelihood involve a normalising constant, \(C(\kappa , \beta )\) in (5), which is not known in closed form and hence needs to be approximated, either by truncating an infinite series (Kent 1982) or by saddlepoint or holonomic gradient methods (Kume and Sei 2018; Kume et al. 2013). In the present context, we maximise the likelihood for the regression models numerically, so the ESAG likelihood having a cumbersome form is no drawback, and the fact that it can be computed exactly is an advantage. For simulation, the Kent distribution requires a rejection algorithm (Kent et al. 2018), whereas ESAG can be simulated quickly and easily. Fast simulation is especially helpful in simulation-heavy inference procedures, e.g. the parametric bootstrap.
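To indicate why simulation from an angular Gaussian model is fast, the sketch below draws a trivariate normal vector and projects it onto the unit sphere. It rests on an assumption not spelled out in the text above: that ESAG is the subfamily of the angular Gaussian distribution (Paine et al. 2017) whose covariance matrix \({\mathbf {V}}\) satisfies \({\mathbf {V}}{\varvec{\mu }}= {\varvec{\mu }}\) and \(\det ({\mathbf {V}}) = 1\). The function name and the particular `mu` and `V` are hypothetical, and the function does not itself enforce those constraints.

```python
import numpy as np

def simulate_angular_gaussian(mu, V, n, rng):
    """Draw n points on the sphere as z / ||z|| with z ~ N(mu, V).

    Assumption: for an ESAG sample the caller supplies V with
    V @ mu == mu and det(V) == 1; this is not checked here.
    """
    z = rng.multivariate_normal(mean=mu, cov=V, size=n)
    return z / np.linalg.norm(z, axis=1, keepdims=True)

rng = np.random.default_rng(0)

# Hypothetical parameters: mean direction along the third axis, concentration 5.
mu = np.array([0.0, 0.0, 5.0])

# Hypothetical anisotropic covariance with V @ mu = mu and det(V) = 1.
V = np.diag([2.0, 0.5, 1.0])

Y = simulate_angular_gaussian(mu, V, n=1000, rng=rng)
```

No accept/reject step is involved, so the cost is that of one multivariate normal draw per observation, which is what makes repeated simulation (for instance inside a parametric bootstrap) inexpensive.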

## 3 Regression model structures

In this section, we specify the two model structures in (3) and (4) and then discuss the advantages and disadvantages of each. It is assumed throughout the paper that the first element of \({\mathbf {x}}_i\) is 1, which is analogous to the inclusion in linear modelling of an “intercept term”. For Structure 2 models, see (4), this means that the simpler model of \(\{ {\mathbf {y}}_i \}\) being IID, i.e. not depending on the covariates, is nested in the general regression model and this is helpful for testing the significance of regression. The motivation for including the intercept term is less clear-cut a priori for Structure 1 models, see (3), though empirical results, for example in Table 2 later, suggest there is sometimes a benefit from doing so.

Each model structure is defined in terms of a preliminary orthogonal transformation, \({\mathbf {Q}}\). For Structure 1 models, \({\mathbf {Q}}\) is assumed to be a population quantity, defined explicitly in the “Appendix”, and estimated by a sample version \(\hat{{\mathbf {Q}}}\). For Structure 2 models, \({\mathbf {Q}}\) is treated as a tuning parameter, and the likelihood is optimised over it. These preliminary transformations are needed so that desirable invariance and equivariance properties, discussed in the “Appendix”, hold when an arbitrary orthogonal transformation is applied to the \({\mathbf {y}}_i\).

### 3.1 Structure 1: \({\mathbf {Q}}^\top {\mathbf {y}}_i \sim H(\kappa , \beta , {\varvec{\Gamma }}({\mathbf {x}}_i))\)

*q* parameter matrix \({\mathbf {B}} = \left( {\varvec{\beta }}_1, {\varvec{\beta }}_2, {\varvec{\beta }}_3, {\varvec{\beta }}_4 \right) ^\top \).

### 3.2 Structure 2: \({\mathbf {Q}}^\top {\mathbf {y}}_i \sim H({\varvec{\mu }}({\mathbf {x}}_i),{\varvec{\gamma }}({\mathbf {x}}_i))\)

*q*-dimensional domain of the \(\{{\mathbf {x}}_i\}\) to \({\mathbb {R}}^3\) and \({\mathbb {R}}^2\), respectively. Here, we limit attention to linear functions,

*q* matrix \(\mathbf{B} = \left( \mathbf{B}_1^\top , \mathbf{B}^\top _2 \right) ^\top \), where the influence of the subsets of parameters can be clearly distinguished: \({\mathbf {B}}_1\) controls the influence of the covariates, via \({\varvec{\mu }}\), on the concentration and mean direction; and \({\mathbf {B}}_2\) controls influence, via \({\varvec{\gamma }}\), on the degree and orientation of anisotropy. This leads to natural tests, e.g. for anisotropy, discussed below.

Unlike in Structure 1, in which model (16) is naturally tied to the particularly defined \({\mathbf {Q}}\), for Structure 2 and model (19) there is no a priori reason to select a particular \({\mathbf {Q}} \in O(3)\); hence, we treat \({\mathbf {Q}}\) as a tuning parameter, seeking to maximise the likelihood of the data \(\left\{ {\mathbf {Q}}^\top {\mathbf {y}}_i \right\} \) over \(\left\{ {\mathbf {Q}}, {\mathbf {B}} \right\} \). A practical way to do so, at least approximately, is a brute-force search: for each value of \({\mathbf {Q}}\) on a grid over \(O(3)\), compute the maximum likelihood estimator \(\hat{{\mathbf {B}}}\) of \({\mathbf {B}}\), then select the pair \(\left\{ {\mathbf {Q}}, \hat{{\mathbf {B}}} \right\} \) corresponding to the largest maximised likelihood. In this paper, when comparing models for a particular data set, we compute \({\mathbf {Q}}\) for the most general ESAG2 model and keep this \({\mathbf {Q}}\) fixed for submodels and Kent2 models.
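The brute-force search over \({\mathbf {Q}}\) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the grid (Z-Y-Z Euler angles, each rotation taken with and without a reflection) is one hypothetical way to cover \(O(3)\) coarsely, and `fit_model` is a placeholder for whatever routine returns \(\hat{{\mathbf {B}}}\) and the maximised log-likelihood for given transformed responses. The toy `toy_fit` used at the end is not an ESAG or Kent likelihood; it only exercises the search loop.

```python
import itertools
import numpy as np

def rot_z(a):
    """Rotation by angle a about the z-axis."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rot_y(b):
    """Rotation by angle b about the y-axis."""
    c, s = np.cos(b), np.sin(b)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def orthogonal_grid(n_angles):
    """Coarse grid over O(3): Z-Y-Z Euler rotations, with and without a reflection."""
    alphas = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    betas = np.linspace(0.0, np.pi, n_angles)
    reflect = np.diag([1.0, 1.0, -1.0])
    for a, b, c in itertools.product(alphas, betas, alphas):
        R = rot_z(a) @ rot_y(b) @ rot_z(c)
        yield R            # determinant +1
        yield R @ reflect  # determinant -1

def search_Q(Y, fit_model, n_angles=6):
    """Return (best log-likelihood, best Q, corresponding B_hat) over the grid.

    Y is n x 3 with rows y_i^T, so the rows of Y @ Q are (Q^T y_i)^T.
    fit_model is a hypothetical callable returning (B_hat, max log-likelihood).
    """
    best = (-np.inf, None, None)
    for Q in orthogonal_grid(n_angles):
        B_hat, loglik = fit_model(Y @ Q)
        if loglik > best[0]:
            best = (loglik, Q, B_hat)
    return best

# Toy data clustered around the first coordinate axis.
rng = np.random.default_rng(0)
Y = rng.normal(loc=[5.0, 0.0, 0.0], scale=0.5, size=(50, 3))
Y /= np.linalg.norm(Y, axis=1, keepdims=True)

def toy_fit(Y_rot):
    """Stand-in for a real fit: rewards alignment with the third axis."""
    return None, float(np.sum(Y_rot[:, 2]))

best_loglik, best_Q, best_B = search_Q(Y, toy_fit, n_angles=6)
```

The cost grows with the cube of `n_angles` times the cost of one model fit, which is one reason to compute \({\mathbf {Q}}\) once for the most general model and reuse it for submodels.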

A helpful property proved by Presnell et al. (1998) in the circular case is that the log-likelihood function is a concave function of the regression parameters (in our notation \({\mathbf {B}}_1\)) that determine \({\varvec{\mu }}\); this guarantees that the MLE of \({\mathbf {B}}_1\) is unique and easily determined by numerical optimisation. The corresponding result holds for ESAG2 (20) in the \(p=3\) case with isotropic errors, i.e. \({\mathbf {B}}_2 = {\mathbf {0}}\), as follows [in which \({\text {vec}}\) is the standard vectorisation operator; see e.g. Mardia et al. (1979)].

### Proposition 1

The proof of this Proposition is given in the “Appendix”.

### 3.3 Tests for the significance of anisotropy and regression

*j*th column of \({\mathbf {B}}\) and corresponds to the covariate appearing as the *j*th element of \({\mathbf {x}}_i\). A test of the significance of this particular covariate corresponds to a test with null and alternative hypotheses

*T*, to the \(\chi ^2_\nu \) distribution. An alternative possibility, preferable when *n* is insufficiently large for the null asymptotic distribution (21) to be reasonable, is to approximate the null distribution using a bootstrap procedure.

*j*th element of \({\mathbf {x}}_i\), a test that the covariate is significant in \({\varvec{\gamma }}\) corresponds to the hypotheses

## 4 Residuals for model diagnostics

For spherical regression models, there are many possible ways to define a residual. Here, we describe some general spherical residuals defined by Jupp (1988) before defining some particular model-based residuals for regression models with ESAG and Kent errors.

*i*, and \({\mathbf {R}}(\hat{{\mathbf {y}}}_i,{\mathbf {y}}_0)\) is a rotation from \(\hat{{\mathbf {y}}}_i\) to \({\mathbf {y}}_0\), where \({\mathbf {R}}(\cdot ,\cdot )\) does not depend on *i*. Then, the \({\mathbf {s}}_1, \ldots , {\mathbf {s}}_n\) lie in the plane tangent to the sphere at \({\mathbf {y}}_0\). Let \({\varvec{\zeta }}_1, {\varvec{\zeta }}_2\) be an arbitrary pair of unit vectors orthogonal to each other and to \({\mathbf {y}}_0\), then a plot of the projected residuals

Table 1 Results from fitting various models to the synthetic data, which were generated from model \(M_1\) with H taken to be ESAG, \(n=41\), and using parameters described in Sect. 5.1

| Model | Model for \({\mathbf {y}}_i\) | | Log-lik. ESAG | Log-lik. Kent | (No. of parameters) |
|---|---|---|---|---|---|
| **Structure 2** | | | | | |
| \(M_1\) | \(\text {H}({\mathbf {B}}_1 {\mathbf {x}}_i,{\mathbf {B}}_2 {\mathbf {x}}_i) \) | | 91.7 | 81.5 | (10) |
| \(M_2\) | \(\text {H}({\mathbf {B}}_1 {\mathbf {x}}_i, {\varvec{\gamma }}) \) | | 63.5 | 62.3 | (8) |
| \(M_3\) | \(\text {H}({\mathbf {B}}_1 {\mathbf {x}}_i, {\mathbf {0}}) \) | Isotropic errors | 28.5 | 28.0 | (6) |
| \(M_4\) | \(\text {H}({\varvec{\mu }},{\varvec{\gamma }}) \) | IID observations | 9.6 | 10.2 | (5) |
| **Structure 1** | | | | | |
| \(M_5\) | \(\text {H}(\kappa , \beta , {\varvec{\Gamma }}({\mathbf {x}}_i)) \) | | 89.7 | 88.9 | (10) |
| \(M_6\) | \(\text {H}(\kappa , \beta , {\varvec{\Gamma }}({\mathbf {x}}_i)), {\varvec{\beta }}_4 = {\mathbf {0}} \) | | 88.6 | 87.6 | (8) |
| \(M_7\) | \(\text {H}(\kappa , 0, {\varvec{\Gamma }}({\mathbf {x}}_i)) \) | Isotropic errors | 29.2 | 28.9 | (7) |
| \(M_8\) | \(\text {H}(\kappa , \beta , {\varvec{\Gamma }}) \) | IID observations | 2.5 | 2.8 | (2) |

## 5 Applications

Here, we consider three applications, in each investigating the spherical regression models towards different statistical goals.

The first involves a simulated data set with a scalar covariate, \(t\in {\mathbb {R}}\). We exploit having a simple data-generating model to illustrate the flexibility within this regression framework for the mean direction and dispersion to depend on the covariate; to investigate the performance of hypothesis tests in detecting anisotropy and regression; and to compare Jupp and \({\varvec{\eta }}\)-residuals in the special setting where the model being fitted is the true one.

The second data set concerns the movement of clouds between two consecutive days. The cloud shapes are represented by landmarks spaced around the cloud outline, and the position of these landmarks is regressed on their positions the previous day. This data set has been considered previously in the context of spherical–spherical regression models with isotropic errors (Rosenthal et al. 2014); hence, it makes for an interesting comparison with the more general framework developed in this paper.

The third data set is derived from vectorcardiogram measurement of heart activity in children. These data too have been studied in the context of spherical–spherical isotropic regression (Chang 1986), but with the non-spherical covariates disregarded. The primary goal is inference, to understand which covariates are significantly related to the response. The framework of the present paper enables us to incorporate easily the additional non-spherical covariates, as well as anisotropic errors, and furthermore then to test formally whether such generalisations are warranted by the data.

### 5.1 Simulated data set (involving a scalar covariate)

We can use the inference procedures described in Sect. 3.3 to test for significance of anisotropy and regression. Table 1 shows the maximised log-likelihood for the true model, \(M_1\), and some different models involving various combinations of the two model structures and two error distributions. Using Wilks’ statistic and the null asymptotic \(\chi ^2\) approximation (21) to compare \(M_1\) with each of models \(M_2\), \(M_3\), \(M_4\) with errors assumed to be ESAG results in *p* values \(<10^{-5}\) in each case, indicating very strong evidence to favour the data-generating model over the simpler alternatives, which include the isotropic (\(M_3\)) and IID (\(M_4\)) models. When Kent errors are assumed for the fitted model, i.e. in contrast to the ESAG errors used in generating the data, the statistical conclusions (and even to some extent the numerical values of the maximised log-likelihoods) are very robust to this misspecification. This is probably a consequence of how similar the ESAG and Kent densities are in practice, especially if the concentration is reasonably high. The table also shows the results of fitting Structure 1 models \(M_5\)–\(M_8\) to the Structure 2-generated data. Here, model \(M_5\) is not favoured strongly over \(M_6\), in contrast to how \(M_1\) is strongly favoured over \(M_2\). The explanation is that models \(M_2\) and \(M_6\) are only loosely analogous as submodels of \(M_1\) and \(M_5\), respectively. A major difference is that \(M_2\) cannot capture the way the orientation of the anisotropy substantially depends on the covariate, because \({\varvec{\gamma }}\) does not depend on the covariate, whereas \(M_6\) can still do so via \({\mathbf {R}}({\mathbf {x}}_i)\) even when \({\mathbf {S}}({\mathbf {x}}_i)\) is fixed to be the identity matrix. The conclusions to reject isotropy (\(M_7\)) and the assumption of IID data (\(M_8\)) in favour of \(M_5\) are both robust to the model misspecification.
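The likelihood ratio comparisons can be reproduced from the maximised log-likelihoods alone. The sketch below computes Wilks’ statistic \(T = 2(\ell _1 - \ell _0)\) and its \(\chi ^2_\nu \) tail probability for two ESAG comparisons using values from Table 1. It assumes that the bracketed numbers in Table 1 are parameter counts and that \(\nu \) is the difference in those counts between the nested models; both are assumptions, since the defining equations are not reproduced here. The helper names are hypothetical, and the tail probability uses only the standard library via the recurrence \(Q(a+1, z) = Q(a, z) + z^a e^{-z}/\Gamma (a+1)\) for the regularised upper incomplete gamma function.

```python
import math

def chi2_sf(x, df):
    """Upper tail P(chi2_df > x) for x >= 0 and positive integer df."""
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)             # df = 2: Q(1, z) = exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))  # df = 1: Q(1/2, z) = erfc(sqrt(z))
    while 2.0 * a < df:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q

def likelihood_ratio_test(loglik_full, loglik_sub, k_full, k_sub):
    """Wilks' statistic, degrees of freedom (assumed k_full - k_sub), and p-value."""
    T = 2.0 * (loglik_full - loglik_sub)
    nu = k_full - k_sub
    return T, nu, chi2_sf(T, nu)

# ESAG, Structure 2: full model (91.7, 10 parameters) against the
# constant-gamma submodel (63.5, 8 parameters).
T_a, nu_a, p_a = likelihood_ratio_test(91.7, 63.5, 10, 8)

# ESAG, Structure 1: full model (89.7, 10 parameters) against the
# submodel with beta_4 = 0 (88.6, 8 parameters).
T_b, nu_b, p_b = likelihood_ratio_test(89.7, 88.6, 10, 8)
```

Under these assumptions the first comparison gives a statistic of about 56.4 on 2 degrees of freedom, far in the tail, whereas the second gives about 2.2 on 2 degrees of freedom, which is unremarkable; this matches the contrast described in the text.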

### 5.2 Cloud formation data (involving a spherical covariate)

These data involve 29 landmarks spaced around the outline of a cloud to represent its shape on each of two consecutive days, 4th and 5th Sept 2012. The data, see Fig. 2, are from NASA’s Visible Earth project [with original cloud images from XPlanet (2018)] and were used as an application by Rosenthal et al. (2014) in assessing accuracy of their PLT model, albeit with a focus on prediction rather than inference. The goal is to regress the landmarks \(\{{\mathbf {y}}_i\}_{i=1}^{29}\) for the second day on those \(\{{\mathbf {u}}_i\}_{i=1}^{29}\) of the first. We hence define a covariate vector, including “intercept”, as \({\mathbf {x}}_i = \left( 1, \, {\mathbf {u}}_i^\top \right) ^\top \).

Table 2 Results for Structure 1 models and submodels fitted to the vectorcardiogram data

| Model | \(\mathbf{A}_\text {R}\) params set to zero | \(\mathbf{A}_\text {S}\) params set to zero | Log-lik. ESAG1 | Log-lik. Kent1 | (No. of parameters) |
|---|---|---|---|---|---|
| \(M_1\) | − | − | 32.12 | 27.46 | (26) |
| \(M_2\) | − | \(\beta ^G_2 \) | 31.79 | 27.45 | (25) |
| \(M_3\) | − | \(\beta ^A_2 \) | 31.22 | 26.70 | (25) |
| \(M_4\) | − | \(\beta ^G_2, \beta ^A_2\) | 28.78 | 25.34 | (24) |
| \(M_5\) | \({\pmb \beta }^G_1\) | \( \beta ^G_2 \) | 31.79 | 27.45 | (22) |
| \(M_6\) | \({\pmb \beta }^A_1\) | \(\beta ^A_2 \) | 31.20 | 26.70 | (22) |
| \(M_7\) | \({\pmb \beta }^G_1, {\pmb \beta }^A_1\) | \(\beta ^G_2, \beta ^A_2\) | 28.78 | 25.34 | (18) |
| \(M_8\) | − | \( \mathbf{B}_2^u, \beta ^G_2, \beta ^A_2\) | 30.81 | 26.26 | (21) |
| \(M_9\) | \({\pmb \beta }^G_1\) | ” | 30.80 | 26.24 | (18) |
| \(M_{10}\) | \({\pmb \beta }^A_1\) | ” | 30.06 | 25.47 | (18) |
| \(M_{11}\) | \( {\pmb \beta }^G_1, {\pmb \beta }^A_1\) | ” | 28.03 | 24.18 | (15) |
| \(M_{12}\) | − | \( \beta ^0_2, \mathbf{B}_2^u, \beta ^G_2, \beta ^A_2\) | 30.79 | 26.12 | (20) |
| \(M_{13}\) | \( {\pmb \beta }^G_1\) | ” | 28.68 | 24.87 | (17) |
| \(M_{14}\) | \( {\pmb \beta }^A_1\) | ” | 29.95 | 25.15 | (17) |
| \(M_{15}\) | \( {\pmb \beta }^G_1, {\pmb \beta }^A_1\) | ” | 28.00 | 23.94 | (14) |
| \(M_{16}\) | − | \(\beta = 0\) (isotropic errors) | 10.26 | 4.11 | (18) |

Table 3 Results for Structure 2 models and submodels fitted to the vectorcardiogram data

| Model | \({\varvec{\mu }}\) params set to zero | \({\varvec{\gamma }}\) params set to zero | Log-lik. ESAG2 | Log-lik. Kent2 | (No. of parameters) |
|---|---|---|---|---|---|
| \(M_1\) | − | − | 54.88 | 50.54 | (30) |
| \(M_2\) | − | \({\pmb \beta }^G_2 \) | 50.04 | 47.65 | (28) |
| \(M_3\) | − | \({\pmb \beta }^A_2 \) | 54.56 | 50.43 | (28) |
| \(M_4\) | − | \({\pmb \beta }^G_2, {\pmb \beta }^A_2\) | 49.80 | 47.58 | (26) |
| \(M_5\) | \({\pmb \beta }^G_1\) | \({\pmb \beta }^G_2 \) | 46.78 | 45.35 | (25) |
| \(M_6\) | \({\pmb \beta }^A_1\) | \({\pmb \beta }^A_2 \) | 48.23 | 42.81 | (25) |
| \(M_7\) | \({\pmb \beta }^G_1,{\pmb \beta }^A_1\) | \({\pmb \beta }^G_2, {\pmb \beta }^A_2\) | 43.23 | 38.93 | (20) |
| \(M_8\) | − | \(\mathbf{B}_2^u, {\pmb \beta }^G_2, {\pmb \beta }^A_2\) (\({\varvec{\gamma }}= \text {const})\) | 33.64 | 28.77 | (20) |
| \(M_9\) | \({\pmb \beta }^G_1\) | ” | 30.94 | 25.96 | (17) |
| \(M_{10}\) | \({\pmb \beta }^A_1\) | ” | 29.38 | 22.99 | (17) |
| \(M_{11}\) | \({\pmb \beta }^G_1, {\pmb \beta }^A_1\) | ” | 26.16 | 19.87 | (14) |
| \(M_{12}\) | − | \( {\pmb \beta }^0_2,\mathbf{B}_2^u, {\pmb \beta }^G_2, {\pmb \beta }^A_2\) (isotropic error) | 25.18 | 15.38 | (18) |
| \(M_{13}\) | \({\pmb \beta }^G_1\) | ” | 20.48 | 10.66 | (15) |
| \(M_{14}\) | \({\pmb \beta }^A_1\) | ” | 21.06 | 12.56 | (15) |
| \(M_{15}\) | \({\pmb \beta }^G_1, {\pmb \beta }^A_1\) | ” | 16.60 | 7.85 | (12) |

The residuals of the fitted ESAG2 model, shown in Fig. 2b, show a small amount of serial correlation (points 21–27), but otherwise little to suggest the model is poorly fitting.

### 5.3 Vectorcardiogram data (involving a mixed-type covariate)

This data set was considered by Chang (1986) in the context of his spherical–spherical regression models. Here, our more general model enables incorporation of other covariates, and of anisotropic errors.

The data themselves are derived from vectorcardiogram measurements of the electrical activity of the heart of children of different ages and genders. The vectorcardiogram involves three leads connected to the torso, which produce a time-dependent vector that traces approximately closed curves, each representing a heartbeat cycle, in \({\mathbb {R}}^3\). A unit vector, defined as the directional component of this vector at a particular extremum across the cycles, is sometimes used as a summary for clinical diagnosis. The data comprise such unit vectors derived from data for two different lead placement systems, the Frank system (\({\mathbf {y}}_i \in {\mathbb {S}}^2\)) and the McFee system (\({\mathbf {u}}_i \in {\mathbb {S}}^2\)), for each of 98 children of different ages and genders. Age is represented by a binary variable \(A_i \in \{0,1\}\) (0 meaning aged 2–10 years, and 1 meaning aged 11–18 years) and gender by a variable \(G_i \in \{0,1\}\) (0 for a boy, and 1 for a girl). We aim to regress \({\mathbf {y}}_i\) on the other variables, and hence take the covariate to be \({\mathbf {x}}_i = \left( 1, \, {\mathbf {u}}_i^\top , \, A_i, \, G_i \right) ^\top \), for \(i=1, \ldots , 98\).
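To make the mixed-type covariate concrete, the following sketch assembles \({\mathbf {x}}_i = (1, {\mathbf {u}}_i^\top , A_i, G_i)^\top \) into a design matrix and evaluates the Structure 2 linear predictors \({\varvec{\mu }}({\mathbf {x}}_i) = {\mathbf {B}}_1 {\mathbf {x}}_i\) and \({\varvec{\gamma }}({\mathbf {x}}_i) = {\mathbf {B}}_2 {\mathbf {x}}_i\). The data and coefficient values are randomly generated placeholders, not the vectorcardiogram data, and the function names are hypothetical.

```python
import numpy as np

def build_covariates(U, age, gender):
    """Stack intercept, spherical covariate u_i, age and gender into an n x 6 matrix."""
    n = U.shape[0]
    return np.column_stack([np.ones(n), U, age, gender])

def structure2_predictors(B1, B2, X):
    """Linear predictors: mu_i = B1 x_i (in R^3) and gamma_i = B2 x_i (in R^2)."""
    mu = X @ B1.T
    gamma = X @ B2.T
    return mu, gamma

rng = np.random.default_rng(1)

# Placeholder data for four hypothetical subjects.
n = 4
U = rng.normal(size=(n, 3))
U /= np.linalg.norm(U, axis=1, keepdims=True)  # directions on the unit sphere
age = np.array([0, 1, 1, 0])                   # 0: aged 2-10, 1: aged 11-18
gender = np.array([1, 0, 1, 0])                # 0: boy, 1: girl

X = build_covariates(U, age, gender)           # q = 6 columns

# Placeholder coefficient matrices: B1 is 3 x q, B2 is 2 x q.
B1 = rng.normal(size=(3, X.shape[1]))
B2 = rng.normal(size=(2, X.shape[1]))

mu, gamma = structure2_predictors(B1, B2, X)

# With mu = kappa * mu_tilde, the concentration and mean direction follow from mu.
kappa = np.linalg.norm(mu, axis=1)
mean_direction = mu / kappa[:, None]

n_params = B1.size + B2.size                   # 5 * q free regression parameters
```

With \(q = 6\) the full Structure 2 model has \(5 \times 6 = 30\) regression parameters, which agrees with the count reported for the most general Structure 2 model fitted to these data.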

Tables 2 and 3, respectively, show the results of fitting Structure 1 and 2 models, and several submodels, to the vectorcardiogram data. Within each table, each of the submodels is nested within \(M_1\), and some of the submodels are further nested within each other. Pairwise comparisons of relevant nested models, using the likelihood ratio tests described in Sect. 3.3 at the 5% level, suggest that for both ESAG1 and Kent1 the preferred model is \(M_{15}\). This suggests that Structure 1 is poor for characterising how the response depends on the covariates for this application, to the extent that there is little benefit to retaining the covariates in the model. In contrast, for both ESAG2 and Kent2 the preferred model is \(M_3\), which retains all of the covariates.

Figure 3b shows \({\varvec{\eta }}\)-residuals for ESAG2 models \(M_3\) and \(M_{15}\). For \(M_3\), which is the preferred model, the residuals are reasonably consistent with \(N_2({\mathbf {0}},{\mathbf {I}})\) scatter. For \(M_{15}\), which assumes isotropic errors and neglects the age and gender covariates, the scatter appears less isotropic, and there are slight differences in the scatter according to age and gender, consistent with there being residual variation due to these factors not being incorporated in the model.

## 6 Conclusions

The regression models we have introduced are considerably more general than existing regression models in the literature, allowing covariates with general structure and errors that are anisotropic. We have also introduced novel model-based residuals that enable simple visual diagnostics to check fitted models, to identify for example any residual structure dependent on a covariate, any serial correlation or any outliers, and to explore adequacy of the error models.

For the anisotropic error model, there is little to choose on statistical grounds between using Kent or ESAG, though we have found occasions for models based on the Kent distribution in which the likelihood function is harder to maximise numerically (perhaps owing to roughness in the likelihood arising from approximating the normalising constant). Models based on ESAG are free from this issue, and the computation of the ESAG likelihood is much faster. Of the two model structures we considered, models with Structure 2 tended to perform better; such models are also simpler and enable the influence of particular covariates to be related more directly to the response variable. On the foregoing grounds, ESAG2 models are our preferred ones.

The likelihood framework in which we have developed the models makes it very easy to use classical methods to compare nested models of different complexity, in particular to test hypotheses about significance of regression or the anisotropy of errors. Indeed, applying such tests for the examples considered provides strong support that the regression modelling generalisations we have developed are warranted.

## References

- Chang, T.: Spherical regression. Ann. Stat. **14**, 907–924 (1986)
- Cornea, E., Zhu, H., Kim, P., Ibrahim, J.G.: Regression models on Riemannian symmetric spaces. J. R. Stat. Soc. Ser. B **79**, 463–482 (2017)
- Di Marzio, M., Panzera, A., Taylor, C.: Nonparametric regression for spherical data. J. Am. Stat. Assoc. **109**, 748–763 (2014)
- Fisher, N.I., Lee, A.J.: Regression models for angular response. Biometrics **48**, 665–677 (1992)
- Hamsici, O.C., Martinez, A.M.: Spherical-homoscedastic distributions: the equivalency of spherical distributions and normal distributions in classification. J. Mach. Learn. Res. **8**, 1583–1623 (2007)
- Jupp, P.E.: Residuals for directional data. J. Appl. Stat. **15**, 137–147 (1988)
- Kent, J.T.: The Fisher–Bingham distribution on the sphere. J. R. Stat. Soc. Ser. B **44**, 71–80 (1982)
- Kent, J.T., Ganeiber, A.M., Mardia, K.V.: A new unified approach for the simulation of a wide class of directional distributions. J. Comput. Graph. Stat. **27**, 291–301 (2018)
- Kume, A., Sei, T.: On the exact maximum likelihood inference of Fisher–Bingham distributions using an adjusted holonomic gradient method. Stat. Comput. **28**, 835–847 (2018)
- Kume, A., Preston, S.P., Wood, A.T.A.: Saddlepoint approximations for the normalising constant of Fisher–Bingham distributions on products of spheres and Stiefel manifolds. Biometrika **100**, 971–984 (2013)
- Lin, L., St Thomas, B., Zhu, H., Dunson, D.B.: Extrinsic local regression on manifold-valued data. J. Am. Stat. Assoc. **112**, 1261–1273 (2017)
- Mardia, K.V., Jupp, P.E.: Directional Statistics. Wiley, Chichester (2000)
- Mardia, K.V., Kent, J.T., Bibby, J.M.: Multivariate Analysis. Academic Press, London (1979)
- Paine, P.J., Preston, S.P., Tsagris, M., Wood, A.T.A.: The elliptically symmetric angular Gaussian distribution. Stat. Comput. **28**, 689–697 (2017)
- Presnell, B., Morrison, S.P., Littel, R.C.: Projected multivariate linear models for directional data. J. Am. Stat. Assoc. **93**, 1068–1077 (1998)
- Rivest, L.-P.: Spherical regression for concentrated Fisher–von Mises distributions. Ann. Stat. **17**, 307–317 (1989)
- Rosenthal, M., Wei, W., Klassen, E., Srivastava, A.: Spherical regression models using projective linear transformations. J. Am. Stat. Assoc. **109**, 1615–1624 (2014)
- Scealy, J.L., Welsh, A.H.: Regression for compositional data by using distributions defined on the hyper-sphere. J. R. Stat. Soc. Ser. B **73**, 351–375 (2011)
- Wang, F., Gelfand, A.E.: Directional data analysis under the general projected normal distribution. Stat. Methodol. **10**, 113–127 (2013)
- XPlanet: Real time cloud map. http://xplanet.sourceforge.net/clouds.php. Accessed 28 June 2018

## Copyright information

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.