# An elliptically symmetric angular Gaussian distribution


## Abstract

We define a distribution on the unit sphere \(\mathbb {S}^{d-1}\) called the elliptically symmetric angular Gaussian distribution. This distribution, which to our knowledge has not been studied before, is a subfamily of the angular Gaussian distribution closely analogous to the Kent subfamily of the general Fisher–Bingham distribution. Like the Kent distribution, it has ellipse-like contours, enabling modelling of rotational asymmetry about the mean direction, but it has the additional advantages of being simple and fast to simulate from, and having a density and hence likelihood that is easy and very quick to compute exactly. These advantages are especially beneficial for computationally intensive statistical methods, one example of which is a parametric bootstrap procedure for inference for the directional mean that we describe.

### Keywords

Angular Gaussian · Bootstrap · Kent distribution · Spherical distribution

## 1 Introduction

A natural way to define a distribution on the unit sphere \(\mathbb {S}^{d-1}\) is to embed \(\mathbb {S}^{d-1}\) in \(\mathbb {R}^d\), specify a distribution for a random variable \(z \in \mathbb {R}^d\), then consider the distribution of *z* either conditioned to lie on, or projected onto, \(\mathbb {S}^{d-1}\). The general Fisher–Bingham and angular Gaussian distributions, defined respectively in Mardia (1975) and Mardia and Jupp (2000) can both be constructed this way by taking *z* to be multivariate Gaussian in \(\mathbb {R}^d\). Then the Fisher–Bingham distribution is the conditional distribution of *z* conditioned on \(\Vert z\Vert = 1\), and the angular Gaussian is the distribution of the projection \(z/ \Vert z\Vert \). The choice of the mean, \({\mu }\), and covariance matrix, \(V\), of *z* controls the concentration and the shape of the contours of the induced probability density on \(\mathbb {S}^{d-1}\).
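This embed-and-project construction is easy to sketch numerically. The following minimal example (with arbitrary illustrative values of \({\mu }\) and \(V\), not values from the paper) draws Gaussian samples in \(\mathbb {R}^3\) and projects them onto \(\mathbb {S}^{2}\):

```python
import numpy as np

rng = np.random.default_rng(0)

# Embed-and-project construction of the angular Gaussian: draw z ~ N(mu, V)
# in R^3, then project onto the unit sphere via y = z / ||z||.
mu = np.array([0.0, 0.0, 2.0])    # illustrative mean
V = np.diag([1.5, 0.8, 1.0])      # illustrative non-singular covariance

z = rng.multivariate_normal(mu, V, size=1000)
y = z / np.linalg.norm(z, axis=1, keepdims=True)   # samples on S^2

# every projected point has unit norm
assert np.allclose(np.linalg.norm(y, axis=1), 1.0)
```

The choice of \({\mu }\) and \(V\) here shapes the concentration and anisotropy of the projected sample, exactly as described above.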

It is usually not practical to work with the general Fisher–Bingham or angular Gaussian distributions, however, because they have too many free parameters to be identified well by data. This motivates working instead with subfamilies that have fewer free parameters and stronger symmetries.

In the spherical case, \(d=3\), the general distributions have 8 free parameters. Respective subfamilies with 3 free parameters are the Fisher and the isotropic angular Gaussian (IAG) distributions. Both are “isotropic” in the sense that they are rotationally symmetric about the mean direction, i.e., contours on the sphere are small circles centred on the mean direction. Respective subfamilies with 5 free parameters are the Bingham and the central angular Gaussian distributions, both of which are antipodally symmetric.

An important member of this Fisher–Bingham family is the Kent distribution (Kent, 1982). For \(d=3\), it has 5 free parameters, and it has ellipse-like contours on the sphere. This offers a level of complexity well suited to many applications, since the distribution is flexible enough to model anisotropic data yet its parameters can usually be estimated well from data. To our knowledge, nobody to date has considered its analogue in the angular Gaussian family. The purpose of this paper is to introduce such an analogue, which we call the elliptically symmetric angular Gaussian (ESAG), and establish some of its basic properties.

The motivation for doing so is that in some ways the angular Gaussian family (and hence ESAG) is much easier to work with than the Fisher–Bingham family (and hence the Kent distribution). In particular, simulation is easy and fast, not requiring rejection methods (which are needed for the Fisher–Bingham family Kent et al. 2013), and the density is free of awkward normalising constants, so the likelihood can be computed quickly and exactly. Hence in many modern statistical settings the angular Gaussian family is the more natural choice; see for example Presnell et al. (1998) who use it in a frequentist approach for circular data, and Wang and Gelfand (2013) and Hernandez-Stumpfhauser et al. (2017) who use it in Bayesian approaches for circular and spherical data, respectively.

In the following section, we introduce ESAG, first for general *d* before specialising to the case \(d=3\).

## 2 The elliptically symmetric angular Gaussian distribution (ESAG)

### 2.1 The general angular Gaussian distribution

Suppose \(z \sim N_d({\mu }, V)\), a multivariate Gaussian distribution on \(\mathbb {R}^d\) with mean \({\mu }\) and covariance matrix \(V\), assumed non-singular, where \(|V|\) denotes the determinant of \(V\). Then, writing \(z=ry\), where \(r=\Vert z\Vert =(z^\top z)^{1/2}\) and \(y=z/\Vert z\Vert \in \mathbb {S}^{d-1}\), and using \(\mathrm {d}z=r^{d-1} \mathrm {d} r\, \mathrm {d} y\), where \(\mathrm {d} y\) denotes Lebesgue, or geometric, measure on the unit sphere \(\mathbb {S}^{d-1}\), and integrating over \(r>0\), leads to the angular Gaussian density

$$ f(y) = \frac{1}{(2\pi )^{(d-1)/2} |V|^{1/2} \left( y^\top V^{-1} y\right) ^{d/2}} \exp \left\{ \frac{1}{2}\left( \frac{\left( y^\top V^{-1}{\mu }\right) ^2}{y^\top V^{-1} y} - {\mu }^\top V^{-1}{\mu }\right) \right\} M_{d-1}(\alpha ), $$

where \(\alpha = y^\top V^{-1} {\mu } / \left( y^\top V^{-1} y\right) ^{1/2}\) and \(M_{d-1}(\alpha ) = (2\pi )^{-1/2}\int _0^\infty u^{d-1} \exp \left\{ -\tfrac{1}{2}(u-\alpha )^2\right\} \mathrm {d} u\).

### 2.2 An elliptically symmetric subfamily

We define the elliptically symmetric subfamily by imposing two conditions on \(V\). The first is

$$ V {\mu }= {\mu }, \qquad \qquad (5) $$

so that \(V\) has a unit eigenvalue with eigenvector in the direction of \({\mu }\). If the other eigenvalues are \(0 < \rho _1 \le \cdots \le \rho _{d-1}\), with corresponding unit eigenvectors \(\xi _1, \ldots , \xi _{d-1}\), the second condition is

$$ \prod _{j=1}^{d-1} \rho _j = 1, \qquad \qquad (6) $$

equivalently \(|V| = 1\). Writing \(\xi _d = {\mu }/\Vert {\mu }\Vert \), such a \(V\) can be written

$$ V = \sum _{j=1}^{d-1} \rho _j \, \xi _j \xi _j^\top + \xi _d \xi _d^\top . $$

Once the *d* parameters in \({\mu }\) are fixed, then from (5) and (6) there are \(d-2\) remaining degrees of freedom for the eigenvalues of \(V\), and \(d(d-1)/2 - (d-1)\) degrees of freedom for its unit eigenvectors. The total number of free parameters is thus \((d-1)(d+2)/2\), the same as for the multivariate normal in a tangent space \(\mathbb {R}^{d-1}\) to the sphere.
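This degrees-of-freedom bookkeeping is easy to verify directly:

```python
# Degrees-of-freedom bookkeeping for ESAG in dimension d:
#   d                 parameters for mu,
#   d - 2             free eigenvalues (one equals 1; the rest multiply to 1),
#   d(d-1)/2 - (d-1)  eigenvector degrees of freedom.
# The total should match (d-1)(d+2)/2, the parameter count of a Gaussian
# on a (d-1)-dimensional tangent space.
for d in range(2, 12):
    total = d + (d - 2) + (d * (d - 1) // 2 - (d - 1))
    assert total == (d - 1) * (d + 2) // 2
```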

To understand the shape of the density, it is helpful to work in the eigenbasis of *V*. Without loss of generality, suppose that the eigenvectors are parallel to the coordinate axes; that is, each element of the vector \(\xi _j\) equals 0 except the *j*th, which equals 1. Then if \(y=(y_1, \ldots , y_d)^\top \), the quadratic forms in the density simplify to \(y^\top V^{-1} y = \sum _{j=1}^{d-1} y_j^2/\rho _j + y_d^2\) and \(y^\top V^{-1} {\mu }= \Vert {\mu }\Vert \, y_d\).

### Proposition 1

The proof of Proposition 1 is given in Appendix A.2.

for each *i*, where \(\bar{y}^\text {max} = (1/42) \sum _i y_i^\text {max}\). In cases identified as non-unimodal by this criterion, we used *k*-means clustering to identify \(k=2\) clusters; in each such case, every \(y_i^\text {max}\) was within a distance \(10^{-6}\) of its cluster centre, indicating bimodality. In agreement with the conjecture, amongst the \(9^3 = 729\) parameter cases we considered, in each of the 553 cases with \(\rho _{d-1}\le \mathcal {H}_d(\alpha )\) the foregoing procedure identified the distribution to be unimodal, and in each of the 176 cases with \(\rho _{d-1} > \mathcal {H}_d(\alpha )\) it identified the distribution to be bimodal.
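The clustering step of this check is straightforward to sketch. Below, the candidate modes are synthetic stand-ins for the \(y_i^\text {max}\) (not real optimiser output), arranged as two tight groups:

```python
import numpy as np
from scipy.cluster.vq import kmeans2

rng = np.random.default_rng(3)

# Synthetic stand-ins for candidate modes: two tight groups of unit vectors.
modes = np.vstack([np.tile([1.0, 0.0, 0.0], (25, 1)),
                   np.tile([0.0, 1.0, 0.0], (25, 1))])
modes += 1e-9 * rng.standard_normal(modes.shape)

# k-means with k = 2: if every candidate mode lies within 1e-6 of its
# cluster centre and both clusters are occupied, flag bimodality.
centres, labels = kmeans2(modes, 2, minit='++')
dists = np.linalg.norm(modes - centres[labels], axis=1)
bimodal = bool(np.all(dists < 1e-6)) and len(np.unique(labels)) == 2
```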

### Proposition 2

### Remark 1

In the general case, we replace the coordinate vectors \(\xi _1, \ldots , \xi _d\) by an arbitrary orthonormal basis, and then the limit distribution lies in the vector subspace spanned by \(\xi _1, \ldots , \xi _{d-1}\).

### Remark 2

Proposition 2 is noteworthy because it is atypical for high-concentration limits within the angular Gaussian family to be Gaussian.

### 2.3 A parameterisation of ESAG for \(d=3\)

An important practical question is how to specify a convenient parameterisation for the matrix *V* so that it satisfies the constraints (5) and (6). With \(d=3\), such a *V* has two free parameters.

### Lemma 1

For a random vector *y* with an ESAG distribution, we will write \(y \sim {\text {ESAG}}({\mu }, {\gamma })\). The rotationally symmetric isotropic angular Gaussian corresponds to \({\gamma }= (0,0)^\top \).

### Remark 3

(Simulation.) To simulate \(y \sim \text {ESAG}(\mu , \gamma )\), simulate \(z \sim N(\mu , V)\) where \(V=V(\mu , \gamma )\) is defined in (18) then set \(y = z / \Vert z \Vert \).
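Remark 3 translates directly into code. The sketch below is for \(d=3\); rather than using the parameterisation (18) (not reproduced here), it builds a \(V\) satisfying the ESAG constraints, a unit eigenvalue along \({\mu }\) and unit determinant, directly from an eigendecomposition. The values of \({\mu }\) and \(\rho \) are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)

# Construct a V satisfying the ESAG constraints for d = 3:
# eigenvalue 1 along mu, and remaining eigenvalues rho and 1/rho so |V| = 1.
mu = np.array([0.0, 0.0, 2.0])   # illustrative mean direction / concentration
rho = 2.0                        # illustrative anisotropy

xi3 = mu / np.linalg.norm(mu)    # unit eigenvector along mu
xi1 = np.array([1.0, 0.0, 0.0])  # orthonormal eigenvectors orthogonal to mu
xi2 = np.array([0.0, 1.0, 0.0])
V = rho * np.outer(xi1, xi1) + (1 / rho) * np.outer(xi2, xi2) + np.outer(xi3, xi3)

assert np.allclose(V @ mu, mu)             # V mu = mu
assert np.isclose(np.linalg.det(V), 1.0)   # |V| = 1

# Remark 3: simulate z ~ N(mu, V), then project onto the sphere.
z = rng.multivariate_normal(mu, V, size=500)
y = z / np.linalg.norm(z, axis=1, keepdims=True)
```

No rejection step is involved, which is the point of the remark: every Gaussian draw yields exactly one spherical observation.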

### Remark 4

(A test for rotational symmetry.) For a sample of observations \(y_1,\dots ,y_n\) assumed independent and identically distributed, a standard large-sample likelihood ratio test can be used to test \(H_0: y_i \sim \text {IAG}({\mu })\) vs \(H_1: y_i \sim \text {ESAG}({\mu },{\gamma })\). Let \(\hat{l}_0\) and \(\hat{l}_1\) be the values of the maximised log-likelihoods under \(H_0\) and \(H_1\), respectively. The models are nested and differ by two degrees of freedom, and by Wilks’ theorem, when *n* is large the statistic \(T = 2(\hat{l}_1 - \hat{l}_0)\) has approximately a \(\chi ^2_2\) distribution if \(H_0\) is true, and \(H_0\) is rejected for large values of *T*. The null distribution can alternatively be approximated using simulation by the parametric bootstrap, that is, by simulating a sample of size *n* from the null model \(H_0\) at the maximum likelihood estimate of the parameters, computing the test statistic, *T*, and then repeating this a large number, say *B*, times. The empirical distribution of the resulting bootstrapped statistics \(T_1^*,\dots ,T_B^*\) approximates the null distribution of *T*.
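The large-sample version of the test is a one-liner once the two models have been fitted. The maximised log-likelihood values below are illustrative placeholders, not real IAG/ESAG fits:

```python
import math
from scipy.stats import chi2

# Placeholder maximised log-likelihoods (illustrative values only).
l0_hat = -432.7   # under H0 (IAG)
l1_hat = -425.1   # under H1 (ESAG)

T = 2.0 * (l1_hat - l0_hat)     # Wilks statistic
p_value = chi2.sf(T, df=2)      # chi^2_2 tail probability, equals exp(-T/2)

assert math.isclose(p_value, math.exp(-T / 2))
```

For small *n*, the parametric bootstrap described above replaces `chi2.sf` with the empirical tail of the bootstrapped statistics \(T_1^*,\dots ,T_B^*\).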

### Remark 5

Often in applications, there is particular interest in the directional mean, \(m = {\mu }/ \Vert {\mu }\Vert \). A parametric bootstrap procedure to construct confidence regions for *m*, which exploits both the ease of simulation and the parameter orthogonality, is as follows.

### 2.4 Parametric bootstrap confidence regions for \(m = {\mu }/ \Vert {\mu }\Vert \)

The statistic \(T(m)\) defined in (19) is suitable for defining confidence regions for *m* as \(\left\{ m \in \mathbb {S}^2: \, T(m) \le c \right\} \); see Fisher et al. (1996) for discussion of test statistics of this form in the context of non-parametric bootstrap procedures. For a given significance level, \(\alpha \), the constant *c* can be determined as follows: simulate *B* bootstrap samples each of size *n* from ESAG(\(\hat{{\mu }},\hat{{\gamma }}\)), and hence with \(m = \hat{m}\), and for each sample compute the test statistic (19), with \(\hat{\xi }, \hat{{\mu }}, \hat{\Sigma }\) replaced by the corresponding quantities calculated from the bootstrap sample; then *c* is the \((1-\alpha )\) quantile of the resulting statistics \(T_1^*(\hat{m}), \dots , T_B^*(\hat{m})\). Examples of confidence regions calculated by this algorithm are shown in Fig. 2 (right).
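The quantile step of this procedure is simple. In practice \(T_1^*,\dots ,T_B^*\) come from refitting ESAG to each bootstrap sample; here they are placeholder draws used only to show the calculation:

```python
import numpy as np

rng = np.random.default_rng(42)

alpha = 0.05
B = 2000
T_star = rng.chisquare(df=2, size=B)   # placeholder bootstrap statistics
c = np.quantile(T_star, 1 - alpha)     # (1 - alpha) empirical quantile

# The confidence region for m is then {m in S^2 : T(m) <= c}.
```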

### 2.5 An example: estimates of the historic position of Earth’s magnetic pole

Applying the test of rotational symmetry from Remark 4 to these data gives a *p*-value of less than \(10^{-3}\), indicating strong evidence in favour of ESAG over the IAG.

Results of simulation study for fitting the ESAG and Kent distributions to both ESAG and Kent simulated data

| | Data: ESAG | Data: Kent |
|---|---|---|
| Parameters | \(\mu = (0,0,1.18)^\top \), \( \gamma = (0.29,0)\) | \(\Gamma = \mathcal {I}_3\), \(\kappa = 2.16\), \(\beta = 0.5\) |
| Sim. time (s) | 2.1799 | 28.456 |
| **Fit: ESAG** | | |
| Fit time (s) | 1431.8 | 1493.2 |
| Error (\(\hat{m}\)) | 0.1127 | 0.1290 |
| Error (\(\hat{\xi }_1\)) | 0.0988 | 0.3017 |
| Error (\(\hat{\xi }_2\)) | 0.1248 | 0.3051 |
| **Fit: Kent** | | |
| Fit time (s) | 13896.7 | 13860.2 |
| Error (\(\hat{m}\)) | 0.1302 | 0.1289 |
| Error (\(\hat{\xi }_1\)) | 0.0992 | 0.2901 |
| Error (\(\hat{\xi }_2\)) | 0.1248 | 0.2938 |

| | Data: ESAG | Data: Kent |
|---|---|---|
| Parameters | \(\mu = (0,0,2.6)^\top \), \( \gamma = (0.53,0)\) | \(\Gamma = \mathcal {I}_3\), \(\kappa = 7.38\), \(\beta = 1.34\) |
| Sim. time (s) | 3.2818 | 37.618 |
| **Fit: ESAG** | | |
| Fit time (s) | 1867.4 | 1834.3 |
| Error (\(\hat{m}\)) | 0.0584 | 0.0563 |
| Error (\(\hat{\xi }_1\)) | 0.116 | 0.1487 |
| Error (\(\hat{\xi }_2\)) | 0.122 | 0.1522 |
| **Fit: Kent** | | |
| Fit time (s) | 23922.1 | 24191.5 |
| Error (\(\hat{m}\)) | 0.0587 | 0.0562 |
| Error (\(\hat{\xi }_1\)) | 0.1248 | 0.1339 |
| Error (\(\hat{\xi }_2\)) | 0.1261 | 0.1379 |

| | Data: ESAG | Data: Kent |
|---|---|---|
| Parameters | \(\mu = (0,0,3.8)^\top \), \( \gamma = (1.3,0)\) | \(\Gamma = \mathcal {I}_3\), \(\kappa = 20.61\), \(\beta = 8.9\) |
| Sim. time (s) | 1.6446 | 39.4057 |
| **Fit: ESAG** | | |
| Fit time (s) | 2222.3 | 2203.4 |
| Error (\(\hat{m}\)) | 0.0428 | 0.0405 |
| Error (\(\hat{\xi }_1\)) | 0.0453 | 0.0534 |
| Error (\(\hat{\xi }_2\)) | 0.0581 | 0.0625 |
| **Fit: Kent** | | |
| Fit time (s) | 32038.5 | 31541.3 |
| Error (\(\hat{m}\)) | 0.0403 | 0.0415 |
| Error (\(\hat{\xi }_1\)) | 0.0462 | 0.0522 |
| Error (\(\hat{\xi }_2\)) | 0.0592 | 0.0624 |

## 3 A comparison of ESAG with the Kent distribution

Figure 3 shows contours and transects of the densities of ESAG and Kent distributions. The parameter values for each are computed by fitting the two models to a large sample of independent and identically distributed data from ESAG(\({\mu }\),\({\gamma }\)), with \({\mu }= (0, 0, 2.5)^\top \) and \({\gamma }= (0.75, 0)^\top \). For the inner contours ESAG is more anisotropic than the matched Kent distribution and appears slightly more peaked at the mean. Besides these small differences, the figure shows that the ESAG and Kent distributions are very similar in this example, as we have found them to be more generally. Indeed, preliminary results, not presented here, suggest that for typical sample sizes it is usually very difficult to distinguish between them using a statistical criterion. This warrants making the modelling choice between the Kent distribution and ESAG on grounds of practical convenience. The Kent distribution is a member of the exponential family, but its density involves a non-closed-form normalising constant, and simulation requires a rejection algorithm (Kent et al. 2013). The ESAG distribution has a density that is less tidy than the Kent density, hence less suited to computing moment estimators, etc., but this is not much of a drawback given that its density can be computed exactly, so that the exact likelihood can be easily maximised. Moreover, simulating from ESAG is particularly quick and easy (see Remark 3).

We also compare the accuracy with which each fitted model estimates the directional mean *m*. A measure we use for this is the average error of the estimates \(\hat{m}_i\), where \(\hat{m}_i\) denotes the estimate from the *i*th run out of *b* Monte Carlo runs. We also consider accuracy of the major and minor axes of the fitted model. Since the signs of \(\hat{\xi }_1\) and \(\hat{\xi }_2\) are arbitrary, in this case we define the error measures to account for the sign ambiguity.

Note that in interpreting the results in Table 1, the different simulation times of ESAG and Kent should be compared across columns, whereas the fitting times and accuracies should be compared across rows.

The results show, as expected, that the accuracy of \(\hat{m}\), \(\hat{\xi }_1\), and \(\hat{\xi }_2\) is typically better when the data-generating model is fitted. However, the accuracy is not dramatically worse when the non-data-generating model is fitted, i.e., when ESAG is fitted to Kent data, or the Kent distribution is fitted to ESAG data. There is a very notable difference in computation times between ESAG and Kent: for both simulation and fitting, ESAG is typically more than an order of magnitude faster than Kent.

## 4 Conclusion

In the pre-computer days of statistical modelling, the Fisher–Bingham family was perhaps favoured over the angular Gaussian family on account of having a simpler density, which makes it more amenable to constructing classical estimators such as moment estimators. However, in the era of computational statistics, the less simple form of the angular Gaussian density is hardly a barrier and is more than compensated for by having a normalising constant that is trivial to evaluate. The likelihood can consequently be computed quickly and exactly, and maximised directly. Wang and Gelfand (2013) have recently argued in favour of the general angular Gaussian distribution as a model for Bayesian analysis of circular data. For spherical data, a major obstacle to using the general angular Gaussian distribution is that its parameters are poorly identified by the data. The ESAG subfamily overcomes this problem, and is a direct analogue of the Kent subfamily of the general Fisher–Bingham distribution. Besides having a tractable likelihood, the ease and speed with which ESAG can be simulated makes it especially well suited to methods of simulation-based inference. Natural wider applications of ESAG include using it as an error distribution for spherical regression models with anisotropic errors; for classification on the sphere (as a model for class-conditional densities); and for clustering spherical data (based on ESAG mixture models). Code written in MATLAB for performing calculations in this paper is available at the second author's web page.

### References

- Absil, P.A., Baker, C.G., Gallivan, K.A.: Trust-region methods on Riemannian manifolds. Found. Comput. Math. **7**(3), 303–330 (2007). doi: 10.1007/s10208-005-0179-9
- Boumal, N., Mishra, B., Absil, P.A., Sepulchre, R.: Manopt, a Matlab toolbox for optimization on manifolds. J. Mach. Learn. Res. **15**, 1455–1459 (2014). http://www.manopt.org
- Fisher, N., Hall, P., Jing, B.Y., Wood, A.T.A.: Improved pivotal methods for constructing confidence regions with directional data. J. Am. Stat. Assoc. **91**(435), 1062–1070 (1996)
- Hernandez-Stumpfhauser, D., Breidt, F., van der Woerd, M.: The general projected normal distribution of arbitrary dimension: modeling and Bayesian inference. Bayesian Anal. **12**(1), 113–133 (2017)
- Kent, J.T.: The Fisher–Bingham distribution on the sphere. J. R. Stat. Soc. Series B **44**, 71–80 (1982)
- Kent, J.T., Ganeiber, A., Mardia, K.V.: A new method to simulate the Bingham and related distributions in directional data analysis with applications. arXiv preprint arXiv:1310.8110 (2013)
- Kume, A., Preston, S.P., Wood, A.T.A.: Saddlepoint approximations for the normalizing constant of Fisher–Bingham distributions on products of spheres and Stiefel manifolds. Biometrika **100**(4), 971–984 (2013)
- Mardia, K.V.: Statistics of directional data (with discussion). J. R. Stat. Soc. Series B **37**, 349–393 (1975)
- Mardia, K.V., Jupp, P.E.: Directional Statistics. Wiley, Hoboken (2000)
- Presnell, B., Morrison, S.P., Littell, R.C.: Projected multivariate linear models for directional data. J. Am. Stat. Assoc. **93**, 1068–1077 (1998)
- Schmidt, P.: The non-uniqueness of the Australian Mesozoic palaeomagnetic pole position. Geophys. J. Int. **47**(2), 285–300 (1976)
- Teanby, N.: An icosahedron-based method for even binning of globally distributed remote sensing data. Comput. Geosci. **32**(9), 1442–1450 (2006)
- Wang, F., Gelfand, A.: Directional data analysis under the general projected normal distribution. Stat. Methodol. **10**(1), 113–127 (2013)

## Copyright information

**Open Access**This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.