
A structured Dirichlet mixture model for compositional data: inferential and applicative issues


Abstract

The flexible Dirichlet (FD) distribution (Ongaro and Migliorati in J. Multivar. Anal. 114: 412–426, 2013) makes it possible to preserve many theoretical properties of the Dirichlet one, without inheriting its lack of flexibility in modeling the various independence concepts appropriate for compositional data, i.e. data representing vectors of proportions. In this paper we explore the potential of the FD from an inferential and applicative viewpoint. In this regard, the key feature appears to be the special structure defining its Dirichlet mixture representation. This structure determines a simple and clearly interpretable differentiation among mixture components which can capture the main features of a large variety of data sets. Furthermore, it allows substantially greater flexibility than the Dirichlet, including both unimodality and a varying number of modes. Very importantly, this increased flexibility is obtained without incurring many of the inferential difficulties typical of general mixtures. Indeed, the FD displays the identifiability and likelihood behavior proper to common (non-mixture) models. Moreover, thanks to a novel non-random initialization based on the special FD mixture structure, an efficient and sound estimation procedure can be devised which suitably combines EM-type algorithms. Reliable complete-data likelihood-based estimators for standard errors can be provided as well.




References

  • Aitchison, J.: The Statistical Analysis of Compositional Data. Chapman & Hall, London (2003)

  • Azzalini, A., Menardi, G., Rosolin, T.: R package pdfCluster: cluster analysis via nonparametric density estimation (version 1.0-0). Università di Padova, Italia (2012). http://cran.r-project.org/web/packages/pdfCluster/index.html

  • Banfield, J.D., Raftery, A.E.: Model-based Gaussian and non-Gaussian clustering. Biometrics 49, 803–821 (1993)

  • Barndorff-Nielsen, O., Jørgensen, B.: Some parametric models on the simplex. J. Multivar. Anal. 39(1), 106–116 (1991)

  • Biernacki, C., Celeux, G., Govaert, G.: Choosing starting values for the EM algorithm for getting the highest likelihood in multivariate Gaussian mixture models. Comput. Stat. Data Anal. 41, 561–575 (2003)

  • Celeux, G., Govaert, G.: A classification EM algorithm for clustering and two stochastic versions. Comput. Stat. Data Anal. 14, 315–332 (1992)

  • Celeux, G., Chauveau, D., Diebolt, J.: Stochastic versions of the EM algorithm: an experimental study in the mixture case. J. Stat. Comput. Simul. 55, 287–314 (1996)

  • Connor, R.J., Mosimann, J.E.: Concepts of independence for proportions with a generalization of the Dirichlet distribution. J. Am. Stat. Assoc. 64(325), 194–206 (1969)

  • Coxeter, H.: Regular Polytopes. Dover Publications, New York (1973)

  • Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B 39(1), 1–38 (1977)

  • Diebolt, J., Ip, E.: Stochastic EM: method and application. In: Gilks, W.R., Richardson, S., Spiegelhalter, D.J. (eds.) Markov Chain Monte Carlo in Practice, pp. 259–273. Chapman & Hall, London (1996)

  • Efron, B.: Missing data, imputation, and the bootstrap. J. Am. Stat. Assoc. 89(426), 463–475 (1994)

  • Favaro, S., Hadjicharalambous, G., Prünster, I.: On a class of distributions on the simplex. J. Stat. Plan. Inference 141, 2987–3004 (2011)

  • Feng, Z., McCulloch, C.: Using bootstrap likelihood ratios in finite mixture models. J. R. Stat. Soc. Ser. B 58, 609–617 (1996)

  • Forina, M., Armanino, C., Lanteri, S., Tiscornia, E.: Classification of olive oils from their fatty acid composition. In: Martens, Russwurm (eds.) Food Research and Data Analysis. Dip. Chimica e Tecnologie Farmaceutiche ed Alimentari, University of Genova (1983)

  • Frühwirth-Schnatter, S.: Finite Mixture and Markov Switching Models. Springer, New York (2006)

  • Gupta, R.D., Richards, D.S.P.: Multivariate Liouville distributions. J. Multivar. Anal. 23, 233–256 (1987)

  • Gupta, R.D., Richards, D.S.P.: Multivariate Liouville distributions, II. Probab. Math. Stat. 12, 291–309 (1991)

  • Gupta, R.D., Richards, D.S.P.: Multivariate Liouville distributions, III. J. Multivar. Anal. 43, 29–57 (1992)

  • Gupta, R.D., Richards, D.S.P.: Multivariate Liouville distributions, IV. J. Multivar. Anal. 54, 1–17 (1995)

  • Gupta, R.D., Richards, D.S.P.: Multivariate Liouville distributions, V. In: Johnson, N.L., Balakrishnan, N. (eds.) Advances in the Theory and Practice of Statistics: A Volume in Honour of Samuel Kotz, pp. 377–396. Wiley, New York (1997)

  • Gupta, R.D., Richards, D.S.P.: The covariance structure of the multivariate Liouville distributions. Contemp. Math. 287, 125–138 (2001a)

  • Gupta, R.D., Richards, D.S.P.: The history of the Dirichlet and Liouville distributions. Int. Stat. Rev. 69(3), 433–446 (2001b)

  • Hathaway, R.J.: A constrained formulation of maximum-likelihood estimation for normal mixture distributions. Ann. Stat. 13(2), 795–800 (1985)

  • Kiefer, J., Wolfowitz, J.: Consistency of the maximum likelihood estimator in the presence of infinitely many incidental parameters. Ann. Math. Stat. 27(4), 887–906 (1956)

  • Lehmann, E., Casella, G.: Theory of Point Estimation. Springer, New York (1998)

  • Louis, T.A.: Finding the observed information matrix when using the EM algorithm. J. R. Stat. Soc. Ser. B 44(2), 226–233 (1982)

  • McLachlan, G., Peel, D.: Finite Mixture Models. Wiley, New York (2000)

  • Meilijson, I.: A fast improvement to the EM algorithm on its own terms. J. R. Stat. Soc. Ser. B 51(1), 127–138 (1989)

  • Meng, X.L., Rubin, D.B.: Using EM to obtain asymptotic variance-covariance matrices: the SEM algorithm. J. Am. Stat. Assoc. 86(416), 899–909 (1991)

  • O’Hagan, A., Murphy, T.B., Gormley, I.C.: Computational aspects of fitting mixture models via the expectation-maximization algorithm. Comput. Stat. Data Anal. 56(12), 3843–3864 (2012)

  • Ongaro, A., Migliorati, S.: A generalization of the Dirichlet distribution. J. Multivar. Anal. 114, 412–426 (2013)

  • Palarea-Albaladejo, J., Martín-Fernández, J., Soto, J.: Dealing with distances and transformations for fuzzy c-means clustering of compositional data. J. Classif. 29, 144–169 (2012)

  • Pawlowsky-Glahn, V., Egozcue, J., Tolosana-Delgado, R.: Modeling and Analysis of Compositional Data. Wiley, New York (2015)

  • Peters, B.C., Walker, H.F.: An iterative procedure for obtaining maximum-likelihood estimates of the parameters for a mixture of normal distributions. SIAM J. Appl. Math. 35(2), 362–378 (1978)

  • R Development Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria (2015). http://www.R-project.org/

  • Rayens, W.S., Srinivasan, C.: Dependence properties of generalized Liouville distributions on the simplex. J. Am. Stat. Assoc. 89(428), 1465–1470 (1994)

  • Redner, R.: Note on the consistency of the maximum likelihood estimate for non-identifiable distributions. Ann. Stat. 9, 225–228 (1981)

  • Rothenberg, T.: Identification in parametric models. Econometrica 39(3), 577–591 (1971)

  • Smith, B., Rayens, W.: Conditional generalized Liouville distributions on the simplex. Statistics 36(2), 185–194 (2002)

  • Wald, A.: Note on the consistency of the maximum likelihood estimate. Ann. Math. Stat. 20, 595–601 (1949)


Acknowledgments

We are grateful to the referees and to the editor for their constructive comments, which helped to improve the paper. This research was partially supported financially by the Italian Ministry of University and Research and by F.A. 2014 grants from the University of Milano-Bicocca.

Author information

Corresponding author

Correspondence to Sonia Migliorati.

Appendix

1.1 Appendix 1: Vertexes of \(\textit{RSP}^{D}\) (see Sect. 3.1)

Having chosen as the one-dimensional simplex \(\textit{RSP}^{1}\) the line segment with vertexes \(\mathbf v_0=0\) and \(\mathbf v_1=1\), let us recursively determine the vertexes of \(\textit{RSP}^{n}\) (\(n\le D-1\)). Suppose we know the vertexes of \(\textit{RSP}^{n-1}\), that is \(\mathbf v_{0}^{n-1},\mathbf v_{1}^{n-1},\ldots ,\mathbf v_{n-1}^{n-1}\in {\mathbb {R}} ^{n-1}\), \(n\ge 2\). Then \(\textit{RSP}^{n}\) can be obtained by adding to the n vertexes of \(\textit{RSP}^{n-1}\) a new vertex \(\mathbf v_n^n\) at distance 1 from all of the old vertexes. Therefore, the first n vertexes \(\mathbf v_{0}^{n},\mathbf v_{1}^{n},\ldots ,\mathbf v_{n-1}^{n}\in {\mathbb {R}} ^{n}\) of \(\textit{RSP}^{n}\) are obtained by appending a further coordinate equal to 0 to the \(\textit{RSP}^{n-1}\) vertexes (geometrically, the first n vertexes of \(\textit{RSP}^{n}\) coincide with the vertexes of \(\textit{RSP}^{n-1}\)).

The last vertex \(\mathbf v_{n}^{n}\), being equidistant from all previous vertexes, has its first \(n-1\) coordinates equal to those of the barycenter \(\mathbf B^{n-1}\) of \(\textit{RSP}^{n-1}\). The last coordinate is then obtained by imposing that the distance between \(\mathbf v_{n}^{n}\) and one of the previous vertexes equals one. In particular, we can choose \(\left\| \mathbf v_{n}^{n}- \mathbf v_{0}^{n}\right\| =\left\| \mathbf v_{n}^{n}\right\| =1\).

It remains to determine the coordinates of the generic barycenter \(\mathbf B^{n}\). Again we proceed recursively, starting from \(\mathbf B^{1}=1/2\). As \(\mathbf B^{n}\) is equidistant from all vertexes of \(\textit{RSP}^{n}\), its first \(n-1\) coordinates must coincide with those of \(\mathbf B^{n-1}\). The last coordinate can be obtained by imposing that \(\left\| \mathbf B^{n}-\mathbf v_{n}^{n}\right\| \) equals the distance between \(\mathbf B^{n}\) and one of the first n vertexes of \(\textit{RSP}^{n}\). For example, we can set \(\left\| \mathbf B^{n}-\mathbf v_{n}^{n}\right\| =\left\| \mathbf B^{n}-\mathbf v_{0}^{n}\right\| =\left\| \mathbf B^{n}\right\| \). As the first \(n-1\) coordinates of \(\mathbf v_{n}^{n}\) are equal to those of \(\mathbf B^{n-1}\) and \(\left\| \mathbf v_{n}^{n}\right\| =1\), we can write

$$\begin{aligned} \left\| \mathbf B^{n}-\mathbf v_{n}^{n}\right\|= & {} B_n^n-\sqrt{1-\left\| \mathbf B^{n-1}\right\| ^2}\nonumber \\= & {} \sqrt{\left\| \mathbf B^{n-1}\right\| ^2+(B_n^n)^2}= \left\| \mathbf B^{n}\right\| \end{aligned}$$
(16)

where \(B_n^n\) is the last coordinate of \(\mathbf B^{n}\). One can then find \(B_n^n\) as a function of \(\left\| \mathbf B^{n-1}\right\| \) by using the second equality in (16). By plugging this expression in the last equality of (16), after some manipulation one arrives at the following recursive relation:

$$\begin{aligned} \left\| \mathbf B^{n}\right\| ^2=\left[ 4\left( 1-\left\| \mathbf B^{n-1}\right\| ^2\right) \right] ^{-1},\quad \quad n\ge 2. \end{aligned}$$

This recursive equation, together with the initial value \(\left\| \mathbf B^{1}\right\| ^2=1/4\), admits the explicit solution given by \(\left\| \mathbf B^{n}\right\| ^2=n/[2(1+n)]\). By (16) one then has

$$\begin{aligned} B_n^n=\left[ 2 n (1+n)\right] ^{-1/2} \end{aligned}$$

which coincides with the quantity \(a_i\) with \(i=n\) defined in Sect. 3.1. As the first \(n-1\) coordinates of \(\mathbf B^{n}\) are equal to those of \(\mathbf B^{n-1}\), by induction one explicitly determines \(\mathbf B^{n}\): its i-th coordinate is \(B_i^n=\left[ 2 i (1+i)\right] ^{-1/2}=a_i\), \(i=1,\ldots ,n\), \(n\ge 1\). By recursively applying the above procedure for deriving the vertexes \(\mathbf v_{0}^{n},\mathbf v_{1}^{n},\ldots ,\mathbf v_{n}^{n}\) of \(\textit{RSP}^{n}\) from the vertexes \(\mathbf v_{0}^{n-1},\mathbf v_{1}^{n-1},\ldots ,\mathbf v_{n-1}^{n-1}\) of \(\textit{RSP}^{n-1}\), one obtains the result.
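As an illustration (not part of the original paper), the recursion above can be coded directly; the following Python sketch, with function names of our own choosing, builds the vertexes of \(\textit{RSP}^{n}\) and checks that all pairwise distances equal 1 and that the barycenter coordinates equal \(a_i=[2i(1+i)]^{-1/2}\).

```python
import numpy as np

def rsp_vertexes(n):
    """Vertexes of the regular simplex RSP^n with side length 1 in R^n,
    built by the recursion of Appendix 1. Returns an (n + 1) x n array."""
    V = np.array([[0.0], [1.0]])              # RSP^1: the segment [0, 1]
    for m in range(2, n + 1):
        V = np.hstack([V, np.zeros((m, 1))])  # old vertexes get a 0 coordinate
        # barycenter of RSP^{m-1}: coordinates a_i = [2 i (1 + i)]^{-1/2}
        b = np.array([(2 * i * (1 + i)) ** -0.5 for i in range(1, m)])
        # new vertex: first m-1 coordinates equal the barycenter, last one
        # chosen so that its distance from v_0 (i.e. its norm) equals 1
        V = np.vstack([V, np.append(b, np.sqrt(1.0 - np.sum(b ** 2)))])
    return V

V = rsp_vertexes(4)
# all pairwise distances equal 1 ...
dist = np.linalg.norm(V[:, None, :] - V[None, :, :], axis=-1)
print(np.allclose(dist[~np.eye(len(V), dtype=bool)], 1.0))   # True
# ... and the barycenter has coordinates a_i = [2 i (1 + i)]^{-1/2}
a = np.array([(2 * i * (1 + i)) ** -0.5 for i in range(1, 5)])
print(np.allclose(V.mean(axis=0), a))                        # True
```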

1.2 Appendix 2: Lemma 1—Rate of divergence of Dirichlet log-likelihood

Let \(\mathbf x_1,\ldots ,\mathbf x_n\) be an i.i.d. sample from a Dirichlet \(\mathcal{D}(\mathbf x;\varvec{\alpha })\). Then the log-likelihood diverges to \(-\infty \) when at least one of the \(\alpha _i\)’s goes to zero.

Moreover, suppose \(n\ge 2\). Then a.s. the log-likelihood:

  1. is bounded from above;

  2. diverges to \(-\infty \) when at least one of the \(\alpha _i\)’s diverges to \(+\infty \), with a rate of divergence not smaller than \(n\, \alpha ^+ \log \sum _{i=1}^D x_i^{(0)}\).

Here \(x_i^{(0)}=\prod _{j=1}^n x_{ji}^{1/n}\) denotes the geometric mean of the i-th element of the observations.

Suppose now \(n=1\). Then the log-likelihood:

  1. is unbounded (from above and from below);

  2. can diverge to \(+\infty \) only if all the \(\alpha _i\)’s diverge to \(+\infty \), and in that case the rate of divergence is not larger than \(\frac{1}{2}(D-1) \log \alpha ^+\);

  3. diverges to \(-\infty \) if at least one of the \(\alpha _i\)’s diverges to \(+\infty \) and at least one does not.

Proof

The Dirichlet log-likelihood \(l(\varvec{\alpha })\) can be written as

$$\begin{aligned} \frac{1}{n} l(\varvec{\alpha })=\log \varGamma (\alpha ^+)-\sum _{i=1}^D \log \varGamma (\alpha _i)+\sum _{i=1}^D \alpha _i\log x_i^{(0)}. \end{aligned}$$

As \(\log \varGamma (y)\approx -\log y\) when \(y\rightarrow 0\), \(l(\varvec{\alpha })\rightarrow -\infty \) when one or more of the \(\alpha _i\)’s go to 0 and the others are fixed. Because \(l(\varvec{\alpha })\) is a regular function, this implies that it is bounded on the set \(\{\alpha ^+\in (0,k]\}\) for any \(k>0\) and, therefore, it may diverge only if \(\alpha ^+\rightarrow +\infty \). Thus, suppose that M (\(1\le M\le D\)) of the \(\alpha _i\)’s go to \(+\infty \), say \(\alpha _1,\ldots ,\alpha _M\) without loss of generality, denote by \(\alpha _1^+\) their sum, and set \(\alpha _2^+=\alpha ^+-\alpha _1^+\). Then the following approximation holds

$$\begin{aligned} \frac{1}{n} l(\varvec{\alpha })\approx & {} \alpha _1^+\sum _{i=1}^M \eta _i\log \frac{x_i^{(0)}}{\eta _i}+\alpha _2^+\log \alpha _1^+\nonumber \\&\quad +\frac{1}{2} \left( \sum _{i=1}^M \log \alpha _i-\log \alpha _1^+\right) \end{aligned}$$
(17)

where \(\eta _i=\alpha _i/\alpha _1^+\). Formula (17) can be obtained by means of a careful expansion of the terms of \(l(\varvec{\alpha })\) based on the two following approximations valid as \(y\rightarrow \infty \):

$$\begin{aligned} \log \varGamma (y)= & {} \left( y-\frac{1}{2}\right) \log y-y+\frac{1}{2}\log (2\pi ) +O \left( \frac{1}{y}\right) \\ \frac{\varGamma (y+a)}{\varGamma (y)}\approx & {} y^a \end{aligned}$$

the latter holding for fixed positive a.

The relation between geometric and arithmetic means implies that

$$\begin{aligned} \prod _{i=1}^M\left( \frac{x_i^{(0)}}{\eta _i}\right) ^{\eta _i}\le \sum _{i=1}^M \frac{x_i^{(0)}}{\eta _i} \eta _i \end{aligned}$$

and therefore:

$$\begin{aligned} \sum _{i=1}^M \eta _i \log \frac{x_i^{(0)}}{\eta _i}\le \log \sum _{i=1}^M x_i^{(0)}. \end{aligned}$$

Now, suppose \(n\ge 2\). Then all elements of the observations \(\mathbf x_1,\ldots ,\mathbf x_n\) are a.s. distinct. Thus \(\sum _{i=1}^M x_i^{(0)}<\sum _{i=1}^M \bar{x}_i\le 1\). It follows that \(l(\varvec{\alpha })\) goes to \(-\infty \) when at least one \(\alpha _i\rightarrow +\infty \), and that the rate of divergence is not smaller than \(n\, \alpha ^+\log \sum _{i=1}^D x_i^{(0)}\). It also follows that \(l(\varvec{\alpha })\) is a.s. bounded from above.

Suppose now that \(n=1\), so that \(x_i^{(0)}=\bar{x}_i=x_{1i}\) (\(i=1,\ldots ,D\)). If \(1\le M<D\), then \(\sum _{i=1}^M x_i^{(0)}=\sum _{i=1}^M x_{1i}<1\), and therefore the log-likelihood diverges to \(-\infty \). If \(M=D\), then \(\sum _{i=1}^D\eta _i\log \frac{x_{1i}}{\eta _i}\) achieves its maximum, equal to zero, when \(\eta _i=x_{1i}\) (\(i=1,\ldots ,D\)). Thus, if \(M=D\), the behavior of \(l(\varvec{\alpha })\) is determined by the term \((\sum _{i=1}^D\log \alpha _i-\log \alpha ^+)/2\). The latter can be shown to be smaller than or equal to \((D-1)\log \alpha ^+-D\log D\), again by using the relation between geometric and arithmetic means. Hence, the rate of divergence of \(l(\varvec{\alpha })\) is not larger than \((D-1)\log \alpha ^+\). This rate can be exactly achieved by setting \(\eta _i=x_{1i}\) (\(i=1,\ldots ,D\)), which also shows that \(l(\varvec{\alpha })\) is indeed unbounded.

Note that the above arguments also imply that the log-likelihood diverges to \(-\infty \) when at least one of the \(\alpha _i\)’s goes to zero for any n even if \(M<D\) of the other \(\alpha _i\)’s diverge to \(+\infty \). \(\square \)
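As a numerical illustration of the lemma (our own sketch, not taken from the paper), the Dirichlet log-likelihood can be evaluated through the geometric means \(x_i^{(0)}\) and compared with the direct sum of log densities; inflating all the \(\alpha_i\)'s then shows the divergence to \(-\infty\) for \(n\ge 2\).

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import dirichlet

rng = np.random.default_rng(0)
alpha = np.array([2.0, 5.0, 3.0])
x = rng.dirichlet(alpha, size=10)            # n = 10 compositions, D = 3

def dirichlet_loglik(alpha, x):
    """l(alpha) written through the geometric means x_i^(0), as in Appendix 2."""
    n = x.shape[0]
    log_geo = np.log(x).mean(axis=0)          # log x_i^(0)
    return n * (gammaln(alpha.sum()) - gammaln(alpha).sum()
                + np.dot(alpha, log_geo))

# The geometric-mean expression differs from the exact log-likelihood only by
# the additive constant n * sum_i log x_i^(0), which does not depend on alpha.
exact = sum(dirichlet.logpdf(xi, alpha) for xi in x)
const = x.shape[0] * np.log(x).mean(axis=0).sum()
print(np.isclose(dirichlet_loglik(alpha, x), exact + const))    # True

# For n >= 2 the log-likelihood diverges to -inf as the alpha_i's grow.
for c in [1, 10, 100, 1000]:
    print(c, dirichlet_loglik(c * alpha, x))
```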

1.3 Appendix 3: Proof of Proposition 3

To prove a.s. boundedness of the log-likelihood \(l(\varvec{\theta })\) and the existence of a maximum we shall use the following upper bound:

$$\begin{aligned} l(\varvec{\theta })= & {} \sum _{j=1}^n\log \sum _{i=1}^D p_i f_{D}(\mathbf {x}_j;\varvec{\alpha _i})\nonumber \\\le & {} \max _{I_j,j=1,\ldots ,n}\;\sum _{j=1}^n\log f_{D}(\mathbf {x}_j;\varvec{\alpha }_{I_j})= l_U(\varvec{\alpha } , \tau ) \end{aligned}$$
(18)

where \(I_j\in \{1,\ldots ,D\}\) can be interpreted as the cluster to which observation \(\mathbf {x}_j\) has been assigned.

For ease of exposition, let us show boundedness first. By formula (18), we have:

$$\begin{aligned} \sup _{\varvec{\theta }} l(\varvec{\theta })\le & {} \sup _{\varvec{\theta }} \;\max _{I_j,j=1,\ldots ,n}\sum _{j=1}^n\log f_{D}(\mathbf {x}_j;\varvec{\alpha }_{I_j})\\= & {} \max _{I_j,j=1,\ldots ,n}\; \sup _{\varvec{\theta }} \sum _{j=1}^n\log f_{D}(\mathbf {x}_j;\varvec{\alpha }_{I_j}). \end{aligned}$$

Therefore, it is enough to show that, for any given allocation of the observations to the D Dirichlet clusters (i.e. for any \(I_j\), \(j=1,\ldots ,n\)), the corresponding log-likelihood is bounded from above. Indeed, such a log-likelihood coincides with the classified log-likelihood given by (14) and can be viewed as the sum of D Dirichlet log-likelihoods:

$$\begin{aligned} \sum _{j=1}^n\log f_{D}(\mathbf {x}_j;\varvec{\alpha }_{I_j})=\sum _{i=1}^D \sum _{j\in A_i} \log f_{D}(\mathbf {x}_j;\varvec{\alpha }_{i}) \end{aligned}$$

where \(A_i=\{j:z_j=i\}\) identifies the observations assigned to cluster i. By Lemma 1, a necessary condition for any of the D Dirichlet log-likelihoods to be unbounded is that there exists at least one cluster with only one observation and that all the \(\alpha _i\)’s except at most one diverge to \(+\infty \) (in which case \(\tau \) goes to \(+\infty \) as well). In this case the sum of the log-likelihoods of the clusters (at most \(D-1\)) with only one observation diverges with a rate not larger than \(c_1\log (\alpha ^++\tau )\), where \(c_1\) is a positive constant. On the other hand, there must exist at least one cluster with two or more observations. Therefore, when some \(\alpha _i\)’s go to \(+\infty \), by Lemma 1 the corresponding log-likelihood tends a.s. to \(-\infty \) with a rate not smaller than \(-c_2(\alpha ^++\tau )\), where \(c_2\) is a positive constant. Thus, in this case, the classified likelihood diverges to \(-\infty \), which implies that \(l(\varvec{\theta })\) is a.s. bounded.

Let us now prove the existence of a maximum. As the log-likelihood \(l(\varvec{\theta })\) is a regular and differentiable function, the existence of a global maximum can be proved by showing that the supremum is not attained at the boundary of the parameter space. More precisely, consider the frontier of \(\varvec{\varTheta }\), defined as the set of boundary points which are not actually in \(\varvec{\varTheta }\). We shall show that, when \(\varvec{\theta }\) tends to the frontier (i.e. \(\alpha _i\rightarrow 0\) or \(\alpha _i\rightarrow \infty \), \(p_i\rightarrow 1\), \(i=1,\ldots ,D\), \(\tau \rightarrow 0\) or \(\tau \rightarrow \infty \)), the log-likelihood tends either to \(-\infty \) or to values not larger than the log-likelihood (based on the whole sample) of the Dirichlet distribution. As this distribution corresponds to interior points of the parameter space of the FD, such limiting values are dominated by \(l(\varvec{\theta })\) computed at those interior points. They can therefore be discarded.

To obtain the above limits, we shall study the upper bound \(l_U(\varvec{\alpha } , \tau )\) of \(l(\varvec{\theta })\) given in (18).

Suppose first that at least one of the \(\alpha _i\)’s goes to \(+\infty \), irrespective of the behavior of the other parameters. For any given allocation of the observations to clusters (i.e. for any given \(I_1,\ldots ,I_n\)), there must exist at least one cluster with two or more observations. By Lemma 1, the corresponding Dirichlet log-likelihood tends to \(-\infty \) at a rate dominating the log-likelihood of possible one-observation clusters. Thus, for any given allocation, the classified log-likelihood tends to \(-\infty \), and so does the upper bound \(l_U(\varvec{\alpha } , \tau )\). An analogous argument shows that \(l(\varvec{\theta })\) tends to \(-\infty \) when \(\tau \) tends to \(+\infty \) as well.

Suppose now that two or more \(\alpha _i\)’s go to zero. Then, whatever the allocation of the observations, each Dirichlet cluster log-likelihood has at least one parameter going to zero. Therefore, each Dirichlet log-likelihood goes to \(-\infty \), implying that \( l_U(\varvec{\alpha } , \tau )\) diverges to \(-\infty \) as well.

Consider, instead, the case of a single \(\alpha \), say \(\alpha _1\), going to zero. Then, for all allocations with at least one observation not assigned to the first cluster, the corresponding classified log-likelihood tends to \(-\infty \), because the term corresponding to the first cluster tends to a finite value while all the others tend to \(-\infty \). On the other hand, if all observations are assigned to the first cluster, then the classified log-likelihood tends to a Dirichlet log-likelihood, computed on the whole sample, with parameter \((\tau , \alpha _2, \ldots , \alpha _D)\), and \( l_U(\varvec{\alpha } , \tau )\) tends to the same limit. The latter limit is dominated by \(l(\varvec{\theta })\) computed at an interior point of \(\varvec{\varTheta }\).

Finally, if some \(p_i\rightarrow 1\) (\(i=1,\ldots ,D\)) or \(\tau \rightarrow 0\), then it is straightforward to see that \(l(\varvec{\theta })\) converges to a Dirichlet log-likelihood and, again, is dominated by the value of \(l(\varvec{\theta })\) at an interior point.
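The bound (18) is also easy to verify numerically. In the sketch below (an illustration with our own function names, assuming the FD mixture representation with Dirichlet components of parameters \(\varvec{\alpha }+\tau \mathbf e_i\) and weights \(p_i\)), the mixture log-likelihood \(l(\varvec{\theta })\) is compared with the classified upper bound \(l_U\) obtained by assigning each observation to its best-fitting component.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import dirichlet

rng = np.random.default_rng(1)

def fd_component_logpdf(x, alpha, tau):
    """Log densities of the D Dirichlet components of the FD mixture,
    component i having parameter vector alpha + tau * e_i (assumed FD structure).
    Returns an (n, D) array for an (n, D) sample x."""
    n, D = x.shape
    out = np.empty((n, D))
    for i in range(D):
        a_i = alpha.astype(float).copy()
        a_i[i] += tau
        out[:, i] = [dirichlet.logpdf(xj, a_i) for xj in x]
    return out

alpha, tau = np.array([1.5, 2.0, 4.0]), 3.0
p = np.array([0.2, 0.5, 0.3])
x = rng.dirichlet(alpha, size=50)                      # any sample on the simplex

comp = fd_component_logpdf(x, alpha, tau)
log_lik = logsumexp(comp + np.log(p), axis=1).sum()    # l(theta): mixture log-likelihood
l_upper = comp.max(axis=1).sum()                       # l_U in (18): best cluster per observation

print(log_lik <= l_upper)                              # True
```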

1.4 Appendix 4: Score statistic and information matrix of the complete-data likelihood

The elements \(s_c(\theta _r)=\partial \log L_c \left( \varvec{\theta }\right) /\partial \theta _r\) (\(r=1,\ldots ,2D\)) of the score statistic \(\mathbf {S}_c\left( \varvec{\theta };\mathbf {X}_c\right) \) computed from (12) have the form:

$$\begin{aligned} s_c(p_i)= & {} \frac{z_{.i}}{p_i}-\frac{z_{.D}}{p_D} \quad \quad (i=1,\ldots ,D-1)\nonumber \\ s_c(\alpha _i)= & {} n \psi (\alpha ^+ +\tau )-z_{.i}\left[ \psi (\alpha _i+\tau )-\psi (\alpha _i)\right] \nonumber \\&-n\psi (\alpha _i)+\sum _{j=1}^n\log x_{ji}\;\; (i=1,\ldots ,D)\nonumber \\ s_c(\tau )= & {} n \psi (\alpha ^+ +\tau )-\sum _{i=1}^Dz_{.i}\psi (\alpha _i+\tau )\nonumber \\&+\sum _{i=1}^D\sum _{j=1}^nz_{ji}\log x_{ji} \end{aligned}$$
(19)

where \(z_{.i}=\sum _{j=1}^n z_{ji}\), \((i=1,\ldots ,D)\).

The elements \(i_c(\theta _r,\theta _p)=-\partial ^2 \log L_c \left( \varvec{\theta }\right) /\partial \theta _r\partial \theta _p\) (\(r,p=1,\ldots ,2D\)) of the \(2D\times 2D\) matrix \(\mathbf {I}_c\left( \varvec{\theta };\mathbf {X}_c\right) \) take the following form:

$$\begin{aligned} i_c(p_i,p_i)= & {} \frac{z_{.i}}{p_i^2}+\frac{z_{.D}}{p_D^2} \quad (i=1,\ldots , D-1)\nonumber \\ i_c(p_i,p_h)= & {} \frac{z_{.D}}{p_D^2} \quad (i\ne h;\;\; i,h=1,\ldots , D-1)\nonumber \\ i_c(p_i,\alpha _h)= & {} 0 \quad (i=1,\ldots , D-1) \quad (h=1,\ldots , D)\nonumber \\ i_c(p_i,\tau )= & {} 0 \quad (i=1,\ldots , D-1) \nonumber \\ i_c(\alpha _i,\alpha _i)= & {} -n\psi ^{\prime }(\alpha ^++\tau )+z_{.i}\left[ \psi ^{\prime }(\alpha _i+\tau )- \psi ^{\prime }(\alpha _i)\right] \nonumber \\&\quad +n\psi ^{\prime }(\alpha _i)\;\;(i=1,\ldots , D) \nonumber \\ i_c(\alpha _i,\alpha _h)= & {} -n\psi ^{\prime }(\alpha ^++\tau ) \quad (i\ne h;\;\; i,h=1,\ldots , D)\nonumber \\ i_c(\alpha _i,\tau )= & {} -n\psi ^{\prime }(\alpha ^++\tau )+z_{.i}\psi ^{\prime }(\alpha _i+\tau ) \quad (i=1,\ldots , D)\nonumber \\ i_c(\tau ,\tau )= & {} -n\psi ^{\prime }(\alpha ^++\tau )+\sum _{i=1}^Dz_{.i}\psi ^{\prime }(\alpha _i+\tau ) \end{aligned}$$
(20)

where \(\psi ^{\prime }(\cdot )\) denotes the trigamma function.
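For completeness, expressions (19) and (20) translate directly into code; the following Python sketch (our own function and variable names, with the digamma and trigamma functions taken from SciPy) returns the score vector and the complete-data information matrix with parameters ordered as \((p_1,\ldots ,p_{D-1},\alpha _1,\ldots ,\alpha _D,\tau )\).

```python
import numpy as np
from scipy.special import digamma, polygamma

def trigamma(y):
    """Trigamma function psi'(y)."""
    return polygamma(1, y)

def complete_data_score(p, alpha, tau, x, z):
    """Score vector (19); p is the full length-D weight vector, x the (n, D)
    sample and z the (n, D) matrix of component indicators."""
    n, D = x.shape
    zi = z.sum(axis=0)                                   # z_.i
    logx = np.log(x)
    s_p = zi[:-1] / p[:-1] - zi[-1] / p[-1]
    s_alpha = (n * digamma(alpha.sum() + tau)
               - zi * (digamma(alpha + tau) - digamma(alpha))
               - n * digamma(alpha)
               + logx.sum(axis=0))
    s_tau = (n * digamma(alpha.sum() + tau)
             - np.sum(zi * digamma(alpha + tau))
             + np.sum(z * logx))
    return np.concatenate([s_p, s_alpha, [s_tau]])

def complete_data_information(p, alpha, tau, x, z):
    """Complete-data information matrix (20), parameters ordered as
    (p_1, ..., p_{D-1}, alpha_1, ..., alpha_D, tau)."""
    n, D = x.shape
    zi = z.sum(axis=0)
    tp_sum = trigamma(alpha.sum() + tau)
    I = np.zeros((2 * D, 2 * D))
    # p-block: z_.D / p_D^2 everywhere plus z_.i / p_i^2 on the diagonal
    I[:D - 1, :D - 1] = zi[-1] / p[-1] ** 2
    I[:D - 1, :D - 1] += np.diag(zi[:-1] / p[:-1] ** 2)
    # alpha-block: -n psi'(alpha^+ + tau) everywhere plus a diagonal correction
    A = -n * tp_sum * np.ones((D, D))
    A += np.diag(zi * (trigamma(alpha + tau) - trigamma(alpha)) + n * trigamma(alpha))
    I[D - 1:2 * D - 1, D - 1:2 * D - 1] = A
    # (alpha_i, tau) entries and the (tau, tau) entry
    a_tau = -n * tp_sum + zi * trigamma(alpha + tau)
    I[D - 1:2 * D - 1, -1] = a_tau
    I[-1, D - 1:2 * D - 1] = a_tau
    I[-1, -1] = -n * tp_sum + np.sum(zi * trigamma(alpha + tau))
    return I
```

These routines can, for instance, be checked against numerical derivatives of the complete-data log-likelihood (12) or used within EM-type updates.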


Cite this article

Migliorati, S., Ongaro, A. & Monti, G.S. A structured Dirichlet mixture model for compositional data: inferential and applicative issues. Stat Comput 27, 963–983 (2017). https://doi.org/10.1007/s11222-016-9665-y
