Computational Statistics, Volume 16, Issue 4, pp 539–558

Computing the Standard Errors of Mixture Model Parameters with EM when Classes are Well Separated

  • Michel Wedel


It is shown that for finite mixtures the missing information tends to zero as the number of observations on each subject increases. In that limit the classes become perfectly separated (i.e. the posterior membership probabilities are close to 0 or 1), the observed information tends to the complete information, and the class-specific parameters in the mixture model become information orthogonal across classes. The asymptotic standard errors of the parameter estimates can then be obtained directly from the EM algorithm. The degree of class separation is derived at which the missing information is approximately negligible and the asymptotic standard errors based on the complete information matrix are sufficiently accurate. Empirical illustrations are provided. A Monte Carlo study examines the extent to which the approximation is adequate, and the approach is compared with other methods for approximating the observed information matrix. It is concluded that the proposed approximation is reasonably accurate if the entropy of the posterior probabilities is larger than 0.95.
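As a rough illustration of the separation check described above, the following Python sketch computes a normalized entropy statistic from posterior membership probabilities. The abstract does not define the entropy measure, so the form used here, 1 − Σ_n Σ_s (−p_ns ln p_ns) / (N ln S), is an assumption (a commonly used entropy-based separation measure for N subjects and S classes), and the function name `entropy_separation` is hypothetical. Under this assumed definition, values near 1 correspond to posteriors close to 0 or 1.

```python
import math


def entropy_separation(posteriors):
    """Normalized entropy-based separation measure in [0, 1].

    ASSUMPTION: this normalized form is not taken from the paper's abstract;
    it is one common definition, used here only for illustration.

    posteriors: list of rows, one per subject, each row holding that
    subject's posterior class-membership probabilities (summing to 1).
    """
    if not posteriors:
        raise ValueError("need at least one subject")
    n_classes = len(posteriors[0])
    if n_classes < 2:
        raise ValueError("need at least two classes")

    total_entropy = 0.0
    for row in posteriors:
        for p in row:
            if p > 0.0:  # the term 0 * ln(0) is taken as 0
                total_entropy -= p * math.log(p)

    # Divide by the maximum possible entropy, N * ln(S), and flip the scale
    # so that 1 means perfectly separated classes.
    return 1.0 - total_entropy / (len(posteriors) * math.log(n_classes))


# Hypothetical posteriors for three subjects and two classes.
posteriors = [[0.999, 0.001], [0.002, 0.998], [1.0, 0.0]]
well_separated = entropy_separation(posteriors) > 0.95  # threshold from the abstract
```

Whether the 0.95 threshold applies to exactly this normalization depends on the definition used in the paper itself, so the comparison in the last line should be read as a sketch only.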

Key Words

Mixture Models, EM Algorithm, Observed Information, Complete Information, Missing Information, Information Orthogonality, Entropy



The author is greatly indebted to Paul Bekker, Frenkel ter Hofstede, Ton Steerneman and two anonymous reviewers for their very useful comments and suggestions.



Copyright information

© Physica-Verlag 2001

Authors and Affiliations

  • Michel Wedel (1, 2)
  1. Faculty of Economics, University of Groningen, Groningen, The Netherlands
  2. The University of Michigan Business School, Ann Arbor, USA
