Two-Stage Multi-Sample Cluster Analysis as a General Approach to Discriminant Analysis

  • Dorothea Eisenblätter
  • Hamparsum Bozdogan
Part of the Theory and Decision Library book series (TDLB, volume 8)


This paper introduces Two-Stage Multi-Sample Cluster Analysis (TSMSCA), the problem of first grouping samples and then improving the homogeneity of the groups by reassigning individual objects, as a general approach to ‘classical’ discriminant analysis (DA).

Akaike’s Information Criterion (AIC) and Bozdogan’s CAIC are derived and used in TSMSCA to choose the best-fitting model and the best partition among all possible clustering alternatives. With this approach the dimension of the discriminant space is determined, and a decision-tree classifier identifies the best lower-dimensional models, yielding a hierarchy of efficient separation and assignment rules. At each step of the hierarchy, the classification performance of the best discriminant model is evaluated either by cross-validation or by conditional clustering.
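As a rough illustration of how such criteria rank clustering alternatives, the sketch below scores three candidate partitions of three samples. It is not the authors' derivation: it assumes a univariate normal model with one mean per cluster and a common variance, whereas the paper works in the multivariate setting. The function `info_criteria`, the sample values, and the partition labels are all hypothetical. The criteria use the standard forms AIC = -2 log L + 2k and CAIC = -2 log L + k(ln n + 1), where k is the number of free parameters and n the number of objects.

```python
import math

def info_criteria(clusters):
    """Return (AIC, CAIC) for a univariate normal model with one mean per
    cluster and a common variance (an assumed, simplified model).
    `clusters` is a list of non-empty lists of floats."""
    n = sum(len(c) for c in clusters)
    # Pooled within-cluster sum of squares around each cluster mean.
    wss = 0.0
    for c in clusters:
        m = sum(c) / len(c)
        wss += sum((x - m) ** 2 for x in c)
    sigma2 = wss / n  # maximum likelihood estimate of the common variance
    # -2 times the maximized log-likelihood of the normal model.
    neg2_loglik = n * (math.log(2 * math.pi) + math.log(sigma2) + 1)
    k = len(clusters) + 1  # one mean per cluster plus the common variance
    aic = neg2_loglik + 2 * k
    caic = neg2_loglik + k * (math.log(n) + 1)
    return aic, caic

# Three hypothetical samples and three candidate clustering alternatives.
s1 = [4.9, 5.1, 5.0, 5.2]
s2 = [5.0, 4.8, 5.3, 5.1]
s3 = [7.9, 8.2, 8.0, 8.1]
alternatives = {
    "(1)(2)(3)": [s1, s2, s3],
    "(1,2)(3)": [s1 + s2, s3],
    "(1,2,3)": [s1 + s2 + s3],
}
scores = {name: info_criteria(parts) for name, parts in alternatives.items()}
# The alternative with the smallest criterion value is preferred.
best_by_aic = min(scores, key=lambda name: scores[name][0])
best_by_caic = min(scores, key=lambda name: scores[name][1])
print(best_by_aic, best_by_caic)
```

In this toy data the first two samples share the same mean, so merging them costs nothing in fit and saves one parameter. Both criteria therefore prefer the two-cluster alternative, with CAIC applying the heavier penalty per parameter.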

Cross-validation reassigns one object at a time based only on the tentatively updated model, whereas the conditional clustering method actually executes reassignments of objects via a transfer and swapping algorithm given the best discriminant model as the initial partition.
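The contrast between the two evaluation methods can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' algorithm: objects are univariate, the "model" is just the set of cluster means, assignment is to the nearest mean, and the transfer step uses the total within-cluster sum of squares as its criterion. Swapping is omitted. The names `loo_assignments`, `transfer_pass`, and `wss` are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)

def loo_assignments(clusters):
    """Leave-one-out check: drop each object from its cluster, recompute
    that cluster's mean, and report the nearest cluster. Nothing is moved.
    Assumes every cluster holds at least two objects."""
    result = []
    for i, cluster in enumerate(clusters):
        for j, x in enumerate(cluster):
            rest = cluster[:j] + cluster[j + 1:]
            means = [mean(rest) if h == i else mean(c)
                     for h, c in enumerate(clusters)]
            nearest = min(range(len(means)), key=lambda h: abs(x - means[h]))
            result.append((x, i, nearest))
    return result

def wss(clusters):
    """Total within-cluster sum of squares (assumed criterion)."""
    return sum(sum((x - mean(c)) ** 2 for x in c) for c in clusters)

def transfer_pass(clusters):
    """Actually move objects, one at a time, whenever a move lowers the
    criterion; stop when no single transfer helps."""
    parts = [list(c) for c in clusters]
    improved = True
    while improved:
        improved = False
        for i in range(len(parts)):
            for x in list(parts[i]):
                if len(parts[i]) <= 1:
                    break  # never empty a cluster
                for h in range(len(parts)):
                    if h == i:
                        continue
                    trial = [list(c) for c in parts]
                    trial[i].remove(x)
                    trial[h].append(x)
                    if wss(trial) < wss(parts) - 1e-12:
                        parts = trial  # execute the reassignment
                        improved = True
                        break
    return parts

# Hypothetical initial partition with one misplaced object (4.8).
initial = [[1.0, 1.2, 0.9, 4.8], [5.0, 5.2, 5.1]]
flagged = [t for t in loo_assignments(initial) if t[1] != t[2]]
final = transfer_pass(initial)
print(flagged)
print(final)
```

The leave-one-out check only reports that 4.8 would be assigned to the second cluster, leaving the partition unchanged. The transfer pass carries out that move and then re-evaluates every remaining object against the updated partition.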

Numerical examples on real data sets demonstrate the generality and versatility of the proposed approach.

Key words and phrases

Two-Stage Multi-Sample Cluster Analysis; Cluster Analysis; Discriminant Analysis; AIC; CAIC





Copyright information

© D. Reidel Publishing Company, Dordrecht, Holland 1987

Authors and Affiliations

  • Dorothea Eisenblätter (1)
  • Hamparsum Bozdogan (2)
  1. Seminar für Wirtschafts- und Sozialstatistik, Universität zu Köln, Köln 41, Federal Republic of Germany
  2. Department of Mathematics, Math./Astronomy Building, University of Virginia, Charlottesville, USA
