Much of the classification literature ignores notions of probability. In our view, this is due in part to a dominant tendency, in the early days of computers, to develop heuristic clustering algorithms, and in part to long traditions of classification outside the statistical/probabilistic orbit, of which biological taxonomy and book classification are primary examples. Statisticians have rightly stressed the role of probabilistic concepts in formulating classification problems and in interpreting classifications, but we believe that they are wrong in suggesting, as they sometimes seem to, that other approaches are unsatisfactory. Probability has its proper place in classification, but it is neither an essential nor always an appropriate tool. We discuss circumstances in which non-probabilistic classifications are fully justified.
Considerations influencing the differences between the two approaches include: 1) Irrespective of whether things are to be assembled into classes (arranged hierarchically or not) or assigned to previously recognised classes, methodology depends on whether the things may be regarded as representing groups or as samples from groups; 2) Models are basic to the formulation of statistically based classifications, but they may also underpin non-probabilistic classifications; overt models are not a characteristic of heuristic classification algorithms; 3) In principle, probabilistic models allow the significance and number of clusters justified by the data to be assessed. In non-probabilistic classifications (and in probabilistic ones too), the eighteenth-century concept of approximation offers a good basis for assessing the adequacy and stability of clusters.
Keywords: Probabilistic Classification; Non-probabilistic Classification; Classes; Groups; Assignment; Class Construction