Cluster analysis is an important problem in unsupervised machine learning. Model-based clustering, which relies on finite mixture models, is one of the most popular clustering techniques. Once a mixture model has been fitted, a natural question is how many misclassifications the resulting partition contains. However, the literature on diagnostic tools for an obtained clustering solution is rather limited. In this paper, an algorithm is developed for efficiently estimating the misclassification probability. The confusion probability map and the classification confidence region are proposed for predicting the confusion matrix, identifying which cluster causes the most confusion, and understanding the distribution of misclassifications. Applications to real-life datasets illustrate the developed technique with promising results.
Keywords: Finite mixture models · Classification confidence region · Diagnostics · Misclassification
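To make the notion of a misclassification probability concrete, the following minimal sketch computes a simple plug-in estimate for a univariate Gaussian mixture: each point is assigned to the component with the largest posterior probability, and one minus that posterior is averaged over the data. This is an illustration under stated assumptions, not necessarily the algorithm developed in the paper; the function names and the mixture parameters (`weights`, `means`, `sds`) are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posteriors(x, weights, means, sds):
    """Posterior probability of each mixture component given x (Bayes' rule)."""
    joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(joint)  # mixture density at x; assumed nonzero for the data used
    return [j / total for j in joint]


def expected_misclassification(data, weights, means, sds):
    """Average of 1 - max posterior: a plug-in estimate of the error rate
    of the maximum-posterior assignment rule."""
    errors = [1.0 - max(posteriors(x, weights, means, sds)) for x in data]
    return sum(errors) / len(errors)


# Hypothetical two-component mixture and a handful of illustrative points.
weights = [0.5, 0.5]
means = [0.0, 3.0]
sds = [1.0, 1.0]
data = [-1.0, 0.2, 1.4, 1.6, 2.9, 4.1]

rate = expected_misclassification(data, weights, means, sds)
print(f"estimated misclassification probability: {rate:.3f}")
```

Points near the midpoint between the two component means contribute most to the estimate, since their posterior probabilities are close to one half; points far from the boundary contribute almost nothing.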
The research is partially funded by the University of Louisville EVPRI internal research grant from the Office of the Executive Vice President for Research and Innovation.
Anderson E (1935) The Irises of the Gaspe Peninsula. Bull Am Iris Soc 59:2–5
Azzalini A, Bowman AW (1990) A look at some data on the Old Faithful geyser. J R Stat Soc C 39:357–365