Extending the Rand, adjusted Rand and Jaccard indices to fuzzy partitions
The first stage of knowledge acquisition and complexity reduction for a group of entities is to partition the entities into groups, or clusters, based on their attributes or characteristics. Clustering is one of the most basic processes used to simplify data and express knowledge in scientific work; it is akin to defining classes. Since the output of clustering is a partition of the input data, the quality of the clustering process must be measured through the quality of the partition it produces. The problem of comparing two different partitions of a finite set of objects recurs throughout the clustering literature. This paper looks at some commonly used clustering measures, including the Rand index (RI), the adjusted Rand index (ARI) and the Jaccard index (JI), which are already defined for crisp clustering, and extends them to fuzzy clustering, giving FRI, FARI and FJI. These new indices give the same values as the original indices in the special case of crisp clustering. The extension is made by first finding equivalent expressions for the parameters a, b, c and d of these indices in the case of crisp clustering. To do this, a relationship called bonding is first defined, which describes the degree to which two objects (cluster members) are in the same cluster or class. The effectiveness of the indices is demonstrated through their use in both crisp and fuzzy clustering.
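The abstract does not give the formulas, so the sketch below uses the standard pair-counting forms of the indices, labelling a as the number of object pairs placed together in both partitions, b as pairs apart in both, c as pairs together only in the first, and d as pairs together only in the second. To show how a bonding value can replace the crisp "same cluster" test, it assumes (hypothetically, not necessarily the paper's definition) that the bonding of two objects is the dot product of their membership vectors. With crisp one-hot memberships that product is exactly 1 or 0, so the counts, and hence the indices, reduce to the usual crisp values, which is the property the abstract requires of FRI, FARI and FJI. All function names here (`one_hot`, `bonding`, `pair_counts` and the index functions) are illustrative.

```python
from itertools import combinations
from typing import List, Sequence

# Membership matrix: one row per object, one column per cluster.
Membership = Sequence[Sequence[float]]


def one_hot(labels: Sequence[int]) -> List[List[float]]:
    """Turn crisp cluster labels into a 0/1 membership matrix."""
    clusters = sorted(set(labels))
    index = {c: k for k, c in enumerate(clusters)}
    return [[1.0 if index[lab] == k else 0.0 for k in range(len(clusters))]
            for lab in labels]


def bonding(u: Membership, i: int, j: int) -> float:
    """ASSUMED bonding: dot product of the membership rows of objects i and j.
    For one-hot rows this is 1 if i and j share a cluster and 0 otherwise."""
    return sum(x * y for x, y in zip(u[i], u[j]))


def pair_counts(u: Membership, v: Membership):
    """Sum (a, b, c, d) over all unordered object pairs.
    a: together in both, b: apart in both,
    c: together only in u, d: together only in v."""
    a = b = c = d = 0.0
    for i, j in combinations(range(len(u)), 2):
        bu, bv = bonding(u, i, j), bonding(v, i, j)
        a += bu * bv
        b += (1 - bu) * (1 - bv)
        c += bu * (1 - bv)
        d += (1 - bu) * bv
    return a, b, c, d


def rand_index(a, b, c, d):
    """Fraction of pairs on which the two partitions agree."""
    return (a + b) / (a + b + c + d)


def jaccard_index(a, b, c, d):
    """Pairs together in both, over pairs together in at least one.
    Returns 1.0 by convention when no pair is together in either partition."""
    return a / (a + c + d) if a + c + d else 1.0


def adjusted_rand_index(a, b, c, d):
    """Pair-count form of the Hubert-Arabie adjusted Rand index.
    The denominator is zero only when the partitions agree completely
    (or there are no pairs), in which case 1.0 is returned."""
    den = (a + c) * (c + b) + (a + d) * (d + b)
    return 2 * (a * b - c * d) / den if den else 1.0


if __name__ == "__main__":
    x = one_hot([0, 0, 1, 1])
    y = one_hot([0, 0, 1, 2])
    counts = pair_counts(x, y)
    print(counts, rand_index(*counts), adjusted_rand_index(*counts),
          jaccard_index(*counts))
    # A fuzzy membership matrix can be passed in the same way.
    u = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]
    print(pair_counts(u, x))
```

For the crisp example, the pairs give a = 1, b = 4, c = 1 and d = 0, so RI = 5/6, JI = 1/2 and ARI = 4/7. Because each pair's four contributions sum to 1 for any bonding value in [0, 1], the fuzzy counts still total n(n-1)/2. This keeps the denominators of the indices on the same scale as in the crisp case.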
Keywords: Measures of agreement; Measures of association; Consensus indices; Fuzzy clustering; Clustering quality; Array processing language; J programming language; J notation
The support of a Natural Sciences and Engineering Research Council grant (227338-04) from the Canadian Government is greatly appreciated, as is the support of the Department of Mechanical and Mechatronics Engineering of the University of Stellenbosch. The work of the reviewers in making this a better paper is also appreciated.