Journal of Intelligent Information Systems, Volume 32, Issue 3, pp 213–235

Extending the rand, adjusted rand and jaccard indices to fuzzy partitions

  • Roelof K. Brouwer
Article

Abstract

The first stage of knowledge acquisition and reduction of complexity concerning a group of entities is to partition the entities into groups or clusters based on their attributes or characteristics. Clustering is one of the most basic processes performed in simplifying data and expressing knowledge in a scientific endeavor. It is akin to defining classes. Since the output of clustering is a partition of the input data, the quality of the partition must be determined as a way of measuring the quality of the partitioning (clustering) process. The problem of comparing two different partitions of a finite set of objects reappears continually in the clustering literature. This paper looks at some commonly used clustering measures, including the Rand index (RI), the adjusted Rand index (ARI) and the Jaccard index (JI), that are already defined for crisp clustering and extends them to fuzzy clustering, giving FRI, FARI and FJI. These new indices give the same values as the original indices in the special case of crisp clustering. The extension is made by first finding equivalent expressions for the parameters a, b, c, and d of these indices in the case of crisp clustering. To this end, a relationship called bonding, which describes the degree to which two cluster members are in the same cluster or class, is first defined. The effectiveness of the indices is demonstrated through their use in crisp clustering and fuzzy clustering.
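For orientation, the crisp indices named in the abstract can be sketched from the four pair counts a, b, c and d. The sketch below uses the standard textbook definitions and is not the paper's own formulation: the fuzzy extension through bonding is not reproduced, the function names are hypothetical, and the assignment of b and c to the two kinds of disagreement is an assumed convention that may differ from the paper's.

```python
from itertools import combinations


def pair_counts(p, q):
    """Count object pairs by how two crisp labelings p and q treat them.

    Assumed convention (may differ from the paper's):
      a: same cluster in p and same cluster in q
      b: same cluster in p, different clusters in q
      c: different clusters in p, same cluster in q
      d: different clusters in both
    """
    if len(p) != len(q):
        raise ValueError("both labelings must cover the same objects")
    a = b = c = d = 0
    for i, j in combinations(range(len(p)), 2):
        same_p = p[i] == p[j]
        same_q = q[i] == q[j]
        if same_p and same_q:
            a += 1
        elif same_p:
            b += 1
        elif same_q:
            c += 1
        else:
            d += 1
    return a, b, c, d


def rand_index(a, b, c, d):
    # Fraction of pairs on which the two partitions agree.
    return (a + d) / (a + b + c + d)


def jaccard_index(a, b, c, d):
    # Like the Rand index but ignoring pairs separated in both partitions.
    # Undefined (division by zero) when a + b + c == 0.
    return a / (a + b + c)


def adjusted_rand_index(a, b, c, d):
    # Hubert-Arabie chance correction written in terms of pair counts.
    # Undefined when the denominator is zero (e.g. both partitions trivial).
    n_pairs = a + b + c + d
    expected = (a + b) * (a + c) / n_pairs
    maximum = ((a + b) + (a + c)) / 2
    return (a - expected) / (maximum - expected)


if __name__ == "__main__":
    counts = pair_counts([0, 0, 1, 1], [0, 0, 1, 2])
    print(counts)
    print(rand_index(*counts), adjusted_rand_index(*counts), jaccard_index(*counts))
```

Counting pairs explicitly costs O(n²) time but keeps the roles of a, b, c and d visible, which is the point of the sketch; a contingency-table formulation would be the usual choice for large data sets.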

Keywords

Measures of agreement; Measures of association; Consensus indices; Fuzzy clustering; Clustering quality; Array processing language; J programming language; J notation

Notes

Acknowledgement

The support of a Natural Sciences and Engineering Research Council grant (227338-04) from the Canadian Government is greatly appreciated, as is the support of the Department of Mechanical and Mechatronics Engineering of the University of Stellenbosch. The work of the reviewers in making this a better paper is also appreciated.

Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  1. Department of Mechanical Engineering and Mechatronics Engineering, University of Stellenbosch, Matieland, South Africa
  2. Department of Computing Science, Thompson Rivers University, Kamloops, Canada
