Gaussian Fuzzy Index (GFI) for Cluster Validation: Identification of High Quality Biologically Enriched Clusters of Genes and Selection of Some Possible Genes Mediating Lung Cancer
In this article, we propose an index, called the Gaussian Fuzzy Index (GFI), based on fuzzy set theory, for validating the clusters obtained by a clustering algorithm. The index is then used to identify genes whose expression patterns have altered significantly from the normal stage to the diseased stage. In this way, we can predict some possible disease-mediating genes from microarray gene expression data. The methodology is demonstrated on a gene expression data set for human lung cancer. The performance of GFI is compared with that of eight existing cluster validity indices. The results are validated using biochemical pathways. We have also implemented different cluster validity indices to demonstrate the superior capability of GFI over the others.
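To make concrete what a cluster validity index computes, the following is a minimal sketch of one well-known index, the Dunn index (smallest between-cluster distance divided by the largest cluster diameter). It is not the GFI, whose definition is not given in this abstract, and the abstract does not name the indices used in the comparison. The function name `dunn_index` and the toy expression profiles are assumptions made purely for illustration.

```python
from itertools import combinations
from math import dist


def dunn_index(points, labels):
    """Dunn index: smallest between-cluster distance / largest cluster diameter."""
    # Group the points by their cluster label.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")

    # Largest distance between two points in the same cluster (diameter).
    max_diameter = max(
        (dist(p, q)
         for members in clusters.values()
         for p, q in combinations(members, 2)),
        default=0.0,
    )
    if max_diameter == 0.0:
        raise ValueError("every cluster has zero diameter; index undefined")

    # Smallest distance between two points in different clusters.
    min_separation = min(
        dist(p, q)
        for a, b in combinations(clusters.values(), 2)
        for p in a
        for q in b
    )
    return min_separation / max_diameter


# Hypothetical two-dimensional expression profiles for four "genes".
profiles = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
good = dunn_index(profiles, [0, 0, 1, 1])  # partition follows the two groups
bad = dunn_index(profiles, [0, 1, 0, 1])   # partition cuts across the groups
print(good, bad)
```

A higher value indicates compact, well-separated clusters, so the first partition scores higher than the second. A validity index of this kind can be evaluated for each candidate clustering and the best-scoring partition retained.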
Keywords: Cluster Algorithm; Fuzzy Cluster; Human Lung Cancer; Microarray Gene Expression Data; Separation Index