Gaussian Fuzzy Index (GFI) for Cluster Validation: Identification of High Quality Biologically Enriched Clusters of Genes and Selection of Some Possible Genes Mediating Lung Cancer

  • Anupam Ghosh
  • Rajat K. De
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8251)


In this article, we propose an index, called the Gaussian Fuzzy Index (GFI), based on fuzzy set theory, for validating the clusters obtained by a clustering algorithm. The index is then used to identify genes whose expression patterns change significantly from the normal stage to the diseased stage, which allows us to predict possible disease-mediating genes from microarray gene expression data. The methodology is demonstrated on a gene expression data set for human lung cancer. The performance of GFI is compared with that of eight existing cluster validity indices, and the results are validated using biochemical pathways. We have also implemented several other cluster validity indices to demonstrate the superior capability of GFI over them.
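The GFI formula itself is not given in this abstract, so it is not reproduced here. As a hedged illustration of what a fuzzy cluster validity index computes, the sketch below implements the Xie-Beni index (Xie and Beni, 1991), a standard measure for fuzzy partitions. It is not GFI, and we do not assume it is one of the eight indices compared. The function names `sq_dist` and `xie_beni` are hypothetical, and the fuzzifier `m = 2` is an assumed default.

```python
from typing import Sequence

def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def xie_beni(data, centers, membership, m: float = 2.0) -> float:
    """Xie-Beni index (illustrative sketch, not GFI).

    membership[i][k] is the fuzzy membership of point k in cluster i.
    The index is fuzzy within-cluster compactness divided by n times the
    minimum squared distance between cluster centres; lower is better.
    """
    n, c = len(data), len(centers)
    if c < 2:
        raise ValueError("the Xie-Beni index needs at least two clusters")
    # Compactness: membership-weighted squared distances to each centre.
    compactness = sum(
        membership[i][k] ** m * sq_dist(data[k], centers[i])
        for i in range(c)
        for k in range(n)
    )
    # Separation: the closest pair of cluster centres.
    min_sep = min(
        sq_dist(centers[i], centers[j])
        for i in range(c)
        for j in range(i + 1, c)
    )
    if min_sep == 0:
        raise ValueError("two cluster centres coincide")
    return compactness / (n * min_sep)

if __name__ == "__main__":
    points = [[0, 0], [0, 1], [10, 0], [10, 1]]
    u = [[1, 1, 0, 0], [0, 0, 1, 1]]  # crisp memberships for simplicity
    print(xie_beni(points, [[0, 0.5], [10, 0.5]], u))  # well separated
    print(xie_beni(points, [[4, 0.5], [6, 0.5]], u))   # poorly placed centres
```

With well-placed centres the index is small (about 0.0025 here); moving the centres toward each other raises it, which is how such an index ranks competing partitions.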


Keywords (added by machine, not by the authors): Cluster Algorithm; Fuzzy Cluster; Human Lung Cancer; Microarray Gene Expression Data; Separation Index



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Anupam Ghosh (1)
  • Rajat K. De (2)
  1. Department of Computer Science and Engineering, Netaji Subhash Engineering College, Kolkata, India
  2. Machine Intelligence Unit, Indian Statistical Institute, Kolkata, India
