An Ensemble Feature Ranking Algorithm for Clustering Analysis

  • Jaehong Yu
  • Hua Zhong
  • Seoung Bum KimEmail author


Feature ranking is a widely used feature selection method. It uses importance scores to evaluate features and selects those with high scores. Conventional unsupervised feature ranking methods do not consider the information on cluster structures; therefore, these methods may be unable to select the relevant features for clustering analysis. To address this limitation, we propose a feature ranking algorithm based on silhouette decomposition. The proposed algorithm calculates the ensemble importance scores by decomposing the average silhouette widths of random subspaces. By doing so, the contribution of a feature in generating cluster structures can be represented more clearly. Experiments on different benchmark data sets examined the properties of the proposed algorithm and compared it with the existing ensemble-based feature ranking methods. The experiments demonstrated that the proposed algorithm outperformed its existing counterparts.


Ensemble importance score Random subspace method Silhouette decomposition Unsupervised feature ranking 



  1. Andrews, J. L., & Mcnicholas, P. D. (2014). Variable selection for clustering and classification. Journal of Classification, 31(2), 136–153.MathSciNetzbMATHGoogle Scholar
  2. Arbelaitz, O., Gurrutxaga, I., Muguerrza, J., Pèrez, J. M., & Perona, I. (2013). An extensive comparative study of cluster validity indices. Pattern Recognition, 46(1), 243–256.Google Scholar
  3. Arthur, D., and Vassilvitskii, S. (2007). “k-means++: the advantages of careful seeding”, in Proceedings of the 18th annual ACM-SIAM symposium on discrete algorithms, 2007, pp. 1027–1035.Google Scholar
  4. Ayad, H. G., & Kamel, M. S. (2008). Cumulative voting consensus method for partitions with variable number of clusters. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(1), 160–173.Google Scholar
  5. Boutsidis, C., Drineas, P., and Mahoney, M.W. (2009), “Unsupervised feature selection for the k-means clustering problem”, in Proceedings of the Advances in Neural Information Processing Systems, pp. 153–161.Google Scholar
  6. Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32.zbMATHGoogle Scholar
  7. Caliński, T., & Harabasz, J. (1974). A dendrite method for cluster analysis. Communications in Statistics-theory and Methods, 3(1), 1–27.MathSciNetzbMATHGoogle Scholar
  8. Chiang, M. M. T., & Mirkin, B. (2010). Intelligent choice of the number of clusters in k-means clustering: an experimental study with different cluster spreads. Journal of Classification, 27(1), 3–40.MathSciNetzbMATHGoogle Scholar
  9. de Amorim, R. C. (2016). A survey on feature weighting based k-means algorithms. Journal of Classification, 33(2), 210–242.MathSciNetzbMATHGoogle Scholar
  10. de Amorim, R. C., & Mirkin, B. (2012). Minkowski metric, feature weighting and anomalous cluster initializing in k-means clustering. Pattern Recognition, 45(3), 1061–1075.Google Scholar
  11. de Amorim, R. C., Makarenkov, V., & Mirkin, B. (2016). A-Ward: Effective hierarchical clustering using the Minkowski metric and a fast k-means initialization. Information Sciences, 370, 343–354.Google Scholar
  12. de Amorim, R. C., Shestakov, A., Mirkin, B., & Makarenkov, V. (2017). The Minkowski central partition as a pointer to a suitable distance exponent and consensus partitioning. Pattern Recognition, 67, 62–72.Google Scholar
  13. Dy, J. G., & Brodley, C. E. (2004). Feature selection for unsupervised learning. Journal of Machine Learning Research, 5, 845–889.MathSciNetzbMATHGoogle Scholar
  14. Elghazel, H., & Aussem, A. (2015). Unsupervised feature selection with ensemble learning. Machine Learning, 98(1–2), 157–180.MathSciNetzbMATHGoogle Scholar
  15. Fred, A. L., & Jain, A. K. (2005). Combining multiple clusterings using evidence accumulation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(6), 835–850.Google Scholar
  16. Guerra, L., Robles, V., Bielza, C., & Larrañaga, P. (2012). A comparison of clustering quality indices using outliers and noise. Intelligent Data Analysis, 16(4), 703–715.Google Scholar
  17. Guyon, I., & Elisseeff, A. (2003). An introduction to variable and feature selection. Journal of Machine Learning Research, 3, 1157–1182.zbMATHGoogle Scholar
  18. Handl, J., & Knowles, J. (2006). Feature subset selection in unsupervised learning via multiobjective optimization. International Journal of Computational Intelligence Research, 2(3), 217–238.MathSciNetGoogle Scholar
  19. Hartigan, J. A., & Wong, M. A. (1979). Algorithm AS 136: A k-means clustering algorithm. Journal of the Royal Statistical Society: Series C: Applied Statistics, 28(1), 100–108.zbMATHGoogle Scholar
  20. HE, X., CAI, D., and NIYOGI, P. (2005), “Laplacian score for feature selection”, in Proceedings of the Advances in Neural Information Processing Systems, pp. 507–514.Google Scholar
  21. Herrero, J., Dìaz-uriarte, R., & Dopazo, J. (2003). Gene expression data preprocessing. Bioinformatics, 19(5), 655–656.Google Scholar
  22. Ho, T. K. (1998). The random subspace method for constructing decision forests. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(8), 832–844.Google Scholar
  23. Hong, Y., Kwong, S., Chang, Y., & Ren, Q. (2008a). Consensus unsupervised feature ranking from multiple views. Pattern Recognition Letters, 29(5), 595–602.Google Scholar
  24. Hong, Y., Kwong, S., Chang, Y., & Ren, Q. (2008b). Unsupervised feature selection using clustering ensembles and population based incremental learning algorithm. Pattern Recognition, 41(9), 2742–2756.zbMATHGoogle Scholar
  25. Huang, J. Z., Ng, M. K., Rong, H., & LI, Z. (2005). Automated variable weighting in k-means type clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(5), 657–668.Google Scholar
  26. Hubrert, L., & Arabie, P. (1985). Comparing partitions. Journal of Classification, 2(1), 193–218.Google Scholar
  27. Iam-on, N., Boongoen, T., Garrett, S., & Price, C. (2011). A link-based approach to the cluster ensemble problem. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(12), 2396–2409.Google Scholar
  28. Ketchen, D. J., Jr., & Shook, C. L. (1996). The application of cluster analysis in strategic management research: an analysis and critique. Strategic Management Journal, 17, 441–458.Google Scholar
  29. Kim, S. B., & Rattakorn, P. (2011). Unsupervised feature selection using weighted principal components. Expert Systems with Applications, 38(5), 5704–5710.Google Scholar
  30. Kim, E. Y., Kim, S. Y., Ashlock, D., & Nam, D. (2009). MULTI-K: Accurate classification of microarray subtypes using ensemble k-means clustering. BMC Bioinformatics, 10(1), 260.Google Scholar
  31. Kuncheva, L. I., & Vetrov, D. P. (2006). Evaluation of stability of k-means cluster ensembles with respect to random initialization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(11), 1798–1808.Google Scholar
  32. Lai, C., Reinders, M. J., & Wessels, L. (2006). Random subspace method for multivariate feature selection. Pattern Recognition Letters, 27(10), 1067–1076.Google Scholar
  33. Li, F., Zhang, Z., & Jin, C. (2016). Feature selection with partition differentiation entropy for large-scale data sets. Information Sciences, 329, 690–700.zbMATHGoogle Scholar
  34. Liu, Y., Li, Z., Xiong, H., Gao, X., and Wu, J. (2010), “Understanding of internal clustering validation measures”, in Proceedings of IEEE 10th International Conference on Data Mining (ICDM), pp. 911–916.Google Scholar
  35. MacQueen, J. (1967), “Some methods for classification and analysis of multivariate observations”, in Proceedings of the 5th Berkeley symposium on mathematical statistics and probability, 1(14), pp. 281–297.Google Scholar
  36. Makarenkov, V., & Legendre, P. (2001). Optimal variable weighting for ultrametric and additive trees and K-means partitioning: methods and software. Journal of Classification, 18(2), 245–271.MathSciNetzbMATHGoogle Scholar
  37. Mitra, P., Murthy, C. A., & Pal, S. K. (2002). Unsupervised feature selection using feature similarity. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(3), 301–312.Google Scholar
  38. Oehmen, C., & Nieplocha, J. (2006). ScalaBLAST: a scalable implementation of BLAST for high-performance data-intensive bioinformatics analysis. IEEE Transactions on Parallel and Distributed Systems, 17(8), 740–749.Google Scholar
  39. Panday, D., de Amorim, R. C., & Lane, P. (2018). Feature weighting as a tool for unsupervised feature selection. Information Processing Letters, 129, 44–52.MathSciNetzbMATHGoogle Scholar
  40. Rousseeuw, P. J. (1987). Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20, 53–65.zbMATHGoogle Scholar
  41. Steinley, D., & Brusco, M. J. (2007). Initializing k-means batch clustering: a critical evaluation of several techniques. Journal of Classification, 24(1), 99–121.MathSciNetzbMATHGoogle Scholar
  42. Tan, P. N., Steinbach, M., & Kumar, V. (2006). Introduction to data mining. Boston: Addison-Wesley.Google Scholar
  43. Vendramin, L., Campello, R. J., & Hruschka, E. R. (2010). Relative clustering validity criteria: a comparative overview. Statistical Analysis and Data Mining: the ASA Data Science Journal, 3(4), 209–235.MathSciNetGoogle Scholar
  44. Xu, R. F., & Lee, S. J. (2015). Dimensionality reduction by feature clustering for regression problems. Information Sciences, 299, 42–57.MathSciNetzbMATHGoogle Scholar
  45. Yang, C., Wan, B., and Gao, X. (2006), “Effectivity of Internal Validation Techniques for Gene Clustering”, in Proceedings of International Symposium on Biological and Medical Data Analysis, pp. 49–59.Google Scholar
  46. Yu, J., & Kim, S. B. (2016). A density-based Noisy graph partitioning algorithm. Neurocomputing, 175, 473–491.Google Scholar
  47. Yu, Z., Wang, D., You, J., Wong, H. S., Wu, S., Zhang, J., & Han, G. (2016). Progressive subspace ensemble learning. Pattern Recognition, 60, 692–705.Google Scholar
  48. Zhang, S., Wong, H. S., Shen, Y., & Xie, D. (2012). A new unsupervised feature ranking method for gene expression data based on consensus affinity. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 9(4), 1257–1263.Google Scholar
  49. Zhong, C., Yue, X., Zhang, Z., & Lei, J. (2015). A clustering ensemble: Two-level refined co-association matrix with path-based transformation. Pattern Recognition, 48(8), 2699–2709.zbMATHGoogle Scholar

Copyright information

© The Classification Society 2019

Authors and Affiliations

  1. 1.Department of Population Health, School of MedicineNew York UniversityNew YorkUSA
  2. 2.Department of Industrial Management EngineeringKorea UniversitySeoulSouth Korea

Personalised recommendations