A Novel Stability Based Feature Selection Framework for k-means Clustering

Mavroeidis, Dimitrios; Marchiori, Elena

doi:10.1007/978-3-642-23783-6_27


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6912)

Included in the following conference series:

Joint European Conference on Machine Learning and Knowledge Discovery in Databases


Abstract

Stability of a learning algorithm with respect to small input perturbations is an important property, as it implies that the derived models are robust to the presence of noisy features and/or fluctuations in the data sample. In this paper we explore the effect of stability optimization on the standard feature selection process for the continuous (PCA-based) k-means clustering problem. Interestingly, we derive that stability maximization naturally introduces a tradeoff between cluster separation and variance, leading to the selection of features that have a high cluster separation index that is not artificially inflated by the feature’s variance. The proposed algorithmic setup is based on a Sparse PCA approach that selects, in a greedy fashion, the features that maximize stability. In our study, we also analyze several properties of Sparse PCA relevant to stability that promote Sparse PCA as a viable feature selection mechanism for clustering. The practical relevance of the proposed method is demonstrated in the context of cancer research, where we consider the problem of detecting potential tumor biomarkers using microarray gene expression data. The application of our method to a leukemia dataset shows that the tradeoff between cluster separation and variance leads to the selection of features corresponding to important biomarker genes. Some of these genes have relatively low variance and are not detected without the direct optimization of stability in Sparse PCA-based k-means.
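The tradeoff described in the abstract can be illustrated with a small NumPy sketch. It rests on several assumptions that are ours, not necessarily the paper's: stability is proxied by the eigengap λ1 − λ2 of the covariance matrix of the selected features (standard matrix perturbation results, e.g. Stewart and Sun, bound the change in the leading eigenvector by the perturbation size divided by this gap); the greedy search is plain backward elimination rather than the authors' Sparse PCA procedure; and, for k = 2, the continuous k-means solution is taken as the sign of the projection on the first principal component, following the PCA relaxation of k-means by Ding and He. The toy data and every function name (`eigengap`, `backward_eigengap_selection`, `pca_two_way_split`, `make_toy_data`) are hypothetical.

```python
import numpy as np


def eigengap(X, features):
    """Return lambda_1 - lambda_2 of the sample covariance of X[:, features].

    Assumed stability proxy (not necessarily the paper's exact measure).
    With a single feature there is no second eigenvalue, so it is taken as 0.
    """
    cov = np.atleast_2d(np.cov(X[:, features], rowvar=False))
    eigvals = np.linalg.eigvalsh(cov)[::-1]  # eigvalsh is ascending; reverse it
    second = eigvals[1] if eigvals.size > 1 else 0.0
    return float(eigvals[0] - second)


def backward_eigengap_selection(X, k):
    """Hypothetical greedy scheme: drop one feature at a time until k remain,
    always keeping the subset whose covariance has the largest eigengap."""
    selected = list(range(X.shape[1]))
    while len(selected) > k:
        # Try removing each feature; remove the one whose absence maximizes the gap.
        drop = max(selected,
                   key=lambda j: eigengap(X, [f for f in selected if f != j]))
        selected.remove(drop)
    return sorted(selected)


def pca_two_way_split(X, features):
    """Continuous k-means relaxation for k = 2 (assumed): label each point by
    the sign of its projection on the first principal component."""
    Z = X[:, features] - X[:, features].mean(axis=0)
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    return (Z @ vt[0] > 0).astype(int)


def make_toy_data(n_per_cluster=100, seed=0):
    """Hypothetical data: features 0-1 separate two clusters, feature 2 is
    high-variance noise, and features 3-4 are low-variance noise."""
    rng = np.random.default_rng(seed)
    labels = np.repeat([0, 1], n_per_cluster)
    centers = np.where(labels == 0, -3.0, 3.0)[:, None]
    informative = centers + rng.normal(0.0, 0.5, size=(labels.size, 2))
    loud_noise = rng.normal(0.0, 4.0, size=(labels.size, 1))
    quiet_noise = rng.normal(0.0, 1.0, size=(labels.size, 2))
    return np.hstack([informative, loud_noise, quiet_noise]), labels


X, y = make_toy_data()
# Baseline: rank features purely by variance.
by_variance = sorted(np.argsort(X.var(axis=0))[::-1][:2].tolist())
# Eigengap-driven selection, then the PCA-based two-way split on those features.
by_gap = backward_eigengap_selection(X, k=2)
pred = pca_two_way_split(X, by_gap)
# Cluster labels are only defined up to a swap, so take the better matching.
agreement = max(np.mean(pred == y), np.mean(pred != y))
print("top-2 by variance:", by_variance)
print("top-2 by eigengap:", by_gap)
print("agreement with true clusters:", agreement)
```

On this toy data a pure variance ranking should include the loud noise feature 2, while the eigengap criterion should keep the two informative features, a miniature version of the separation-versus-variance tradeoff; it says nothing about the paper's actual results. Backward elimination was chosen here only because, in this sketch, a forward search starting from one feature would rank candidates by raw variance at its first step, reintroducing the very bias the criterion is meant to counter.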

This work was partially supported by the Netherlands Organization for Scientific Research (NWO) within NWO project 612.066.927.


Similar content being viewed by others

A new hybrid stability measure for feature selection (Article, 10 June 2020)

A method for searching for a globally optimal k-partition of higher-dimensional datasets (Article, 13 February 2024)

Heuristic algorithms for feature selection under Bayesian models with block-diagonal covariance structure (Open-access article, 21 March 2018)



Author information

Authors and Affiliations

Institute for Computing and Information Sciences, Radboud University, Nijmegen, The Netherlands
Dimitrios Mavroeidis & Elena Marchiori


Editor information

Editors and Affiliations

Department of Informatics and Telecommunications, University of Athens, Panepistimioupolis, Ilisia, 15784, Athens, Greece
Dimitrios Gunopulos
Google Switzerland GmbH, Brandschenkestrasse 110, 8002, Zurich, Switzerland
Thomas Hofmann
Department of Computer Science, University of Bari “Aldo Moro”, via Orabona 4, 70125, Bari, Italy
Donato Malerba
Department of Informatics, Athens University of Economics and Business, Patision 76, 10434, Athens, Greece
Michalis Vazirgiannis


About this paper

Cite this paper

Mavroeidis, D., Marchiori, E. (2011). A Novel Stability Based Feature Selection Framework for k-means Clustering. In: Gunopulos, D., Hofmann, T., Malerba, D., Vazirgiannis, M. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2011. Lecture Notes in Computer Science, vol. 6912. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23783-6_27


DOI: https://doi.org/10.1007/978-3-642-23783-6_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23782-9
Online ISBN: 978-3-642-23783-6
eBook Packages: Computer Science, Computer Science (R0)
