Abstract
k-means clustering is a popular procedure for multivariate analysis in which observations are classified into a reduced number of clusters. The resulting centroid matrix is referred to in order to identify the variables that characterize the clusters, but the between-cluster contrasts in the centroid matrix are not always clear and are thus difficult to interpret. In this research, we address this interpretation problem and propose a new k-means clustering procedure that produces a sparse and thus interpretable centroid matrix. The proposed procedure is called SPARK. In SPARK, the sparseness of the centroid matrix is constrained, so that the matrix contains a number of exactly zero elements. As a result, the between-cluster contrasts are highlighted, which makes the clusters easier to interpret than in standard k-means clustering. A sparsity selection procedure that determines the optimal sparsity of the centroid matrix with reduced computational load is also proposed. The behavior of the proposed procedure is evaluated on two real data examples, and the results indicate that SPARK performs well on real-world problems.
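The abstract does not spell out the algorithm, so the following is only a minimal sketch of one plausible reading of "k-means with a sparsity-constrained centroid matrix", not the authors' SPARK implementation. It assumes column-standardized data, a fixed number of zero elements (`n_zeros`), and a simple thresholding rule: after each ordinary centroid update, the entries whose cluster-size-weighted squares are smallest are set to exactly zero. The function name `sparse_kmeans_sketch` and all of its parameters are hypothetical.

```python
import numpy as np

def sparse_kmeans_sketch(X, n_clusters, n_zeros, n_iter=50, seed=0):
    """Illustrative k-means variant with exact zeros in the centroid matrix.

    Assumption (not taken from the paper): sparsity is imposed by zeroing
    the n_zeros centroid entries with the smallest size-weighted squares.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Initialize centroids from randomly chosen observations.
    C = X[rng.choice(n, size=n_clusters, replace=False)].copy()
    labels = None
    for _ in range(n_iter):
        # Assignment step: each observation goes to its nearest centroid.
        d = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        new_labels = d.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # memberships unchanged, so C is already sparsified
        labels = new_labels
        # Update step: ordinary cluster means (empty clusters stay at zero).
        sizes = np.bincount(labels, minlength=n_clusters)
        C = np.zeros((n_clusters, p))
        for k in range(n_clusters):
            if sizes[k] > 0:
                C[k] = X[labels == k].mean(axis=0)
        # Sparsification step: zeroing entry (k, j) raises the within-cluster
        # loss by sizes[k] * C[k, j]**2, so drop the cheapest entries.
        cost = sizes[:, None] * C ** 2
        idx = np.argsort(cost, axis=None)[:n_zeros]
        C[np.unravel_index(idx, C.shape)] = 0.0
    return C, labels
```

Under these assumptions, the returned centroid matrix has at least `n_zeros` exactly zero elements, so the nonzero entries in each row indicate which variables set that cluster apart. How SPARK itself selects the sparsity level is not described in the abstract and is not attempted here.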
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Yamashita, N., Adachi, K. (2018). SPARK: A New Clustering Algorithm for Obtaining Sparse and Interpretable Centroids. In: Wiberg, M., Culpepper, S., Janssen, R., González, J., Molenaar, D. (eds) Quantitative Psychology. IMPS 2017. Springer Proceedings in Mathematics & Statistics, vol 233. Springer, Cham. https://doi.org/10.1007/978-3-319-77249-3_34
DOI: https://doi.org/10.1007/978-3-319-77249-3_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-77248-6
Online ISBN: 978-3-319-77249-3
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)