Abstract
Large-scale spectral clustering in high-dimensional space is among the most popular unsupervised learning problems. Existing sampling schemes each have limitations on high-dimensional data. This paper proposes an improved Nyström extension based spectral clustering algorithm with a sampling scheme designed for high-dimensional data. We first examine several existing sampling schemes and illustrate their defects, especially in high-dimensional settings. We then provide a theoretical analysis of how the similarity between the sample set and the non-sample set influences the approximation error, and propose an improved sampling scheme, minimum similarity sampling (MSS), for clustering in high-dimensional space. Experiments on both synthetic and real datasets show that, when applied in Nyström-based spectral clustering, the proposed sampling scheme achieves higher accuracy than other algorithms and reduces the time spent on sampling.
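As a rough illustration of the Nyström approximation step that the abstract refers to, the sketch below approximates a full affinity matrix from a small set of sampled columns and measures the relative approximation error. It assumes a Gaussian affinity and uniform random sampling; it is not the MSS scheme proposed in the paper, whose details are not given in this abstract. The function names, the bandwidth `sigma`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def gaussian_affinity(A, B, sigma):
    """Gaussian (RBF) similarity between the rows of A and the rows of B."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def nystrom_approximation(X, sample_idx, sigma):
    """Approximate the n x n affinity matrix from the sampled columns only."""
    C = gaussian_affinity(X, X[sample_idx], sigma)  # n x m: all points vs. samples
    W = C[sample_idx]                               # m x m: samples vs. samples
    return C @ np.linalg.pinv(W) @ C.T

def relative_error(X, sample_idx, sigma):
    """Relative Frobenius error of the Nystrom approximation (needs the full matrix)."""
    K = gaussian_affinity(X, X, sigma)
    K_hat = nystrom_approximation(X, sample_idx, sigma)
    return np.linalg.norm(K - K_hat) / np.linalg.norm(K)

# Synthetic high-dimensional data (assumption: 300 points in 50 dimensions).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 50))
sigma = np.sqrt(X.shape[1])  # hypothetical bandwidth choice

# Uniform random sampling as a stand-in for whichever sampling scheme is studied.
idx = rng.choice(len(X), size=30, replace=False)
err_small = relative_error(X, idx, sigma)
print("relative error with 30 samples:", err_small)
```

Any other sampling scheme can be compared in the same way by replacing the line that builds `idx`, since the approximation depends on the data only through the chosen sample indices.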
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Zeng, Z., Zhu, M., Yu, H., Ma, H. (2014). Minimum Similarity Sampling Scheme for Nyström Based Spectral Clustering on Large Scale High-Dimensional Data. In: Ali, M., Pan, JS., Chen, SM., Horng, MF. (eds) Modern Advances in Applied Intelligence. IEA/AIE 2014. Lecture Notes in Computer Science, vol 8482. Springer, Cham. https://doi.org/10.1007/978-3-319-07467-2_28
DOI: https://doi.org/10.1007/978-3-319-07467-2_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07466-5
Online ISBN: 978-3-319-07467-2
eBook Packages: Computer Science (R0)