Abstract
Principal Component Analysis (PCA) is a widely used statistical technique for representing data in a lower-dimensional embedding. K-means is a prototype-based (centroid-based) clustering technique used in unsupervised learning tasks. Random Projection (RP) is another widely used dimensionality reduction technique; it uses a random projection matrix to map the data into a lower-dimensional feature space. Here, we demonstrate the effectiveness of combining these methods for efficiently clustering both low-dimensional and high-dimensional data. The proposed algorithms combine PCA with RP to project the data into a reduced feature space, then perform K-means clustering in that space. We compare their performance with the simple K-means and PCA-K-means algorithms on 12 benchmark datasets, of which 4 are low-dimensional and 8 are high-dimensional. In these experiments, the proposed algorithms outperform the other two methods.
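The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not say in which order PCA and RP are applied or how the target dimensions are chosen, so the order used here (PCA first, then a Gaussian random projection) and the names `pca_reduce`, `random_project`, `kmeans`, `pca_rp_kmeans`, `pca_dim` and `rp_dim` are all assumptions made for illustration. The data is synthetic.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project centered data onto its top principal directions (via SVD)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def random_project(X, n_components, rng):
    """Gaussian random projection; the 1/sqrt(n_components) scaling
    preserves squared norms in expectation."""
    R = rng.normal(size=(X.shape[1], n_components)) / np.sqrt(n_components)
    return X @ R

def kmeans(X, k, rng, n_iter=100):
    """Plain Lloyd's algorithm with randomly chosen initial centroids."""
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centroids; keep the old centroid if a cluster is empty.
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

def pca_rp_kmeans(X, k, pca_dim, rp_dim, seed=0):
    """Assumed order: PCA, then random projection, then K-means."""
    rng = np.random.default_rng(seed)
    Z = random_project(pca_reduce(X, pca_dim), rp_dim, rng)
    return kmeans(Z, k, rng)

# Synthetic example: two well-separated Gaussian blobs in 50 dimensions.
data_rng = np.random.default_rng(42)
X = np.vstack([
    data_rng.normal(0.0, 1.0, size=(100, 50)),
    data_rng.normal(10.0, 1.0, size=(100, 50)),
])
labels, centroids = pca_rp_kmeans(X, k=2, pca_dim=10, rp_dim=5)
```

Swapping the two reduction steps (RP first, then PCA) only requires changing the composition inside `pca_rp_kmeans`, which makes it easy to compare both orderings against plain K-means on the same data.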
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Pasunuri, R., Venkaiah, V.C., Srivastava, A. (2019). Clustering High-Dimensional Data: A Reduction-Level Fusion of PCA and Random Projection. In: Kalita, J., Balas, V., Borah, S., Pradhan, R. (eds) Recent Developments in Machine Learning and Data Analytics. Advances in Intelligent Systems and Computing, vol 740. Springer, Singapore. https://doi.org/10.1007/978-981-13-1280-9_44
DOI: https://doi.org/10.1007/978-981-13-1280-9_44
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-1279-3
Online ISBN: 978-981-13-1280-9
eBook Packages: Intelligent Technologies and Robotics (R0)