Abstract
In this paper, we propose a three-way decision clustering approach for high-dimensional data. First, we propose a three-way K-medoids clustering algorithm, which represents each cluster by three regions. Objects in the positive region of a cluster definitely belong to the cluster, objects in the negative region definitely do not belong to it, and objects in the boundary region may belong to the cluster and possibly to other clusters as well. We then propose a three-way decision clustering approach based on random projection. The basic idea is to apply the three-way K-medoids algorithm several times, increasing the dimensionality of the projected data after each run. Because the cluster centers obtained from one projection are used as the initial centers for the next projection, the computing time is greatly reduced. Experimental results show that the proposed algorithm is suitable for high-dimensional data and achieves higher accuracy without sacrificing computing time.
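The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The distance-ratio rule for the boundary region (the hypothetical `ratio` parameter), the Gaussian projection matrix, the plain alternating K-medoids update, and all function names are assumptions made for illustration. Medoids are stored as object indices, so they carry over directly from one projection to the next.

```python
import numpy as np

def pairwise_dist(A, B):
    # Euclidean distances between every row of A and every row of B.
    return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)

def k_medoids(X, medoid_idx, max_iter=20):
    # Plain alternating K-medoids; medoids are row indices of X.
    medoid_idx = np.array(medoid_idx)
    D = pairwise_dist(X, X)
    for _ in range(max_iter):
        labels = np.argmin(D[:, medoid_idx], axis=1)
        new_idx = medoid_idx.copy()
        for c in range(len(medoid_idx)):
            members = np.where(labels == c)[0]
            if len(members) == 0:
                continue  # keep the old medoid for an empty cluster
            # The new medoid minimizes the total distance to its cluster members.
            costs = D[np.ix_(members, members)].sum(axis=0)
            new_idx[c] = members[np.argmin(costs)]
        if np.array_equal(new_idx, medoid_idx):
            break
        medoid_idx = new_idx
    return medoid_idx

def three_way_regions(X, medoid_idx, ratio=1.2):
    # ASSUMED rule: an object is "near" cluster c if its distance to medoid c
    # is at most `ratio` times its distance to the nearest medoid.
    D = pairwise_dist(X, X[medoid_idx])
    near = D <= ratio * D.min(axis=1, keepdims=True)
    single = near.sum(axis=1) == 1
    positive = near & single[:, None]    # near exactly one cluster
    boundary = near & ~single[:, None]   # near two or more clusters
    return positive, boundary            # everything else: negative region

def iterative_rp_three_way(X, k, dims, ratio=1.2, seed=0):
    # `dims` is an increasing list of target dimensionalities (assumption).
    rng = np.random.default_rng(seed)
    n, d = X.shape
    medoid_idx = rng.choice(n, size=k, replace=False)
    Z = X
    for m in dims:
        R = rng.normal(size=(d, m)) / np.sqrt(m)  # Gaussian random projection
        Z = X @ R
        # Medoids from the previous projection initialize this run.
        medoid_idx = k_medoids(Z, medoid_idx)
    positive, boundary = three_way_regions(Z, medoid_idx, ratio)
    return medoid_idx, positive, boundary

# Usage on synthetic data: two Gaussian blobs in 50 dimensions.
data_rng = np.random.default_rng(1)
X_demo = np.vstack([data_rng.normal(0, 1, (30, 50)),
                    data_rng.normal(8, 1, (30, 50))])
medoids, pos, bnd = iterative_rp_three_way(X_demo, k=2, dims=[2, 5, 10])
```

Under this assumed rule, each object falls either in the positive region of exactly one cluster or in the boundary region of at least two clusters. Its negative regions are the remaining clusters.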
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China under Grant Nos. 61379114 & 61533020.
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Yu, H., Zhang, H. (2016). A Three-Way Decision Clustering Approach for High Dimensional Data. In: Flores, V., et al. Rough Sets. IJCRS 2016. Lecture Notes in Computer Science, vol 9920. Springer, Cham. https://doi.org/10.1007/978-3-319-47160-0_21
DOI: https://doi.org/10.1007/978-3-319-47160-0_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-47159-4
Online ISBN: 978-3-319-47160-0
eBook Packages: Computer Science (R0)