Abstract
Cluster analysis is a fundamental technique in pattern recognition. It is difficult to cluster data on complex data sets. This paper presents a new algorithm for clustering. There are three key ideas in the algorithm: using mutual neighborhood graphs to discover knowledge and cluster data; using eigenvalues of local covariance matrixes to express knowledge and form a knowledge embedded space; and using a denoising trick in knowledge embedded space to implement clustering. Essentially, it learns a new distance metric by knowledge embedding and makes clustering become easier under this distance metric. The experiment results show that the algorithm can construct a quality neighborhood graph from a complex and noisy data set and well solve clustering problems.
Chapter PDF
Similar content being viewed by others
Keywords
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
References
Jain, A.K., Dubes, R.C.: Algorithms for Clustering Data. Prentice-Hall, Englewood Cliffs (1988)
Kaufman, L., Rousseeuw, P.J.: Finding Groups in Data: an Introduction to Cluster Analysis. John Wiley & Sons, Chichester (1990)
Ng, R., Han, J.: Efficient and effective clustering method for spatial data mining. In: Proc. of the 20th VLDB Conference, Santiago, Chile, pp. 144–155 (1994)
Ester, M., Kriegel, H.-P., Sander, J., Xu, X.: A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. KDD, 226-231 (1996)
Guha, S., Rastogi, R., Shim, K.: CURE: An efficient clustering algorithm for large databases. In: Proc. of 1998 ACM-SIGMOD Int. Conf. on Management of Data (1998)
Guha, S., Rastogi, R., Shim, K.: ROCK: a robust clustering algorithm for categorical attributes. In: Proc. of the 15th Intl Conf. on Data Eng. (1999)
Karypis, G., Han, E., Kumar, V.: CHAMELEON: A Hierarchical Clustering Algorithm Using Dynamic Modeling. IEEE Computer 32, 68–75 (1999)
Roweis, S.T., Saul, L.K.: Nonlinear dimensionality reduction by locally linear embedding. Science 290, 2323–2326 (2000)
Saul, L.K., Roweis, S.T.: An introduction to locally linear embedding. Tech. rep., AT&T Labs - Research (2001)
Fukunaga, K., Olsen, D.R.: An algorithm for finding intrinsic dimensionality of data. IEEE Transactions on Computers 20, 176–183 (1971)
Pettis, K., Bailey, I., Jain, T., Dubes, R.: An intrinsic dimensionality estimator from near-neighbor information. IEEE Transactions on Pattern Analysis and Machine Intelligence 1, 25–37 (1979)
Kambhatla, N., Leen, T.K.: Dimension reduction by local principal component analysis. Neural Computation 9(7), 1493–1516 (1997)
Schölkopf, B., Smola, A.J., Müller, K.R.: Nonlinear Component Analysis as a Kernel Eigenvalue Problem. Neural Computation 10, 1299–1319 (1998)
Schölkopf, B., Mika, S., Burges, C.J.C., Knirsch, P., Müller, K.R., Raetsch, G., Smola, A.: Input Space vs. Feature Space in Kernel-Based Methods. IEEE Trans. on NN 10(5), 1000–1017 (1999)
Schölkopf, B., Smola, A.J.: Learning with Kernels: Support Vector Machines, Regularization and Beyond. MIT Press, Cambridge, Massachusetts (2002)
Jain, A.K., Robert, P.W., Duin, Jianchang Mao: Statistical Pattern Recognition: A Review. IEEE Transactions on Pattern Analysis and Machine Intelligence (1999)
Harel, D., Koren, Y.: Clustering Spatial Data Using Random Walks. In: Proceedings of The 7th ACM Int. Conference on Knowledge Discovery and Data Mining (KDD 2001), pp. 281–286. ACM Press, New York (2001)
Tenenbaum, J.B., de Silvam, V., Langford, J.C.: A global geometric framework for nonlinear dimensionality reduction. Science 290, 2319–2323 (2000)
Zaïane, O.R., Foss, A., Lee, C.-H., Wang, W.: On Data Clustering Analysis: Scalability, Constraints and Validation. In: Chen, M.-S., Yu, P.S., Liu, B. (eds.) PAKDD 2002. LNCS (LNAI), vol. 2336, p. 28. Springer, Heidelberg (2002)
Friedman, J.H., Bentley, J.L., Finkel, R.A.: An algorithm for finding best matches in logarithmic expected time. ACM Transactions on Mathematical Software 3, 209–226 (1997)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhang, Y., Zhang, C., Wang, S. (2003). Clustering in Knowledge Embedded Space. In: Lavrač, N., Gamberger, D., Blockeel, H., Todorovski, L. (eds) Machine Learning: ECML 2003. ECML 2003. Lecture Notes in Computer Science(), vol 2837. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39857-8_43
Download citation
DOI: https://doi.org/10.1007/978-3-540-39857-8_43
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20121-2
Online ISBN: 978-3-540-39857-8
eBook Packages: Springer Book Archive