Clustering in Knowledge Embedded Space

  • Yungang Zhang
  • Changshui Zhang
  • Shijun Wang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2837)


Cluster analysis is a fundamental technique in pattern recognition. It is difficult to cluster data on complex data sets. This paper presents a new algorithm for clustering. There are three key ideas in the algorithm: using mutual neighborhood graphs to discover knowledge and cluster data; using eigenvalues of local covariance matrixes to express knowledge and form a knowledge embedded space; and using a denoising trick in knowledge embedded space to implement clustering. Essentially, it learns a new distance metric by knowledge embedding and makes clustering become easier under this distance metric. The experiment results show that the algorithm can construct a quality neighborhood graph from a complex and noisy data set and well solve clustering problems.


Input Space Neighborhood Graph Nonlinear Dimensionality Reduction False Edge Intrinsic Dimensionality 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


  1. 1.
    Jain, A.K., Dubes, R.C.: Algorithms for Clustering Data. Prentice-Hall, Englewood Cliffs (1988)zbMATHGoogle Scholar
  2. 2.
    Kaufman, L., Rousseeuw, P.J.: Finding Groups in Data: an Introduction to Cluster Analysis. John Wiley & Sons, Chichester (1990)Google Scholar
  3. 3.
    Ng, R., Han, J.: Efficient and effective clustering method for spatial data mining. In: Proc. of the 20th VLDB Conference, Santiago, Chile, pp. 144–155 (1994)Google Scholar
  4. 4.
    Ester, M., Kriegel, H.-P., Sander, J., Xu, X.: A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. KDD, 226-231 (1996)Google Scholar
  5. 5.
    Guha, S., Rastogi, R., Shim, K.: CURE: An efficient clustering algorithm for large databases. In: Proc. of 1998 ACM-SIGMOD Int. Conf. on Management of Data (1998)Google Scholar
  6. 6.
    Guha, S., Rastogi, R., Shim, K.: ROCK: a robust clustering algorithm for categorical attributes. In: Proc. of the 15th Intl Conf. on Data Eng. (1999)Google Scholar
  7. 7.
    Karypis, G., Han, E., Kumar, V.: CHAMELEON: A Hierarchical Clustering Algorithm Using Dynamic Modeling. IEEE Computer 32, 68–75 (1999)Google Scholar
  8. 8.
    Roweis, S.T., Saul, L.K.: Nonlinear dimensionality reduction by locally linear embedding. Science 290, 2323–2326 (2000)CrossRefGoogle Scholar
  9. 9.
    Saul, L.K., Roweis, S.T.: An introduction to locally linear embedding. Tech. rep., AT&T Labs - Research (2001)Google Scholar
  10. 10.
    Fukunaga, K., Olsen, D.R.: An algorithm for finding intrinsic dimensionality of data. IEEE Transactions on Computers 20, 176–183 (1971)zbMATHCrossRefGoogle Scholar
  11. 11.
    Pettis, K., Bailey, I., Jain, T., Dubes, R.: An intrinsic dimensionality estimator from near-neighbor information. IEEE Transactions on Pattern Analysis and Machine Intelligence 1, 25–37 (1979)zbMATHCrossRefGoogle Scholar
  12. 12.
    Kambhatla, N., Leen, T.K.: Dimension reduction by local principal component analysis. Neural Computation 9(7), 1493–1516 (1997)CrossRefGoogle Scholar
  13. 13.
    Schölkopf, B., Smola, A.J., Müller, K.R.: Nonlinear Component Analysis as a Kernel Eigenvalue Problem. Neural Computation 10, 1299–1319 (1998)CrossRefGoogle Scholar
  14. 14.
    Schölkopf, B., Mika, S., Burges, C.J.C., Knirsch, P., Müller, K.R., Raetsch, G., Smola, A.: Input Space vs. Feature Space in Kernel-Based Methods. IEEE Trans. on NN 10(5), 1000–1017 (1999)Google Scholar
  15. 15.
    Schölkopf, B., Smola, A.J.: Learning with Kernels: Support Vector Machines, Regularization and Beyond. MIT Press, Cambridge, Massachusetts (2002)Google Scholar
  16. 16.
    Jain, A.K., Robert, P.W., Duin, Jianchang Mao: Statistical Pattern Recognition: A Review. IEEE Transactions on Pattern Analysis and Machine Intelligence (1999) Google Scholar
  17. 17.
    Harel, D., Koren, Y.: Clustering Spatial Data Using Random Walks. In: Proceedings of The 7th ACM Int. Conference on Knowledge Discovery and Data Mining (KDD 2001), pp. 281–286. ACM Press, New York (2001)CrossRefGoogle Scholar
  18. 18.
    Tenenbaum, J.B., de Silvam, V., Langford, J.C.: A global geometric framework for nonlinear dimensionality reduction. Science 290, 2319–2323 (2000)CrossRefGoogle Scholar
  19. 19.
    Zaïane, O.R., Foss, A., Lee, C.-H., Wang, W.: On Data Clustering Analysis: Scalability, Constraints and Validation. In: Chen, M.-S., Yu, P.S., Liu, B. (eds.) PAKDD 2002. LNCS (LNAI), vol. 2336, p. 28. Springer, Heidelberg (2002)CrossRefGoogle Scholar
  20. 20.
    Friedman, J.H., Bentley, J.L., Finkel, R.A.: An algorithm for finding best matches in logarithmic expected time. ACM Transactions on Mathematical Software 3, 209–226 (1997)CrossRefGoogle Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Yungang Zhang
    • 1
  • Changshui Zhang
    • 1
  • Shijun Wang
    • 1
  1. 1.State Key Laboratory of Intelligent Technology and Systems, Department of AutomationTsinghua UniversityBeijingChina

Personalised recommendations