Clustering in Knowledge Embedded Space

Zhang, Yungang; Zhang, Changshui; Wang, Shijun

doi:10.1007/978-3-540-39857-8_43

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2837)

Included in the following conference series:

European Conference on Machine Learning

Abstract

Cluster analysis is a fundamental technique in pattern recognition, but clustering complex data sets is difficult. This paper presents a new clustering algorithm built on three key ideas: using mutual neighborhood graphs to discover knowledge and cluster data; using the eigenvalues of local covariance matrices to express that knowledge and form a knowledge embedded space; and using a denoising trick in the knowledge embedded space to carry out the clustering. In essence, the algorithm learns a new distance metric through knowledge embedding, and clustering becomes easier under this metric. Experimental results show that the algorithm can construct a high-quality neighborhood graph from a complex, noisy data set and solve clustering problems well.
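
The abstract names its building blocks without giving their details, so the following is only an illustrative sketch of two of them, not the authors' algorithm. It assumes a mutual k-nearest-neighbor graph (an edge is kept only when each point is among the other's k nearest neighbors), reads clusters off as connected components, and computes the eigenvalues of a local covariance matrix for 2-D points in closed form. All function names, the choice of k, and the 2-D restriction are assumptions made for illustration; the knowledge embedding and the denoising step are not shown.

```python
import math


def knn(points, k):
    """For each point, return the set of indices of its k nearest neighbors."""
    neighbors = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        neighbors.append(set(others[:k]))
    return neighbors


def mutual_knn_graph(points, k):
    """Adjacency sets keeping edge (i, j) only if i and j list each other."""
    nbrs = knn(points, k)
    return [{j for j in nbrs[i] if i in nbrs[j]} for i in range(len(points))]


def connected_components(graph):
    """Return the components of an undirected graph as sorted index lists."""
    seen, components = set(), []
    for start in range(len(graph)):
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in graph[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        components.append(sorted(component))
    return components


def local_cov_eigenvalues_2d(points, idx):
    """Eigenvalues (largest first) of the 2x2 covariance of points[idx].

    Assumes 2-D points; uses the population covariance (divide by n).
    """
    n = len(idx)
    mx = sum(points[i][0] for i in idx) / n
    my = sum(points[i][1] for i in idx) / n
    sxx = sum((points[i][0] - mx) ** 2 for i in idx) / n
    syy = sum((points[i][1] - my) ** 2 for i in idx) / n
    sxy = sum((points[i][0] - mx) * (points[i][1] - my) for i in idx) / n
    # Closed form for a symmetric 2x2 matrix [[sxx, sxy], [sxy, syy]].
    mean = (sxx + syy) / 2
    radius = math.hypot((sxx - syy) / 2, sxy)
    return (mean + radius, mean - radius)


# Hypothetical toy data: two well-separated, line-shaped groups.
demo_points = [(0, 0), (1, 0), (2, 0), (3, 0),
               (10, 10), (10, 11), (10, 12), (10, 13)]
demo_graph = mutual_knn_graph(demo_points, k=2)
print(connected_components(demo_graph))
print(local_cov_eigenvalues_2d(demo_points, [0, 1, 2]))
```

In this sketch a neighborhood that lies along a line yields one dominant eigenvalue and one near zero, which is the kind of local shape information an eigenvalue-based description could carry. How the paper turns such values into a distance metric is not stated in the abstract.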

Author information

Authors and Affiliations

State Key Laboratory of Intelligent Technology and Systems, Department of Automation, Tsinghua University, Beijing, 100084, China
Yungang Zhang, Changshui Zhang & Shijun Wang

Editor information

Editors and Affiliations

University of Nova Gorica, Nova Gorica, Slovenia
Nada Lavrač
Rudjer Bošković Institute, Bijenička 54, 10000, Zagreb, Croatia
Dragan Gamberger
Leiden Institute of Advanced Computer Science, Leiden University,
Hendrik Blockeel
Jozef Stefan Institute, Jamova 39, 1000, Ljubljana, Slovenia
Ljupčo Todorovski

Cite this paper

Zhang, Y., Zhang, C., Wang, S. (2003). Clustering in Knowledge Embedded Space. In: Lavrač, N., Gamberger, D., Blockeel, H., Todorovski, L. (eds) Machine Learning: ECML 2003. ECML 2003. Lecture Notes in Computer Science, vol 2837. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39857-8_43

DOI: https://doi.org/10.1007/978-3-540-39857-8_43
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20121-2
Online ISBN: 978-3-540-39857-8
eBook Packages: Springer Book Archive
