Abstract
The clustering problem, which aims to identify the distribution of patterns and intrinsic correlations in large data sets by partitioning the data points into clusters of similar points, has been widely studied. Traditional clustering algorithms use distance functions to measure similarity and are not suitable for high dimensional spaces. In this paper, we propose CoFD, a non-distance based clustering algorithm for high dimensional spaces. Based on the maximum likelihood principle, CoFD optimizes its parameters to maximize the likelihood between the data points and the model generated by the parameters. Experimental results on both synthetic data sets and a real data set show the efficiency and effectiveness of CoFD.
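The claim that distance functions become unsuitable in high dimensional spaces can be illustrated with a small experiment. The sketch below is not part of CoFD. It only measures how the relative gap between the nearest and farthest neighbor of a query point shrinks as the dimension grows, assuming points drawn uniformly from the unit cube. The function name `relative_contrast` and its parameters are hypothetical and chosen for this illustration.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Return (d_max - d_min) / d_min for Euclidean distances from one
    random query point to n_points random points, all drawn uniformly
    from the unit cube [0, 1]^dim (an assumption of this sketch)."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    distances = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        distances.append(math.dist(query, point))
    return (max(distances) - min(distances)) / min(distances)


if __name__ == "__main__":
    # Print the contrast for a few dimensions; smaller values mean the
    # nearest and farthest neighbors are harder to tell apart.
    for dim in (2, 10, 100, 1000):
        print(dim, round(relative_contrast(dim), 3))
```

Under these assumptions the contrast should drop sharply as `dim` increases, which is one common way to motivate non-distance based approaches.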
The project is supported in part by NIH Grants 5-P41-RR09283, RO1-AG18231, and P30-AG18254 and by NSF Grants EIA-0080124, CCR-9701911, and DUE-9980943.
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhu, S., Li, T., Ogihara, M. (2002). CoFD: An Algorithm for Non-distance Based Clustering in High Dimensional Spaces*. In: Kambayashi, Y., Winiwarter, W., Arikawa, M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2002. Lecture Notes in Computer Science, vol 2454. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46145-0_6
DOI: https://doi.org/10.1007/3-540-46145-0_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44123-6
Online ISBN: 978-3-540-46145-6
eBook Packages: Springer Book Archive