CoFD: An Algorithm for Non-distance Based Clustering in High Dimensional Spaces*

Zhu, Shenghuo; Li, Tao; Ogihara, Mitsunori

doi:10.1007/3-540-46145-0_6

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 2454))

Included in the following conference series:

International Conference on Data Warehousing and Knowledge Discovery

Abstract

The clustering problem, which aims at identifying the distribution of patterns and intrinsic correlations in large data sets by partitioning the data points into similarity clusters, has been widely studied. Traditional clustering algorithms use distance functions to measure similarity and are not suitable for high dimensional spaces. In this paper, we propose CoFD, a non-distance based clustering algorithm for high dimensional spaces. Based on the maximum likelihood principle, CoFD optimizes its parameters to maximize the likelihood between the data points and the model generated by those parameters. Experimental results on both synthetic data sets and a real data set show the efficiency and effectiveness of CoFD.

* The project is supported in part by NIH Grants 5-P41-RR09283, RO1-AG18231, and P30-AG18254 and by NSF Grants EIA-0080124, CCR-9701911, and DUE-9980943.

Author information

Authors and Affiliations

Department of Computer Science, University of Rochester, 14620, Rochester, NY
Shenghuo Zhu, Tao Li & Mitsunori Ogihara

Editor information

Editors and Affiliations

Graduate School of Informatics, Kyoto University, Yoshida-Honmachi, Sakyo-ku, 606-8501, Kyoto, Japan
Yahiko Kambayashi
Institute for Computer Science and Business Informatics, University of Vienna, Liebiggasse 4, 1010, Vienna, Austria
Werner Winiwarter
Center for Spatial Information Science (CSIS), University of Tokyo, 4-6-1, Komaba, Meguro-ku, 153-8904, Tokyo, Japan
Masatoshi Arikawa

About this paper

Cite this paper

Zhu, S., Li, T., Ogihara, M. (2002). CoFD: An Algorithm for Non-distance Based Clustering in High Dimensional Spaces*. In: Kambayashi, Y., Winiwarter, W., Arikawa, M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2002. Lecture Notes in Computer Science, vol 2454. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46145-0_6

DOI: https://doi.org/10.1007/3-540-46145-0_6
Published: 02 September 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44123-6
Online ISBN: 978-3-540-46145-6
eBook Packages: Springer Book Archive
