
CoFD: An Algorithm for Non-distance Based Clustering in High Dimensional Spaces*

  • Conference paper
Data Warehousing and Knowledge Discovery (DaWaK 2002)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2454)

Abstract

The clustering problem, which aims at identifying the distribution of patterns and intrinsic correlations in large data sets by partitioning the data points into similarity clusters, has been widely studied. Traditional clustering algorithms use distance functions to measure similarity and are not suitable for high dimensional spaces. In this paper, we propose CoFD, a non-distance based clustering algorithm for high dimensional spaces. Based on the maximum likelihood principle, CoFD optimizes its parameters to maximize the likelihood between the data points and the model generated by those parameters. Experimental results on both synthetic data sets and a real data set show the efficiency and effectiveness of CoFD.
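The abstract does not define CoFD's model, so what follows is only a minimal sketch of the maximum-likelihood idea under assumed details, not the paper's procedure. It assumes a binary data matrix `W` (rows are data points, columns are features) and a hypothetical model in which each data point `i` gets a cluster label `D[i]`, each feature `j` gets a cluster label `F[j]` or `None` ("no cluster"), and the model predicts `W[i][j] = 1` exactly when `D[i] == F[j]`. If each observed entry is further assumed to agree with the model independently with probability `1 - p` for some `p < 1/2`, the log-likelihood is `agree * log(1 - p) + mismatch * log(p)`, so maximizing it is equivalent to minimizing the number of mismatched entries. The names `mismatches` and `fit` are hypothetical. No distance between data points is ever computed; the updates simply alternate between relabeling features and relabeling data points.

```python
import random
from typing import List, Optional, Tuple

Matrix = List[List[int]]


def mismatches(W: Matrix, D: List[int], F: List[Optional[int]]) -> int:
    """Count entries where W disagrees with the assumed model W_hat[i][j] = (D[i] == F[j])."""
    return sum(
        W[i][j] != int(D[i] == F[j])
        for i in range(len(W))
        for j in range(len(W[0]))
    )


def fit(
    W: Matrix,
    K: int,
    init_D: Optional[List[int]] = None,
    max_iters: int = 20,
    seed: int = 0,
) -> Tuple[List[int], List[Optional[int]]]:
    """Hypothetical alternating maximum-likelihood fit (not the paper's exact method)."""
    n, m = len(W), len(W[0])
    rng = random.Random(seed)
    # Data map: one cluster label per data point (random unless one is supplied).
    D = list(init_D) if init_D is not None else [rng.randrange(K) for _ in range(n)]
    F: List[Optional[int]] = [None] * m
    for _ in range(max_iters):
        # Feature step: give each feature the label (or None = "no cluster")
        # whose predicted column agrees with the observed column most often.
        F = []
        for j in range(m):
            candidates: List[Optional[int]] = [None] + list(range(K))
            F.append(max(candidates,
                         key=lambda c: sum(W[i][j] == int(D[i] == c) for i in range(n))))
        # Data step: give each point the cluster whose predicted row agrees most.
        new_D = [
            max(range(K), key=lambda k: sum(W[i][j] == int(F[j] == k) for j in range(m)))
            for i in range(n)
        ]
        if new_D == D:  # no point changed cluster: a fixed point (local optimum)
            break
        D = new_D
    return D, F
```

Each update picks, column by column or row by row, the label with the most agreeing entries, so the mismatch count (and hence the assumed likelihood) never gets worse and the loop stops at a local optimum. A degenerate start, for example every point in one cluster, can stall immediately, so in practice one would try several random initializations and keep the fit with the fewest mismatches.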

The project is supported in part by NIH Grants 5-P41-RR09283, RO1-AG18231, and P30-AG18254 and by NSF Grants EIA-0080124, CCR-9701911, and DUE-9980943.

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhu, S., Li, T., Ogihara, M. (2002). CoFD: An Algorithm for Non-distance Based Clustering in High Dimensional Spaces*. In: Kambayashi, Y., Winiwarter, W., Arikawa, M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2002. Lecture Notes in Computer Science, vol 2454. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46145-0_6

  • DOI: https://doi.org/10.1007/3-540-46145-0_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-44123-6

  • Online ISBN: 978-3-540-46145-6

  • eBook Packages: Springer Book Archive