A Feature Weighting Approach to Building Classification Models by Interactive Clustering

  • Liping Jing
  • Joshua Huang
  • Michael K. Ng
  • Hongqiang Rong
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3131)

Abstract

When a classified data set is used to test clustering algorithms, the data points in a class are treated as one cluster (or more than one) in space. In this paper we adopt this principle to build classification models by interactively clustering a training data set to construct a tree of clusters. The leaf clusters of the tree are selected as decision clusters, which classify new data based on a distance function. Feature weights are taken into account when calculating the distance between a new object and the center of a decision cluster. The new W-k-means algorithm is used to compute the feature weights automatically from the training data. The Fastmap technique is used to handle outliers when selecting decision clusters, which increases the stability of the classifier. Experimental results on public domain data sets show that models built with this clustering approach outperform several popular classification algorithms.
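To make the classification step concrete, the following is a minimal sketch of assigning a new object to the nearest decision cluster under a feature-weighted distance. It assumes the decision-cluster centers, their class labels, and the per-feature weights have already been obtained (for example, from the interactive clustering and W-k-means steps described above); the function and variable names are illustrative only and are not the authors' implementation.

    import numpy as np

    def classify(x, centers, labels, weights):
        """Label x with the class of the nearest decision cluster under a
        feature-weighted squared Euclidean distance."""
        # distance to cluster i = sum_j weights[j] * (x[j] - centers[i, j])^2
        distances = ((centers - x) ** 2 * weights).sum(axis=1)
        return labels[int(np.argmin(distances))]

    # Toy usage: two decision clusters in a two-feature space.
    centers = np.array([[0.0, 0.0], [5.0, 5.0]])   # decision-cluster centers
    labels  = np.array(["A", "B"])                 # class label of each decision cluster
    weights = np.array([0.8, 0.2])                 # larger weight = more influential feature
    print(classify(np.array([1.0, 4.0]), centers, labels, weights))  # prints "A"

In this sketch the weights simply rescale each feature's contribution to the distance, so features judged more relevant by the weighting step dominate the assignment of new objects to decision clusters.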

Keywords

DCC · classification · clustering · data mining · feature weight

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Liping Jing¹
  • Joshua Huang²
  • Michael K. Ng¹
  • Hongqiang Rong²
  1. Department of Mathematics, The University of Hong Kong, Hong Kong, China
  2. E-Business Technology Institute, The University of Hong Kong, Hong Kong, China
