A Feature Weighting Approach to Building Classification Models by Interactive Clustering
When a classified data set is used to test clustering algorithms, the data points of each class are treated as one or more clusters in the data space. In this paper we adopt this principle to build classification models by interactively clustering a training data set into a tree of clusters. The leaf clusters of the tree are selected as decision clusters, which classify new data with a distance function. Feature weights are taken into account when calculating the distance between a new object and the center of a decision cluster, and these weights are calculated automatically from the training data by the new W-k-means algorithm. The Fastmap technique is used to handle outliers when selecting decision clusters, which increases the stability of the classifier. Experimental results on public-domain data sets show that the models built with this clustering approach outperformed some popular classification algorithms.
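The abstract does not give formulas, so the following Python sketch illustrates one plausible reading of the two computational steps: deriving feature weights from a clustering, then classifying a new object by its nearest decision cluster under a weighted distance. The weight update assumes the published W-k-means form with exponent β > 1 and squared-Euclidean per-feature dispersion D_j, giving w_j = 1 / Σ_t (D_j / D_t)^(1/(β−1)) over features with D_t > 0 and w_j = 0 when D_j = 0. Labeling each leaf cluster with its majority class and reusing w_j^β in the classification distance are assumptions made for this sketch. The function names `cluster_centers`, `feature_weights`, `decision_clusters` and `classify` are hypothetical. The interactive tree construction and the Fastmap outlier step are not modeled.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple

Vector = Sequence[float]


def cluster_centers(points: List[Vector], assign: List[int], k: int) -> List[List[float]]:
    """Mean of each cluster (assumes every cluster id 0..k-1 is non-empty)."""
    m = len(points[0])
    sums = [[0.0] * m for _ in range(k)]
    counts = [0] * k
    for x, c in zip(points, assign):
        counts[c] += 1
        for j in range(m):
            sums[c][j] += x[j]
    return [[s / counts[c] for s in sums[c]] for c in range(k)]


def feature_weights(points: List[Vector], assign: List[int],
                    centers: List[List[float]], beta: float = 2.0) -> List[float]:
    """Assumed W-k-means-style weights: features with small within-cluster
    dispersion D_j get large weights; weights over features with D_j > 0 sum to 1."""
    if beta <= 1:
        raise ValueError("this sketch assumes beta > 1")
    m = len(centers[0])
    disp = [0.0] * m
    for x, c in zip(points, assign):
        for j in range(m):
            disp[j] += (x[j] - centers[c][j]) ** 2
    nonzero = [d for d in disp if d > 0]
    weights = []
    for d_j in disp:
        if d_j == 0:
            weights.append(0.0)  # assumed convention for zero-dispersion features
        else:
            s = sum((d_j / d_t) ** (1.0 / (beta - 1)) for d_t in nonzero)
            weights.append(1.0 / s)
    return weights


def decision_clusters(centers: List[List[float]], assign: List[int],
                      classes: List[Hashable]) -> List[Tuple[List[float], Hashable]]:
    """Hypothetical labeling: each non-empty cluster takes its majority class."""
    votes = [Counter() for _ in centers]
    for c, y in zip(assign, classes):
        votes[c][y] += 1
    return [(z, v.most_common(1)[0][0]) for z, v in zip(centers, votes) if v]


def classify(x: Vector, clusters: List[Tuple[List[float], Hashable]],
             weights: List[float], beta: float = 2.0) -> Hashable:
    """Return the class of the decision cluster nearest to x under the
    weighted distance sum_j w_j^beta * (x_j - z_j)^2."""
    def dist(z: List[float]) -> float:
        return sum((w ** beta) * (a - b) ** 2 for a, b, w in zip(x, z, weights))
    _, label = min(clusters, key=lambda zc: dist(zc[0]))
    return label


if __name__ == "__main__":
    # Toy data: feature 0 separates the classes, feature 1 is noise.
    points = [(0.0, 0.0), (0.2, 5.0), (10.0, 1.0), (10.2, 4.0)]
    assign = [0, 0, 1, 1]  # e.g. leaf-cluster ids from a clustering run
    classes = ["a", "a", "b", "b"]
    centers = cluster_centers(points, assign, k=2)
    w = feature_weights(points, assign, centers, beta=2.0)
    leaves = decision_clusters(centers, assign, classes)
    print([round(v, 3) for v in w])         # -> [0.998, 0.002]
    print(classify((0.5, 4.9), leaves, w))  # -> a
```

In the toy run, the noisy second feature receives almost no weight, so the decision is driven by the feature that actually separates the classes. This is the effect the weighted distance is meant to achieve.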
Keywords: DCC, classification, clustering, data mining, feature weight