Abstract
Clustering is an exploratory data analysis task. It aims to find the intrinsic structure of data by organizing data objects into similarity groups, or clusters. It is often called unsupervised learning because no class labels denoting an a priori partition of the objects are given. This is in contrast with supervised learning (e.g., classification), for which the data objects are already labeled with known classes. Past research in clustering has produced many algorithms. However, these algorithms have some shortcomings. In this paper, we propose a novel clustering technique based on a supervised learning technique called decision tree construction. The new technique is able to overcome many of these shortcomings. The key idea is to use a decision tree to partition the data space into cluster (or dense) regions and empty (or sparse) regions (which produce outliers and anomalies). We achieve this by introducing virtual data points into the space and then applying a modified decision tree algorithm. The technique is able to find “natural” clusters in large, high-dimensional spaces efficiently. It is suitable for clustering in the full-dimensional space as well as in subspaces. It also provides easily comprehensible descriptions of the resulting clusters. Experiments on both synthetic and real-life data show that the technique is effective and scales well to large, high-dimensional datasets.
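The abstract states the key idea only at a high level. The following is a minimal Python sketch of that idea, built on several assumptions that do not come from the chapter. Virtual points are drawn uniformly at random inside the data's bounding box, one per real point. Splits use the ordinary Gini criterion, not the authors' modified decision tree algorithm. A leaf counts as a cluster (dense) region when real points outnumber virtual ones. The names `cluster_via_tree`, `grow`, `best_split`, and `Leaf`, and the `max_depth`, `min_size`, and `seed` parameters, are hypothetical.

```python
import random
from dataclasses import dataclass

REAL, VIRTUAL = 0, 1  # class labels: actual data points vs. virtual "empty-space" points


@dataclass
class Leaf:
    lo: list  # lower bound of the leaf's hyper-rectangle, per dimension
    hi: list  # upper bound of the leaf's hyper-rectangle, per dimension
    n_real: int
    n_virtual: int

    @property
    def is_cluster(self) -> bool:
        # Assumption: a leaf is a dense (cluster) region if real points outnumber virtual ones.
        return self.n_real > self.n_virtual


def gini(n_real: int, n_virtual: int) -> float:
    n = n_real + n_virtual
    if n == 0:
        return 0.0
    p = n_real / n
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def best_split(points, labels, dims):
    """Exhaustively search (dimension, threshold) pairs for the lowest weighted Gini."""
    n, total_real = len(points), labels.count(REAL)
    best = None  # (score, dimension, threshold)
    for d in range(dims):
        order = sorted(range(n), key=lambda i: points[i][d])
        left_real = 0
        for k in range(1, n):
            # The left side holds order[0..k-1]; count its real points incrementally.
            if labels[order[k - 1]] == REAL:
                left_real += 1
            a, b = points[order[k - 1]][d], points[order[k]][d]
            if a == b:
                continue  # no threshold separates equal values
            right_real = total_real - left_real
            score = (k * gini(left_real, k - left_real)
                     + (n - k) * gini(right_real, (n - k) - right_real)) / n
            if best is None or score < best[0]:
                best = (score, d, (a + b) / 2)
    return best


def grow(points, labels, lo, hi, depth, max_depth, min_size, leaves):
    n_real = labels.count(REAL)
    n_virtual = len(labels) - n_real
    split = None
    if n_real and n_virtual and depth < max_depth and len(points) >= min_size:
        split = best_split(points, labels, len(lo))
    # Stop at pure nodes, at the size/depth limits, or when no split lowers impurity.
    if split is None or split[0] >= gini(n_real, n_virtual):
        leaves.append(Leaf(list(lo), list(hi), n_real, n_virtual))
        return
    _, d, t = split
    hi_left, lo_right = list(hi), list(lo)
    hi_left[d] = t   # left child covers values <= t in dimension d
    lo_right[d] = t  # right child covers values > t in dimension d
    left = [i for i, p in enumerate(points) if p[d] <= t]
    right = [i for i, p in enumerate(points) if p[d] > t]
    grow([points[i] for i in left], [labels[i] for i in left],
         lo, hi_left, depth + 1, max_depth, min_size, leaves)
    grow([points[i] for i in right], [labels[i] for i in right],
         lo_right, hi, depth + 1, max_depth, min_size, leaves)


def cluster_via_tree(data, max_depth=10, min_size=4, seed=0):
    """Return the leaves of a tree that separates real points from virtual uniform points."""
    rng = random.Random(seed)
    dims = len(data[0])
    lo = [min(p[d] for p in data) for d in range(dims)]
    hi = [max(p[d] for p in data) for d in range(dims)]
    # Assumption: one virtual point per real point, uniform over the bounding box.
    virtual = [tuple(rng.uniform(lo[d], hi[d]) for d in range(dims)) for _ in data]
    points = [tuple(p) for p in data] + virtual
    labels = [REAL] * len(data) + [VIRTUAL] * len(virtual)
    leaves = []
    grow(points, labels, lo, hi, 0, max_depth, min_size, leaves)
    return leaves


if __name__ == "__main__":
    rng = random.Random(1)
    data = [(rng.gauss(1, 0.3), rng.gauss(1, 0.3)) for _ in range(100)]
    data += [(rng.gauss(9, 0.3), rng.gauss(9, 0.3)) for _ in range(100)]
    for leaf in cluster_via_tree(data):
        if leaf.is_cluster:
            box = " and ".join(f"{a:.2f} <= x{d} <= {b:.2f}"
                               for d, (a, b) in enumerate(zip(leaf.lo, leaf.hi)))
            print(f"cluster region: {box} ({leaf.n_real} real, {leaf.n_virtual} virtual)")
```

Every leaf is an axis-aligned hyper-rectangle, so each cluster region can be reported as a conjunction of range conditions on the attributes. This is one reading of the "easily comprehensible descriptions" the abstract mentions. For simplicity, the sketch materializes the virtual points explicitly.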
About this chapter
Cite this chapter
Liu, B., Xia, Y., Yu, P. Clustering Via Decision Tree Construction. In: Chu, W., Young Lin, T. (eds) Foundations and Advances in Data Mining. Studies in Fuzziness and Soft Computing, vol 180. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11362197_5
DOI: https://doi.org/10.1007/11362197_5
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25057-9
Online ISBN: 978-3-540-32393-8
eBook Packages: Engineering, Engineering (R0)