Building a Decision Cluster Forest Model to Classify High Dimensional Data with Multi-classes

Li, Yan; Hung, Edward

doi:10.1007/978-3-642-05224-8_21

Yan Li²¹ &
Edward Hung²¹

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 5828))

Included in the following conference series:

Asian Conference on Machine Learning

2294 Accesses
5 Citations

Abstract

In this paper, a decision cluster forest classification model is proposed for high dimensional data with multiple classes. A decision cluster forest (DCF) consists of a set of decision cluster trees, in which the leaves of each tree are clusters labeled with the same class that determines the class of new objects falling in the clusters. By recursively calling a variable weighting k-means algorithm, a decision cluster tree can be generated from a subset of the training data that contains the objects in the same class. The set of m decision cluster trees grown from the subsets of m classes constitute the decision cluster forest. Anderson-Darling test is used to determine the stopping condition of tree growing. A DCF classification (DCFC) model is selected from all leaves of the m decision cluster trees in the forest. A series of experiments on both synthetic and real data sets have shown that the DCFC model performed better in accuracy and scalability than the single decision cluster tree method and the methods of k-NN, decision tree and SVM. This new model is particularly suitable for large, high dimensional data with many classes.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Piatetsky-Shapiro, G., Djeraba, C., Getoor, L., Grossman, R., Feldman, R., Zaki, M.: What are the grand challenges for data mining? In: KDD 2006 panel report. SIGKDD Explorations, vol. 8, pp. 70–77 (2006)
Google Scholar
Li, Y., Hung, E., Chung, K., Huang, J.: Building a decision cluster classification model by a variable weighting k-means method. In: Wobcke, W., Zhang, M. (eds.) AI 2008. LNCS (LNAI), vol. 5360, pp. 337–347. Springer, Heidelberg (2008)
Chapter Google Scholar
Huang, J., Ng, M., Rong, H., Li, Z.: Automated variable weighting in k-means type clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence 27, 657–668 (2005)
Article Google Scholar
Jing, L., Huang, J., Ng, M.K., Rong, H.: A feature weighting approach to building classification models by interactive clustering. In: Torra, V., Narukawa, Y. (eds.) MDAI 2004. LNCS (LNAI), vol. 3131, pp. 284–294. Springer, Heidelberg (2004)
Google Scholar
Dietterich, T., Bakiri, G.: Solving multiclass learning problems via error-correcting output codes. Journal of Artificial Intelligence Research 2, 263–286 (1995)
MATH Google Scholar
Anderson, T.W., Darling, D.A.: Asymptotic theory of certain ”goodness-of-fit” criteria based on stochastic processes. The Annals of Mathematical Statistics 23, 193–212 (1952)
Article MATH MathSciNet Google Scholar
Stephens, M.A.: Edf statistics for goodness of fit and some comparisons. Journal of the American Statistical Association 69, 730–737 (1974)
Article Google Scholar
Kyriakopoulou, A., Kalamboukis, T.: Text classification using clustering. In: ECML-PKDD Discovery Challenge Workshop Proceedings (2006)
Google Scholar
Zhang, B., Srihari, S.N.: Fast k-nearest neighbor classification using cluster-based trees. IEEE Transactions on Pattern Analysis and Machine Intelligence 26, 525–528 (2004)
Article Google Scholar
Mui, J., Fu, K.: Automated classification of nucleated blood cells using a binary tree classifier. IEEE Transactions on Pattern Analysis and Machine Intelligence 2, 429–443 (1980)
Google Scholar
Huang, Z., Ng, M., Lin, T., Cheung, D.: An interactive approach to building classification models by clustering and cluster validation. In: Leung, K.-S., Chan, L., Meng, H. (eds.) IDEAL 2000. LNCS, vol. 1983, pp. 23–28. Springer, Heidelberg (2000)
Google Scholar
Huang, Z., Lin, T.: A visual method of cluster validation with fastmap. In: Terano, T., Chen, A.L.P. (eds.) PAKDD 2000. LNCS, vol. 1805, pp. 153–164. Springer, Heidelberg (2000)
Chapter Google Scholar
Blockeel, H., Raedt, L., Ramong, J.: Top-down induction of clustering trees. In: Proceedings of the 15th International Conference on Machine Learning, pp. 55–63 (1998)
Google Scholar

Download references

Author information

Authors and Affiliations

Department of Computing, The Hong Kong Polytechnic University Hung Hom, Hong Kong
Yan Li & Edward Hung

Authors

Yan Li
View author publications
You can also search for this author in PubMed Google Scholar
Edward Hung
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

National Laboratory for Novel Software Technology, Nanjing University, 22 Hankou Road, 210093, Nanjing, China
Zhi-Hua Zhou
The Institute of Scientific and Industrial Research, Osaka University, 8-1 Mihogaoka, 567, Osaka, Ibaraki, Japan
Takashi Washio

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Li, Y., Hung, E. (2009). Building a Decision Cluster Forest Model to Classify High Dimensional Data with Multi-classes. In: Zhou, ZH., Washio, T. (eds) Advances in Machine Learning. ACML 2009. Lecture Notes in Computer Science(), vol 5828. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-05224-8_21

Download citation

DOI: https://doi.org/10.1007/978-3-642-05224-8_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-05223-1
Online ISBN: 978-3-642-05224-8
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics