Abstract
Using a large national health database, we propose an enhanced SVM-based model called Hierarchical Clustering Support Vector Machine (HCSVM) that utilizes multiple levels of clusters to classify patients diagnosed with type-2 diabetes. Multiple HCSVMs are trained for clusters at different levels of the hierarchy. Some clusters at certain levels of the hierarchy capture more separable sample spaces than the others. As a result, HCSVMs at different levels may develop different classification capabilities. Since the locations of the superior SVMs are data dependent, the HCSVM model in this study takes advantage of an adaptive strategy to select the most suitable HCSVM for classifying the testing samples. This model solves the large data set problem inherent with the traditional single SVM model because the entire data set is partitioned into smaller and more homogenous clusters. Other approaches also use clustering and multiple SVM to solve the problem of large datasets. These approaches typical employed only one level of clusters. However, a single level of clusters may not provide an optimal partition of the sample space for SVM trainings. On the contrary, HCSVMs utilize multiple partitions available in a multilevel tree to capture a more separable sample space for SVM trainings. Compared with the traditional single SVM model and one-level multiple SVMs model, the HCSVM Model markedly improves the accuracy for classifying testing samples.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Agarwal, D.K.: Shrinkage estimator generalizations of proximal support vector machines. In: Proc. of the 8th ACM SIGKDD international conference of knowledge Discovery and data mining, Edmonton, Canada (2002)
Award, M., Khan, L., Bastani, F., Yen, I.: An Effective Support Vector Machines(SVMs) Performance Using Hierarchical Clustering. In: Proc. of the 16th IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2004) (2004)
Balcazar, J.L., Dai, Y., Watanabe, O.: Provably Fast Training Algorithms for Support Vector Machines. In: Proc. of the 1st IEEE International Conference on Data mining, pp. 43–50. IEEE Computer Society, Los Alamitos (2001)
Breault, J.L., Goodall, C.R., Fos, P.J.: Data Mining a Diabetic Data Warehouse. Artificial Intelligence in Medicine 26, 37–54 (2002)
Chang, C.C., Lin, C.J.: Training nu-support vector classifiers: Theory and algorithms. Neural Computations 13, 2119–2147 (2001)
Daniael, B., Cao, D.: Training Support Vector Machines Using Adaptive Clustering. In: Proc. of SIAM International Conference on Data Mining, Lake Buena Vista, FL, USA (2004)
Dowell, M.A., Rozell, B., Roth, D., Delugach, H., Chaloux, P., Dowell, J.: Economic and Clinical Disparities in Hospitalized Patients with Type-2 Diabetes. Journal of Nursing Scholarship 36, 66–72 (2004)
Osuna, E., Freund, R., Girosi, F.: An improved training algorithm for support vector machines. In: Proc. Of IEEE Workshop on Neural Networks for Signal Processing, pp. 276–285 (1997)
Jain, A.K., Murty, M.N., Flynn, P.J.: Data clustering: A review. ACM Computing Surveys 31, 264–323 (1999)
Platt, J.: Fast training of support vector machines using sequential minimal optimization. In: advances in Kerenel Methods-Support Vector Learning, pp. 185–208 (1999)
Scholkopf, B., Burges, C., Smola, A. (eds.): Advances in Kernel Methods-Support Vec-tor Learning. MIT Press, Cambridge, MA (1999)
US Department of Health and Human Services, Centers for Disease Control and Prevention: Prevalence of diabetes and impaired fasting glucose in adults-United States 1999–2000, Morbidity and Mortality Weekly Report 52, 833–835 (2003)
Valentini, G., Dietterich, T.G.: Low Bias Bagged Support vector Machines. In: Proc. of the 20th International Conference on Machine Learning ICML 2003, Washington D.C. USA, pp. 752–759 (2003)
Vapnik, V.: Statistical Learning Theory. John Wiley & Sons, Inc., New York (1998)
Vavasis, S.A.: Nonlinear Optimization: Complexity Issues. Oxford Science, New York (1991)
Yao, Y.Y.: Perspectives of Granular Computing. In: IEEE Conference on Granular Computing (to appear, 2005)
Yu, H., Yang, J., Han, J.: Classifying Large Data sets Using SVMs with Hierarchical Clusters. In: Proc. Of the 9th ACM SIGKDD 2003 (2003)
Zagrovic, B., Pande, V.S.: How does averaging affect protein structure comparison on the ensemble level? Biophysical Journal 87, 2240–2246 (2004)
Zhong, W., He, J., Harrison, R., Tai, P.C., Pan, Y.: Clustering Support Vector Machines for Protein Local Structure Prediction. Expert Systems With Applications 32, 518–526 (2007)
Author information
Authors and Affiliations
Editor information
Rights and permissions
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhong, W., Chow, R., Stolz, R., He, J., Dowell, M. (2008). Hierarchical Clustering Support Vector Machines for Classifying Type-2 Diabetes Patients. In: Măndoiu, I., Sunderraman, R., Zelikovsky, A. (eds) Bioinformatics Research and Applications. ISBRA 2008. Lecture Notes in Computer Science(), vol 4983. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-79450-9_35
Download citation
DOI: https://doi.org/10.1007/978-3-540-79450-9_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-79449-3
Online ISBN: 978-3-540-79450-9
eBook Packages: Computer ScienceComputer Science (R0)