Hierarchical Clustering Support Vector Machines for Classifying Type-2 Diabetes Patients

  • Wei Zhong
  • Rick Chow
  • Richard Stolz
  • Jieyue He
  • Marsha Dowell
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4983)


Using a large national health database, we propose an enhanced SVM-based model called the Hierarchical Clustering Support Vector Machine (HCSVM), which uses multiple levels of clusters to classify patients diagnosed with type-2 diabetes. Multiple HCSVMs are trained for clusters at different levels of the hierarchy. Clusters at some levels of the hierarchy capture more separable sample spaces than others, so HCSVMs at different levels may develop different classification capabilities. Since the locations of the superior SVMs are data dependent, the HCSVM model in this study uses an adaptive strategy to select the most suitable HCSVM for classifying the testing samples. This model solves the large data set problem inherent in the traditional single SVM model because the entire data set is partitioned into smaller and more homogeneous clusters. Other approaches also use clustering and multiple SVMs to handle large data sets, but they typically employ only one level of clusters, and a single level of clusters may not provide an optimal partition of the sample space for SVM training. In contrast, HCSVMs use the multiple partitions available in a multilevel tree to capture a more separable sample space for SVM training. Compared with the traditional single SVM model and the one-level multiple-SVM model, the HCSVM model markedly improves the accuracy of classifying testing samples.
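The idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify the clustering algorithm, the SVM kernel, or the adaptive selection rule, so everything below is an assumption made for illustration: the hierarchy is built by repeated 2-means splits, each cluster gets a linear SVM trained with Pegasos-style stochastic sub-gradient descent, and the level is selected by accuracy on a held-out validation set. All function names (`two_means`, `build_levels`, `fit_hcsvm`, and so on) are hypothetical, and the data are synthetic blobs, not patient records.

```python
import random

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    return [sum(col) / len(points) for col in zip(*points)]

def two_means(points, rng, iters=20):
    """Split points into two groups with plain 2-means; returns local index lists."""
    centers = rng.sample(points, 2)
    for _ in range(iters):
        groups = [[], []]
        for i, p in enumerate(points):
            groups[0 if dist2(p, centers[0]) <= dist2(p, centers[1]) else 1].append(i)
        if not groups[0] or not groups[1]:
            break  # degenerate split; the caller keeps the cluster whole
        centers = [mean([points[i] for i in g]) for g in groups]
    return groups

def build_levels(X, depth, rng):
    """Level 0 is the whole data set; each deeper level splits every cluster in two."""
    levels = [[list(range(len(X)))]]
    for _ in range(depth):
        nxt = []
        for idx in levels[-1]:
            if len(idx) < 4:
                nxt.append(idx)
                continue
            a, b = two_means([X[i] for i in idx], rng)
            if a and b:
                nxt += [[idx[i] for i in a], [idx[i] for i in b]]
            else:
                nxt.append(idx)
        levels.append(nxt)
    return levels

def train_linear_svm(X, y, rng, lam=0.01, epochs=50):
    """Pegasos-style sub-gradient descent on the hinge loss (bias left unregularized)."""
    if len(set(y)) == 1:  # single-class cluster: constant predictor
        return [0.0] * len(X[0]), float(y[0])
    w, b, t = [0.0] * len(X[0]), 0.0, 0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            w = [(1 - eta * lam) * wj for wj in w]
            if margin < 1:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b

def fit_level(X, y, clusters, rng):
    """One (centroid, SVM) pair per cluster at a given level."""
    return [(mean([X[i] for i in c]),
             train_linear_svm([X[i] for i in c], [y[i] for i in c], rng))
            for c in clusters]

def predict(level_model, x):
    """Route x to the nearest centroid and apply that cluster's SVM."""
    _, (w, b) = min(level_model, key=lambda m: dist2(m[0], x))
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

def accuracy(level_model, X, y):
    return sum(predict(level_model, x) == t for x, t in zip(X, y)) / len(X)

def fit_hcsvm(X, y, X_val, y_val, depth=2, seed=0):
    """Train one model per level; keep the level with the best validation accuracy.
    Ties go to the shallowest level. This selection rule is an assumption."""
    rng = random.Random(seed)
    models = [fit_level(X, y, c, rng) for c in build_levels(X, depth, rng)]
    scores = [accuracy(m, X_val, y_val) for m in models]
    best = max(range(len(models)), key=scores.__getitem__)
    return models[best], best, scores

def make_toy(n_per_blob, rng):
    """Four Gaussian blobs in an XOR layout (synthetic, not patient data)."""
    X, y = [], []
    for cx, cy, label in [(0, 0, 1), (4, 4, 1), (0, 4, -1), (4, 0, -1)]:
        for _ in range(n_per_blob):
            X.append([rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)])
            y.append(label)
    return X, y

data_rng = random.Random(1)
X_train, y_train = make_toy(40, data_rng)
X_val, y_val = make_toy(20, data_rng)
model, best_level, val_scores = fit_hcsvm(X_train, y_train, X_val, y_val)
print("validation accuracy per level:", val_scores)
print("selected level:", best_level)
```

The XOR layout is chosen because no single linear boundary separates it, which mirrors the abstract's point that some levels of the hierarchy expose more separable sample spaces than others. Level 0 here corresponds to the traditional single SVM, and any single deeper level on its own corresponds to a one-level multiple-SVM model.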


Keywords: Hierarchical Clustering, Support Vector Machines, Classification, Clustering Algorithm, Type-2 Diabetes





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Wei Zhong
    • 1
  • Rick Chow
    • 1
  • Richard Stolz
    • 3
  • Jieyue He
    • 4
  • Marsha Dowell
    • 2
  1. Division of Math and Computer Science
  2. School of Nursing
  3. School of Business Administration and Economics, University of South Carolina Upstate, Spartanburg, USA
  4. School of Computer Science and Engineering, Southeast University, Nanjing, China
