A Hybrid Model of Clustering and Classification to Enhance the Performance of a Classifier

  • Conference paper
Advanced Informatics for Computing Research (ICAICR 2019)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1076)

Abstract

Clustering and classification are significant and widely used tasks in data mining, yet they are rarely combined. When integrated, they can give more promising, accurate, and robust results than either method used alone. The integration can be done through an ensemble method or a hybrid method. This paper uses a hybrid model in which K-means clustering preprocesses the data. Pre-learning by K-means clustering keeps similar cases in the same group, which improves the performance of the classifier at hand. To demonstrate the applicability of this new hybrid approach, experiments were conducted on PIMA diabetic datasets from the UCI repository, and the results are compared on several parameters. Clustering before classification adds description to the data and improves the effectiveness of the classification task. This model can be deployed with any classification algorithm to improve its performance.
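
The idea of clustering before classification can be illustrated with a minimal sketch. The abstract does not say how the K-means output is passed to the classifier, so this example assumes the cluster index is appended to each record as an extra feature. The k-nearest-neighbour classifier is only a stand-in for "any classification algorithm", the function names are hypothetical, and the toy data is invented for illustration (it is not the PIMA data).

```python
import math

def nearest(p, centroids):
    """Index of the centroid closest to point p (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))

def kmeans(points, k, iters=20):
    """Plain Lloyd's K-means. Assumption: the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Each centroid becomes the mean of its group; an empty group keeps its old centroid.
        centroids = [
            [sum(col) / len(g) for col in zip(*g)] if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids

def add_cluster_feature(points, centroids):
    """Assumed hybrid step: append the cluster index to every record."""
    return [list(p) + [float(nearest(p, centroids))] for p in points]

def knn_predict(train_x, train_y, x, k=3):
    """Majority vote among the k nearest training records (stand-in classifier)."""
    ranked = sorted(zip(train_x, train_y), key=lambda t: math.dist(t[0], x))
    votes = [label for _, label in ranked[:k]]
    return max(set(votes), key=votes.count)

# Invented toy data: two well-separated groups with labels 0 and 1.
train_x = [[1.0, 1.1], [5.0, 5.1], [0.9, 1.0], [5.2, 4.9], [1.2, 0.8], [4.8, 5.0]]
train_y = [0, 1, 0, 1, 0, 1]

centroids = kmeans(train_x, k=2)                     # pre-learning step
train_aug = add_cluster_feature(train_x, centroids)  # records plus cluster index

def hybrid_predict(x):
    """Cluster the new record first, then classify the augmented record."""
    x_aug = add_cluster_feature([x], centroids)[0]
    return knn_predict(train_aug, train_y, x_aug)

print(hybrid_predict([1.1, 0.9]), hybrid_predict([5.1, 5.0]))
```

Because the cluster index is just one more column, the same preprocessing could sit in front of a different classifier without changing the clustering code.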


Author information

Correspondence to Subodhini Gupta.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Gupta, S., Parekh, B., Jivani, A. (2019). A Hybrid Model of Clustering and Classification to Enhance the Performance of a Classifier. In: Luhach, A., Jat, D., Hawari, K., Gao, XZ., Lingras, P. (eds) Advanced Informatics for Computing Research. ICAICR 2019. Communications in Computer and Information Science, vol 1076. Springer, Singapore. https://doi.org/10.1007/978-981-15-0111-1_34

  • DOI: https://doi.org/10.1007/978-981-15-0111-1_34

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-0110-4

  • Online ISBN: 978-981-15-0111-1

  • eBook Packages: Computer Science, Computer Science (R0)
