
A New Approach for Tuned Clustering Analysis

  • Roni Ben Ishay
  • Maya Herman
  • Chaim Yosefy
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10934)

Abstract

In this work, we present a new data mining (DM) approach, called tuned clustering analysis, which integrates clustering with an iterative, tuned analysis of the resulting clusters. Clusters that contain borderline results are often dismissed or ignored during the analysis stage. As a result, hidden insights that these clusters may represent can go unrevealed. This may harm overall DM quality and, in particular, leave important hidden insights uncovered. Our approach offers an iterative process that assists the data miner in making appropriate analysis decisions and avoiding the dismissal of possible insights. The idea is to apply an iterative DM process: clustering, analyzing, and then either presenting new insights or tuning and re-clustering the clusters that have borderline values. Clusters with borderline values are selected, and a new sub-database is built from them. The sub-database is then split based on the attribute with the highest entropy value. The tuning iterations continue until new insights are found or until cluster quality falls below a certain threshold. We demonstrate tuned clustering analysis on real echocardiographic (Echo) heart measurements, using the km-Impute clustering algorithm. The initial clusters were of high quality, yet they revealed no new medical insights. We therefore applied clustering tuning and succeeded in finding new medical insights, such as the influence of gender and age on cardiac function and clinical modifications with regard to resilience to diastolic disorder. Our approach thus revealed new medical insights recovered from borderline-value clusters, in contrast to traditional analysis methods, in which such potential insights may be missed or ignored.
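The iterative process described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. `cluster_fn` stands in for km-Impute, which is not reproduced here. `quality_fn`, `is_borderline`, and `has_insight` are hypothetical callbacks for the cluster-quality measure, the borderline test, and the analyst's judgment of whether a clustering reveals a new insight. "Entropy" is assumed to be the Shannon entropy of an attribute's categorical (or pre-discretized) value distribution within the pooled sub-database.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Record = Dict[str, object]  # one row: attribute name -> (categorical) value
Cluster = List[Record]


def shannon_entropy(values: Sequence[object]) -> float:
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    n = len(values)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def highest_entropy_attribute(db: List[Record], attributes: Sequence[str]) -> str:
    """Pick the attribute whose value distribution in `db` has maximal entropy."""
    return max(attributes, key=lambda a: shannon_entropy([r[a] for r in db]))


def split_by_attribute(db: List[Record], attribute: str) -> Dict[object, List[Record]]:
    """Partition `db` into one sub-database per distinct value of `attribute`."""
    parts: Dict[object, List[Record]] = {}
    for r in db:
        parts.setdefault(r[attribute], []).append(r)
    return parts


def tuned_clustering(
    db: List[Record],
    cluster_fn: Callable[[List[Record]], List[Cluster]],  # stand-in for km-Impute
    quality_fn: Callable[[List[Cluster]], float],         # hypothetical quality measure
    is_borderline: Callable[[Cluster], bool],             # hypothetical borderline test
    has_insight: Callable[[List[Cluster]], bool],         # hypothetical analyst judgment
    split_attributes: Sequence[str],
    min_quality: float,
) -> List[List[Cluster]]:
    """Cluster, analyze, and re-cluster borderline sub-databases.

    Returns every clustering that the `has_insight` callback accepted.
    """
    insights: List[List[Cluster]] = []
    pending: List[Tuple[List[Record], List[str]]] = [(db, list(split_attributes))]
    while pending:
        part, attrs = pending.pop()
        clusters = cluster_fn(part)
        if quality_fn(clusters) < min_quality:
            continue  # quality below threshold: stop tuning this branch
        if has_insight(clusters):
            insights.append(clusters)  # new insight found: stop tuning this branch
            continue
        # Pool every borderline cluster into a new sub-database.
        sub_db = [r for c in clusters if is_borderline(c) for r in c]
        if not sub_db or not attrs:
            continue
        attr = highest_entropy_attribute(sub_db, attrs)
        # Assumption: each attribute splits a branch at most once, so the loop terminates.
        remaining = [a for a in attrs if a != attr]
        for sub in split_by_attribute(sub_db, attr).values():
            pending.append((sub, remaining))
    return insights
```

Removing each split attribute from a branch's candidate list is an added assumption, not something the abstract specifies. It guarantees termination even when no insight is ever found. Numeric Echo measurements would need to be binned before `shannon_entropy` is meaningful for them.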

Keywords

Data mining; Clustering; Clustering analysis; Imputation; Missing values; Medical data mining

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. The Open University of Israel, Raanana, Israel
  2. The Barzilai Medical Center Campus, Ben-Gurion University, Ashkelon, Israel
