Medical Data Mining for Heart Diseases and the Future of Sequential Mining in Medical Field

  • Carine Bou Rjeily
  • Georges Badr (corresponding author)
  • Amir Hajjam El Hassani
  • Emmanuel Andres
Part of the Intelligent Systems Reference Library book series (ISRL, volume 149)


Data mining, in general, is the process of extracting interesting patterns and discovering non-trivial knowledge from large amounts of data. Medical data mining can be used to understand past events, e.g. studying a patient's vital signs to understand their complications and discover why they died, or to predict future events by analyzing those that have already happened. In this chapter, we present an overview of studies that use data mining to predict heart failure and heart disease classes. We also focus on one of the most active fields of data mining, namely Sequential Mining, a very promising paradigm. Because of its strong results in many fields, this chapter also covers its extensions, from Sequential Pattern Mining to Sequential Rule Mining and Sequence Prediction. Pattern Mining is the discovery of important and unexpected patterns or information; it emerged in the 1990s with the well-known Apriori algorithm. Sequential Pattern Mining aims to extract and analyze frequent subsequences from sequences of events or items, taking their temporal order into account. The importance of a sequence can be measured by different factors, such as its frequency of occurrence, its length, and its profit. In 1995, Agrawal and Srikant introduced AprioriAll, an Apriori-based algorithm that takes the time dimension into account: it examines each customer's transactions over time in order to extract frequent patterns from the sequences of products bought by that customer. Since the time dimension is a very important factor in analyzing medical data, this chapter also positions Sequential Mining within the medical domain.
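The abstract describes the level-wise, Apriori-style search behind sequential pattern mining. The Python sketch below is a minimal illustration under simplifying assumptions, not AprioriAll as published and not the authors' code: each sequence element is a single event rather than an itemset (transaction), no time-gap constraints are applied, and importance is measured by frequency (support) only, not by length or profit. The function names (`is_subsequence`, `support`, `mine_sequential_patterns`) and the patient event sequences are hypothetical.

```python
# Simplified Apriori-style (level-wise) sequential pattern mining.
# Assumptions: each sequence element is a single event (no itemsets),
# no time-gap constraints, and patterns are ranked by support only.

def is_subsequence(pattern, sequence):
    """Return True if the events of `pattern` occur in `sequence` in the same order (gaps allowed)."""
    it = iter(sequence)
    # `event in it` consumes the iterator up to the first match, so order is enforced.
    return all(event in it for event in pattern)


def support(pattern, database):
    """Count the sequences in `database` that contain `pattern`."""
    return sum(1 for seq in database if is_subsequence(pattern, seq))


def mine_sequential_patterns(database, min_support):
    """Return a dict mapping every frequent pattern (a tuple of events) to its support."""
    events = sorted({event for seq in database for event in seq})
    current = [(e,) for e in events if support((e,), database) >= min_support]
    frequent = {p: support(p, database) for p in current}

    while current:
        current_set = set(current)
        candidates = set()
        for p in current:
            for q in current:
                # Join step: p without its first event equals q without its last event.
                if p[1:] == q[:-1]:
                    cand = p + (q[-1],)
                    # Prune step (Apriori property): every subsequence obtained by
                    # dropping one event must itself be frequent.
                    if all(cand[:i] + cand[i + 1:] in current_set for i in range(len(cand))):
                        candidates.add(cand)
        current = []
        for cand in sorted(candidates):
            s = support(cand, database)
            if s >= min_support:
                frequent[cand] = s
                current.append(cand)
    return frequent


# Hypothetical toy database: one event sequence per patient, in time order.
database = [
    ["chest_pain", "ecg_abnormal", "admission"],
    ["chest_pain", "admission"],
    ["chest_pain", "ecg_abnormal", "admission", "discharge"],
    ["ecg_abnormal", "admission"],
]

for pattern, sup in mine_sequential_patterns(database, min_support=3).items():
    print(" -> ".join(pattern), sup)
# admission 4
# chest_pain 3
# ecg_abnormal 3
# chest_pain -> admission 3
# ecg_abnormal -> admission 3
```

With a minimum support of 3 sequences, chest_pain -> admission and ecg_abnormal -> admission are kept, while chest_pain -> ecg_abnormal (support 2) is discarded. Building candidates only from frequent shorter patterns is what keeps the level-wise search tractable: by the Apriori property, a sequence cannot be frequent unless all of its subsequences are.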


Keywords: Data mining · Healthcare · Heart disease · Sequential pattern mining · Algorithms


  1. Ponikowski, P., Voors, A.A., Anker, S.D., Bueno, H., Cleland, J.G., Coats, A.J., Falk, V., González-Juanatey, J.R., Harjola, V.P., Jankowska, E.A., et al.: 2016 ESC guidelines for the diagnosis and treatment of acute and chronic heart failure: the task force for the diagnosis and treatment of acute and chronic heart failure of the European Society of Cardiology (ESC) developed with the special contribution of the Heart Failure Association (HFA) of the ESC. Eur. Heart J. 37(27), 2129–2200 (2016)
  2. Aljaaf, A., Al-Jumeily, D., Hussain, A., Dawson, T., Fergus, P., Al-Jumaily, M.: Predicting the likelihood of heart failure with a multi level risk assessment using decision tree. In: 2015 Third International Conference on Technological Advances in Electrical, Electronics and Computer Engineering (TAEECE), pp. 101–106. IEEE (2015)
  3. Cowie, M.: The Heart Failure Epidemic. Medicographia (2012)
  4. Son, C.S., Kim, Y.N., Kim, H.S., Park, H.S., Kim, M.S.: Decision-making model for early diagnosis of congestive heart failure using rough set and decision tree approaches. J. Biomed. Inform. 45(5), 999–1008 (2012)
  5. Roger, V.L.: The heart failure epidemic. Int. J. Environ. Res. Public Health 7(4), 1807–1830 (2010)
  6. Hartmann, C., Varshney, P., Mehrotra, K., Gerberich, C.: Application of information theory to the construction of efficient decision trees. IEEE Trans. Inf. Theory 28(4), 565–577 (1982)
  7. Quinlan, J.R.: C4.5: Programs for Machine Learning. Elsevier (2014)
  8. Quinlan, J.R.: Induction of decision trees. Mach. Learn. 1(1), 81–106 (1986)
  9. Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995)
  10. Murty, M.N., Devi, V.S.: Bayes Classifier, pp. 86–102. Springer, London (2011)
  11. Haykin, S.: Neural Networks: A Comprehensive Foundation, 2nd edn. Prentice Hall PTR, Upper Saddle River, NJ, USA (1998)
  12. Breiman, L., Friedman, J., Stone, C.J., Olshen, R.A.: Classification and Regression Trees. CRC Press (1984)
  13. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA data mining software: an update. SIGKDD Explor. Newsl. 11(1), 10–18 (2009)
  14. Zadeh, L.A.: Fuzzy sets. In: Lotfi A.Z. (ed.) Fuzzy Sets, Fuzzy Logic, and Fuzzy Systems, pp. 394–432. World Scientific (1996)
  15. Hartigan, J.A., Wong, M.A.: Algorithm AS 136: a K-means clustering algorithm. J. R. Stat. Soc. Ser. C (Appl. Stat.) 28(1), 100–108 (1979)
  16. Ho, T.K.: Random decision forests. In: Proceedings of the Third International Conference on Document Analysis and Recognition, vol. 1, pp. 278–282. IEEE (1995)
  17. Ho, T.K.: The random subspace method for constructing decision forests. IEEE Trans. Pattern Anal. Mach. Intell. 20(8), 832–844 (1998)
  18. Suganya, S., Selvy, P.T.: A proficient heart diseases prediction method using fuzzy-CART algorithm. Int. J. Sci. Eng. Appl. Sci. 2(1) (2016)
  19. Masetic, Z., Subasi, A.: Congestive heart failure detection using random forest classifier. Comput. Methods Progr. Biomed. 130, 54–64 (2016)
  20. Goldberger, A.L., Amaral, L.A., Glass, L., Hausdorff, J.M., Ivanov, P.C., Mark, R.G., Mietus, J.E., Moody, G.B., Peng, C.K., Stanley, H.E.: PhysioBank, PhysioToolkit, and PhysioNet. Circulation 101(23), e215–e220 (2000)
  21. Pandey, A.K., Pandey, P., Jaiswal, K.: A heart disease prediction model using decision tree. IUP J. Comput. Sci. 7(3), 43 (2013)
  22. UCI Repository: Heart disease dataset. Center for Machine Learning and Intelligent Systems
  23. Bashir, S., Qamar, U., Javed, M.Y.: An ensemble based decision support framework for intelligent heart disease diagnosis. In: 2014 International Conference on Information Society (i-Society), pp. 259–264. IEEE (2014)
  24. Chaurasia, V., Pal, S.: Early prediction of heart diseases using data mining techniques. Caribb. J. Sci. Technol. 1, 208–217 (2013)
  25. Gharehchopogh, F.S., Khalifelu, Z.A.: Neural network application in diagnosis of patient: a case study. In: 2011 International Conference on Computer Networks and Information Technology (ICCNIT), pp. 245–249. IEEE (2011)
  26. Uppin, S., Anusuya, M.: Expert system design to predict heart and diabetes diseases. Int. J. Sci. Eng. Technol. 3(8), 1054–1059 (2014)
  27. Shouman, M., Turner, T., Stocker, R.: Integrating decision tree and k-means clustering with different initial centroid selection methods in the diagnosis of heart disease patients. In: Proceedings of the International Conference on Data Mining (DMIN), The Steering Committee of The World Congress in Computer Science, Computer Engineering and Applied Computing (WorldComp), p. 1 (2012)
  28. Bohacik, J., Kambhampati, C., Davis, D., Cleland, J.: Alternating decision tree applied to risk assessment of heart failure patients. J. Inf. Technol. 6(2), 25–33 (2013)
  29. Melillo, P., De Luca, N., Bracale, M., Pecchia, L.: Classification tree for risk assessment in patients suffering from congestive heart failure via long-term heart rate variability. IEEE J. Biomed. Health Inform. 17(3), 727–733 (2013)
  30. Sathish, M., Sridhar, D.: Prediction of heart diseases in data mining techniques. Int. J. Comput. Trends Technol. 24 (2015)
  31. Isler, Y.: Discrimination of systolic and diastolic dysfunctions using multi-layer perceptron in heart rate variability analysis. Comput. Biol. Med. 76, 113–119 (2016)
  32. Shah, S.J., Katz, D.H., Selvaraj, S., Burke, M.A., Yancy, C.W., Gheorghiade, M., Bonow, R.O., Huang, C.C., Deo, R.C.: Phenomapping for novel classification of heart failure with preserved ejection fraction. Circulation 114 (2014)
  33. Srinivas, K., Rao, G.R., Govardhan, A.: Analysis of attribute association in heart disease using data mining techniques. Int. J. Eng. Res. Appl. 1680–1683 (2012)
  34. Nahar, J., Imam, T., Tickle, K.S., Chen, Y.P.P.: Association rule mining to detect factors which contribute to heart disease in males and females. Expert Syst. Appl. 40(4), 1086–1093 (2013)
  35. Methaila, A., Kansal, P., Arya, H., Kumar, P.: Early heart disease prediction using data mining techniques. Comput. Sci. Inf. Technol. J. 53–59 (2014)
  36. Sudhakar, K., Manimekalai, D.M.: Study of heart disease prediction using data mining. Int. J. Adv. Res. Comput. Sci. Softw. Eng. 4(1) (2014)
  37. Ilayaraja, M., Meyyappan, T.: Efficient data mining method to predict the risk of heart diseases through frequent itemsets. Procedia Comput. Sci. 70, 586–592 (2015)
  38. Subramanian, S., Mohanapriya, S., Nagasandhiyalakshmi, B., Shanmugapriya, N.: Prediction of outbreak heart diseases using text mining. Discovery 1070–1077 (2016)
  39. Yang, G., Ren, Y., Pan, Q., Ning, G., Gong, S., Cai, G., Zhang, Z., Li, L., Yan, J.: A heart failure diagnosis model based on support vector machine. In: 2010 3rd International Conference on Biomedical Engineering and Informatics (BMEI), vol. 3, pp. 1105–1108. IEEE (2010)
  40. Bou Rjeily, C., Badr, G., El Hassani, A.H., Andres, E.: Sequence prediction algorithm for heart failure prediction. In: International Conference e-Health, pp. 109–116 (2017)
  41. Bou Rjeily, C., Badr, G., El Hassani, A.H., Andres, E.: Predicting heart failure class using a sequence prediction algorithm. In: 2017 International Conference on Advances in Biomedical Engineering, IEEE (2017)
  42. Zaki, M.J.: SPADE: an efficient algorithm for mining frequent sequences. Mach. Learn. 42(1), 31–60 (2001)
  43. Reps, J., Garibaldi, J.M., Aickelin, U., Soria, D., Gibson, J.E., Hubbard, R.B.: Discovering sequential patterns in a UK general practice database. In: 2012 IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI), pp. 960–963. IEEE (2012)
  44. Batal, I., Valizadegan, H., Cooper, G.F., Hauskrecht, M.: A pattern mining approach for classifying multivariate temporal data. In: 2011 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 358–365. IEEE (2011)
  45. Wright, A.P., Wright, A.T., McCoy, A.B., Sittig, D.F.: The use of sequential pattern mining to predict next prescribed medications. J. Biomed. Inf. 53, 73–80 (2015)
  46. Bou Rjeily, C., Badr, G., El Hassani, A.H., Andres, E.: Overview on Sequential Mining Algorithms and Their Extensions. Springer (2017)
  47. Bou Rjeily, C., Badr, G., El Hassani, A.H., Andres, E.: Sequential mining classification. In: IEEE International Conference on Computer and Applications (ICCA), pp. 190–194. IEEE (2017)
  48. Srikant, R., Agrawal, R.: Mining sequential patterns: generalizations and performance improvements. In: Advances in Database Technology, EDBT'96, pp. 1–17 (1996)
  49. Fournier-Viger, P., Lin, J.C.W., Kiran, R.U., Koh, Y.S., Thomas, R.: A survey of sequential pattern mining. Data Sci. Pattern Recognit. 1(1), 54–77 (2017)
  50. Agrawal, R., Srikant, R.: Mining sequential patterns. In: Proceedings of the Eleventh International Conference on Data Engineering, pp. 3–14. IEEE (1995)
  51. Han, J., Pei, J., Kamber, M.: Data Mining: Concepts and Techniques. Elsevier (2011)
  52. Lin, J., Keogh, E., Wei, L., Lonardi, S.: Experiencing SAX: a novel symbolic representation of time series. Data Min. Knowl. Discov. 15(2), 107–144 (2007)
  53. Camerra, A., Palpanas, T., Shieh, J., Keogh, E.: iSAX 2.0: indexing and mining one billion time series. In: 2010 IEEE 10th International Conference on Data Mining (ICDM), pp. 58–67. IEEE (2010)
  54. Fu, T.C.: A review on time series data mining. Eng. Appl. Artif. Intell. 24(1), 164–181 (2011)
  55. Zaki, M.J.: Scalable algorithms for association mining. IEEE Trans. Knowl. Data Eng. 12(3), 372–390 (2000)
  56. Ayres, J., Flannick, J., Gehrke, J., Yiu, T.: Sequential pattern mining using a bitmap representation. In: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 429–435. ACM (2002)
  57. Aseervatham, S., Osmani, A., Viennet, E.: bitSPADE: a lattice-based sequential pattern mining algorithm using bitmap representation. In: Sixth International Conference on Data Mining, ICDM'06, pp. 792–797. IEEE (2006)
  58. Fournier-Viger, P., Gomariz, A., Campos, M., Thomas, R.: Fast vertical mining of sequential patterns using co-occurrence information. In: Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 40–52. Springer (2014)
  59. Han, J., Pei, J., Mortazavi-Asl, B., Pinto, H., Chen, Q., Dayal, U., Hsu, M.: PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth. In: Proceedings of the 17th International Conference on Data Engineering, pp. 215–224 (2001)
  60. Han, J., Pei, J., Yin, Y., Mao, R.: Mining frequent patterns without candidate generation: a frequent-pattern tree approach. Data Min. Knowl. Discov. 8(1), 53–87 (2004)
  61. Mabroukeh, N.R., Ezeife, C.I.: A taxonomy of sequential pattern mining algorithms. ACM Comput. Surv. 43(1), 3 (2010)
  62. Yan, X., Han, J., Afshar, R.: CloSpan: mining closed sequential patterns in large datasets. In: Proceedings of the 2003 SIAM International Conference on Data Mining, SIAM, pp. 166–177 (2003)
  63. Wang, J., Han, J.: BIDE: efficient mining of frequent closed sequences. In: Proceedings of the 20th International Conference on Data Engineering, pp. 79–90. IEEE (2004)
  64. Gomariz, A., Campos, M., Marin, R., Goethals, B.: ClaSP: an efficient algorithm for mining frequent closed sequences. In: Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 50–61. Springer (2013)
  65. Lee, Y.S., Yen, S.J.: Incremental and interactive mining of web traversal patterns. Inf. Sci. 178(2), 287–306 (2008)
  66. Fournier-Viger, P., Gomariz, A., Šebek, M., Hlosta, M.: VGEN: fast vertical mining of sequential generator patterns. In: International Conference on Data Warehousing and Knowledge Discovery, pp. 476–488. Springer (2014)
  67. Lo, D., Khoo, S.C., Li, J.: Mining and ranking generators of sequential patterns. In: Proceedings of the 2008 SIAM International Conference on Data Mining, SIAM, pp. 553–564 (2008)
  68. Pham, T.T., Luo, J., Hong, T.P., Vo, B.: MSGPs: a novel algorithm for mining sequential generator patterns. In: International Conference on Computational Collective Intelligence, pp. 393–401. Springer (2012)
  69. Barron, A., Rissanen, J., Yu, B.: The minimum description length principle in coding and modeling. IEEE Trans. Inf. Theory 44(6), 2743–2760 (1998)
  70. Fournier-Viger, P., Gomariz, A., Gueniche, T., Soltani, A., Wu, C.W., Tseng, V.S.: SPMF: a Java open-source pattern mining library. J. Mach. Learn. Res. 15(1), 3389–3393 (2014)
  71. Gao, C., Wang, J., He, Y., Zhou, L.: Efficient mining of frequent sequence generators. In: Proceedings of the 17th International Conference on World Wide Web, pp. 1051–1052. ACM (2008)
  72. Yi, S., Zhao, T., Zhang, Y., Ma, S., Che, Z.: An effective algorithm for mining sequential generators. Procedia Eng. 15, 3653–3657 (2011)
  73. Fournier-Viger, P., Wu, C.W., Tseng, V.S.: Mining maximal sequential patterns without candidate maintenance. In: International Conference on Advanced Data Mining and Applications, pp. 169–180. Springer (2013)
  74. Fournier-Viger, P., Wu, C.W., Gomariz, A., Tseng, V.S.: VMSP: efficient vertical mining of maximal sequential patterns. In: Canadian Conference on Artificial Intelligence, pp. 83–94. Springer (2014)
  75. Lam, H.T., Mörchen, F., Fradkin, D., Calders, T.: Mining compressing sequential patterns. Stat. Anal. Data Min. ASA Data Sci. J. 7(1), 34–52 (2014)
  76. Tzvetkov, P., Yan, X., Han, J.: TSP: mining top-k closed sequential patterns. Knowl. Inf. Syst. 7(4), 438–457 (2005)
  77. Fournier-Viger, P., Zida, S., Lin, J.C.W., Wu, C.W., Tseng, V.S.: EFIM-Closed: fast and memory efficient discovery of closed high-utility itemsets. In: Machine Learning and Data Mining in Pattern Recognition, pp. 199–213. Springer (2016)
  78. Shie, B.E., Hsiao, H.F., Tseng, V.S., Philip, S.Y.: Mining high utility mobile sequential patterns in mobile commerce environments. In: International Conference on Database Systems for Advanced Applications, pp. 224–238. Springer (2011)
  79. Ahmed, C.F., Tanbeer, S.K., Jeong, B.S.: Mining high utility web access sequences in dynamic web log data. In: 2010 11th ACIS International Conference on Software Engineering Artificial Intelligence Networking and Parallel/Distributed Computing (SNPD), pp. 76–81. IEEE (2010)
  80. Ahmed, C.F., Tanbeer, S.K., Jeong, B.S.: A novel approach for mining high-utility sequential patterns in sequence databases. ETRI J. 32(5), 676–686 (2010)
  81. Deogun, J., Jiang, L.: Prediction mining – an approach to mining association rules for prediction. In: Rough Sets, Fuzzy Sets, Data Mining, and Granular Computing, pp. 98–108 (2005)
  82. Fournier-Viger, P., Nkambou, R., Tseng, V.S.M.: RuleGrowth: mining sequential rules common to several sequences by pattern-growth. In: Proceedings of the 2011 ACM Symposium on Applied Computing, pp. 956–961. ACM (2011)
  83. Fournier-Viger, P., Faghihi, U., Nkambou, R., Nguifo, E.M.: CMRules: mining sequential rules common to several sequences. Knowl. Based Syst. 25(1), 63–76 (2012)
  84. Fournier-Viger, P., Gueniche, T., Zida, S., Tseng, V.S.: ERMiner: sequential rule mining using equivalence classes. In: International Symposium on Intelligent Data Analysis, pp. 108–119. Springer (2014)
  85. Fournier-Viger, P., Tseng, V.S.: Mining top-k sequential rules. In: International Conference on Advanced Data Mining and Applications, pp. 180–194. Springer (2011)
  86. Fournier-Viger, P., Tseng, V.S.: TNS: mining top-k non-redundant sequential rules. In: Proceedings of the 28th Annual ACM Symposium on Applied Computing, pp. 164–166. ACM (2013)
  87. Zida, S., Fournier-Viger, P., Wu, C.W., Lin, J.C.W., Tseng, V.S.: Efficient mining of high-utility sequential rules. In: International Workshop on Machine Learning and Data Mining in Pattern Recognition, pp. 157–171. Springer (2015)
  88. Gueniche, T., Fournier-Viger, P., Tseng, V.S.: Compact prediction tree: a lossless model for accurate sequence prediction. In: ADMA, vol. 2, pp. 177–188 (2013)
  89. Cleary, J., Witten, I.: Data compression using adaptive coding and partial string matching. IEEE Trans. Commun. 32(4), 396–402 (1984)
  90. Padmanabhan, V.N., Mogul, J.C.: Using predictive prefetching to improve World Wide Web latency. ACM SIGCOMM Comput. Commun. Rev. 26(3), 22–36 (1996)
  91. Pitkow, J., Pirolli, P.: Mining longest repeating subsequences to predict World Wide Web surfing. In: Proceedings of USENIX Symposium on Internet Technologies and Systems, p. 1 (1999)
  92. Gueniche, T., Fournier-Viger, P., Raman, R., Tseng, V.S.: CPT+: decreasing the time/space complexity of the compact prediction tree. In: Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 625–636. Springer (2015)

Copyright information

© Springer International Publishing AG, part of Springer Nature 2019

Authors and Affiliations

  • Carine Bou Rjeily (1)
  • Georges Badr (2, corresponding author)
  • Amir Hajjam El Hassani (1)
  • Emmanuel Andres (3)

  1. Nanomedicine Lab, Université de Bourgogne Franche-Comté, UTBM Belfort, Belfort, France
  2. TICKET Lab, Antonine University, Hadath, Lebanon
  3. Université de Strasbourg, Centre Hospitalier Universitaire, Strasbourg, France
