Risk Factors Analysis and Death Prediction in Some Life-Threatening Ailments Using Chi-Square Case-Based Reasoning (χ2 CBR) Model

  • D. A. Adeniyi
  • Z. Wei
  • Y. Yang
Original Research Article


A wealth of data is available within the health care system; however, effective analysis tools for exploring the hidden patterns in these datasets are lacking. To address this limitation, this paper proposes a simple but promising hybrid predictive model that combines the Chi-square distance measurement with the case-based reasoning technique. The study presents the realization of an automated risk calculator and death predictor for some life-threatening ailments using the Chi-square case-based reasoning (χ2 CBR) model. The proposed predictive engine reduces runtime and speeds up execution through the use of the critical χ2 distribution value. This work also presents a novel feature selection method, referred to as the frequent item based rule (FIBR) method, which is used to select the best features for the proposed χ2 CBR model at the preprocessing stage of the predictive procedure. The risk calculator is implemented as an in-house developed PHP program, tested with XAMPP/Apache HTTP server as the hosting server. Data acquisition and case base development are implemented using MySQL. A performance comparison between our system and the NBY, ED-KNN, ANN, SVM, Random Forest and traditional CBR techniques shows that the quality of predictions produced by our system exceeds that of the baseline methods studied. Our experimental results show that the precision rate and predictive quality of our system are, in most cases, equal to or greater than 70%. The results also show that the proposed system executes faster than the baseline methods studied. Therefore, the proposed risk calculator is capable of providing useful, consistent, fast, accurate and efficient risk level prediction to both patients and physicians at any time, online and on a real-time basis.
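To make the idea of pairing a Chi-square distance with case retrieval concrete, the following is a minimal sketch in Python, not the authors' PHP implementation. The abstract does not give the exact distance formula or say how the critical χ2 value is applied, so everything here is an assumption: the distance is the common form Σ (x_i − y_i)² / (x_i + y_i), and the critical value is used as an early-stopping threshold, which is only one plausible reading of how it could shorten execution. The names `chi_square_distance` and `retrieve_case` and the feature vectors are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Case = Tuple[Sequence[float], str]  # (feature vector, risk label); hypothetical layout


def chi_square_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Assumed chi-square distance: sum of (a - b)^2 / (a + b) over features."""
    total = 0.0
    for a, b in zip(x, y):
        denom = a + b
        if denom > 0:  # skip features where both values are zero
            total += (a - b) ** 2 / denom
    return total


def retrieve_case(
    query: Sequence[float],
    case_base: List[Case],
    critical_value: float,
) -> Tuple[Optional[str], float]:
    """Return the label and distance of the closest stored case.

    Assumption: once a stored case lies within `critical_value` of the
    query, the scan stops early instead of visiting every remaining case.
    """
    best_label: Optional[str] = None
    best_dist = float("inf")
    for features, label in case_base:
        d = chi_square_distance(query, features)
        if d < best_dist:
            best_label, best_dist = label, d
        if d <= critical_value:
            break  # close enough: accept this case and stop scanning
    return best_label, best_dist


if __name__ == "__main__":
    # Made-up binary risk-factor vectors, purely for illustration.
    cases = [([0, 0, 0, 1], "low"), ([1, 0, 1, 1], "high")]
    print(retrieve_case([1, 0, 1, 1], cases, critical_value=0.5))
```

In this sketch a smaller threshold forces a longer scan and a larger one accepts the first roughly similar case, so the threshold trades retrieval quality against runtime. How the paper actually sets and uses the critical value would need to be taken from the full text.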


Keywords: Risk factors, Chi-square distance, Case-based prediction, Ailments, Risk calculator



This research was financially supported by the Aoshan Innovation Project in Science and Technology of Qingdao National Laboratory for Marine Science and Technology (Grant No. 2016ASKJ07).



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Computer Science and Technology, College of Information Science and Engineering, Ocean University of China, Qingdao, China
