Abstract
A wealth of data is available within the health care system; however, effective analysis tools for exploring the hidden patterns in these datasets are lacking. To address this limitation, this paper proposes a simple but promising hybrid predictive model that combines the Chi-square distance measure with the case-based reasoning technique. The study presents the realization of an automated risk calculator and death prediction system for some life-threatening ailments using the Chi-square case-based reasoning (χ2 CBR) model. The proposed predictive engine reduces runtime and speeds up execution through the use of the critical χ2 distribution value. This work also presents a novel feature selection method, referred to as the frequent item based rule (FIBR) method, which is used to select the best features for the proposed χ2 CBR model at the preprocessing stage of the predictive procedure. The proposed risk calculator is implemented as an in-house PHP program, tested with XAMPP/Apache HTTP server as the hosting server. Data acquisition and case-base development are implemented using MySQL. A performance comparison between our system and the NBY, ED-KNN, ANN, SVM, Random Forest and traditional CBR techniques shows that the quality of predictions produced by our system exceeds that of the baseline methods studied. The results of our experiments show that the precision rate and predictive quality of our system are, in most cases, equal to or greater than 70%. The results also show that the proposed system executes faster than the baseline methods studied. Therefore, the proposed risk calculator is capable of providing useful, consistent, fast, accurate and efficient risk-level prediction to both patients and physicians at any time, online and in real time.
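To make the idea of Chi-square distance-based case retrieval concrete, the following is a minimal sketch and not the authors' PHP implementation. It assumes the common symmetric form of the Chi-square distance, sum of (x_i - y_i)^2 / (x_i + y_i), and it assumes that the critical χ2 value acts as an early-stopping threshold during retrieval; the abstract does not specify either detail. The function names, the case base and the feature values are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Case = Tuple[Sequence[float], str]  # (feature vector, risk label); hypothetical layout


def chi_square_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Symmetric Chi-square distance; terms with a zero denominator are skipped."""
    total = 0.0
    for xi, yi in zip(x, y):
        denom = xi + yi
        if denom > 0:
            total += (xi - yi) ** 2 / denom
    return total


def retrieve(query: Sequence[float], case_base: List[Case],
             critical_value: float) -> Tuple[Optional[str], float]:
    """Return the label and distance of the closest stored case seen so far.

    Assumption: the scan stops as soon as a case falls at or below the
    critical value, which is one way such a threshold could reduce runtime.
    """
    best_label, best_dist = None, float("inf")
    for features, label in case_base:
        d = chi_square_distance(query, features)
        if d < best_dist:
            best_label, best_dist = label, d
        if d <= critical_value:
            break  # close enough: skip the remaining cases
    return best_label, best_dist


# Hypothetical case base with four features (3 degrees of freedom).
case_base = [
    ([60, 5, 30, 2], "low"),
    ([20, 40, 10, 9], "high"),
    ([22, 38, 12, 8], "high"),
]
# 7.815 is the standard critical chi-square value for df = 3 at alpha = 0.05.
label, dist = retrieve([21, 39, 11, 9], case_base, critical_value=7.815)
print(label, round(dist, 4))
```

In this sketch the third stored case is never compared, because the second case already falls below the threshold; a full implementation would also need the retained features to come from a selection step such as FIBR, which is not modelled here.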
Acknowledgements
This research was financially supported by the Aoshan Innovation Project in Science and Technology of Qingdao National Laboratory for Marine Science and Technology (Grant No. 2016ASKJ07).
Cite this article
Adeniyi, D.A., Wei, Z. & Yang, Y. Risk Factors Analysis and Death Prediction in Some Life-Threatening Ailments Using Chi-Square Case-Based Reasoning (χ2 CBR) Model. Interdiscip Sci Comput Life Sci 10, 854–874 (2018). https://doi.org/10.1007/s12539-018-0283-6
DOI: https://doi.org/10.1007/s12539-018-0283-6