Abstract
An expert system is an artificial intelligence based system that imitates the decision-making ability of a human expert, and it is used as a diagnostic tool for many diseases, including diabetes mellitus, COVID-19, cancers, and coronary artery disease (CAD). CAD is globally one of the deadliest diseases, yet it is not well known in Nigeria, where it causes many deaths: in 2014, 53,836 deaths, or 2.82% of total deaths in Nigeria, resulted from CAD. In this study, a fuzzy based expert system for the diagnosis of CAD is developed to provide a complementary diagnostic tool for CAD patients in Nigeria. The improved C4.5 data mining algorithm is used to transfer the knowledge of human experts to the knowledge base of the expert system instead of conventional techniques such as interviews and questionnaires. The performance of the system was evaluated, and the system has an overall accuracy, sensitivity and specificity of 94.55%, 95.35% and 95.00%, respectively, which shows that the system is reliable and capable of diagnosing both negative and positive cases of CAD efficiently.
1 Introduction
Coronary artery disease (CAD) is caused by stenosis of the vessels that supply oxygenated blood to the heart, which results in severe heart problems such as angina and heart attack [10, 11]. CAD is one of the largest killers worldwide, including in Nigeria: it killed more than 7.4 million people around the world in 2012 and 53,836 persons in Nigeria in 2014 [20, 21]. It is also estimated that 1 in 7 people in the United States has CAD [3]. The disease is also one of the leading causes of death in women, killing more people than cancers [3]. CAD causes more deaths and disabilities in developed nations than developing ones [20, 28].
In Nigeria, CAD causes more deaths because of insufficient knowledge of the negative impact of the disease on humans [5, 23, 24]. CAD-related deaths in Nigeria reached 53,836, which amounts to 2.82% of all deaths that occurred in 2014 [20, 24]. Most victims of the disease tend to ignore the early symptoms and do not consult health workers until their condition is severe. Therefore, most of these patients die before receiving appropriate medication or medical attention [5]. There is a huge burden of CAD in most West African countries because of limited resources to provide comprehensive health care for CAD patients and inadequate awareness campaigns about the disease. Therefore, early detection and diagnosis of CAD, currently one of the deadliest diseases in Nigeria, could help significantly in fighting the disease [8, 9].
Medical diagnosis is the process of identifying a disease by assessing specific symptoms and signs [11, 12, 22]. The patient reports the symptoms of the disease to the medical doctor, while the signs of the disease are observed by the doctor. However, patients may not always express their symptoms accurately, and physicians may not always be sure of the signs of a disease because of the uncertainty and vagueness involved in diagnostic decision making [11, 21, 22]. These uncertainties affect the diagnostic process and must be carefully dealt with [7]. Physicians also often vary in their decisions because of the uncertainty and vagueness of the information at their disposal [18, 24, 25]. Fuzzy expert systems are therefore being developed to mimic human specialists' diagnostic decision-making processes and to address the uncertainty, complexity and vagueness often associated with decision making [11, 12, 26]. A fuzzy-based expert system is an advanced artificial intelligence system that uses unconventional reasoning to reduce the uncertainty that is often associated with the diagnosis of diseases [1, 17, 21, 26]. In this study, a fuzzy expert system for the diagnosis of coronary artery disease is built with MATLAB, and it can easily be integrated into an electronic health record system.
2 Related works
In [14], a fuzzy expert system for diagnosing heart disease based on medical records in Jordan was developed using Visual Studio, and the system is able to identify CAD patients. In [19], a clinical support system for treating chronic heart disease using risk factors, most of which are clinical risk factors, was developed. The C4.5 algorithm was used to generate the system's production rules from the Cleveland heart disease database, and the system proved to be very efficient and effective. In [16], a web-based system for the diagnosis of cardiac disease was developed with PHP, HTML, JavaScript and MySQL; the system used 15 input variables with seven diagnostic rules. In [1], an expert system for diagnosing cardiovascular disease was developed with MATLAB, and the system has 94% accuracy. In [13], an expert system for cardiovascular disease was developed, and its production rules were derived from the Cleveland Clinic Foundation dataset in the UCI Repository of Machine Learning Databases. In [17], a CAD screening system using clinical parameters was developed, and a questionnaire was designed under a medical team's guidance to collect information about patients' clinical parameters based on the risk factors of CAD. The system was implemented using an object-oriented technique with one demographic risk factor and eleven clinical risk factors of CAD.
Many scholars have developed expert systems for diagnosing CAD. Still, most of these systems generated their production rules from a repository dataset of CAD, such as the Cleveland Heart Disease Database. Only a few generated datasets from available medical records. Therefore, a fuzzy based expert system for diagnosis of CAD that uses datasets generated from CAD patients' medical records in Nigeria is required.
3 Materials and methods
The fuzzy based expert system developed in this work has four (4) major components: knowledge discovery (data mining), fuzzification, knowledge inference and defuzzification. Figure 1 shows the methods and materials of the study.
3.1 Dataset
Diagnostic data of patients suffering from CAD and of those suspected of having CAD were collected at General Hospitals in Kano State, Nigeria. The data collection was approved by the Kano State Ministry of Health in Kano, Nigeria.
Data preparation.
The collected dataset was prepared and cleansed, and 506 diagnostic cases were recorded after cleansing. The dataset has twelve (12) attributes: age, glucose, blood pressure, chest pain, triglycerides, high-density lipoprotein (HDL), cholesterol, low-density lipoprotein (LDL), body mass index, creatinine, heart rate, and diagnostic result. Table 1 shows the units, range, and data type of each attribute of the dataset.
3.2 Knowledge discovery (Data mining process)
The prepared and cleaned dataset was transformed into a Weka-readable file format called Attribute-Relation File Format (ARFF). Weka is open-source machine learning software used to uncover useful knowledge from a dataset [25]. An improved C4.5 classification algorithm proposed by [4] was encoded into Weka to generate the production rules used in the knowledge base of the system. The improvement applies L'Hôpital's rule, and the algorithm uses both the average information gain and the information gain ratio, rather than the information gain ratio alone as in C4.5, as the criterion for selecting the candidate attribute for the root of the decision tree [27]. Assume S is the dataset and B is the set of attributes of the dataset. The information gain of attribute B is computed using Eq. 1 as follows
The Gain-Ratio (b) is expressed as follows
The algorithm computes the average information gain and the information gain ratio using the equation
where:
B1 is the set of positive samples in B.
B2 is the set of negative samples in B.
B11 is the set of positive samples in B with a positive value of the attribute.
B12 is the set of positive samples in B with a negative value of the attribute.
B21 is the set of negative samples in B with a positive value of the attribute.
B22 is the set of negative samples in B with a negative value of the attribute.
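As a point of reference for the selection criterion, the following minimal Python sketch computes the textbook C4.5 quantities: information gain is the entropy of the class labels minus the size-weighted entropy of the subsets induced by an attribute, and the gain ratio divides that gain by the attribute's split information. It does not reproduce the L'Hôpital-based simplification or the averaging step of the improved algorithm, and the function names and toy data are assumptions made for illustration only.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on `values`."""
    total = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [lab for val, lab in zip(values, labels) if val == v]
        remainder += (len(subset) / total) * entropy(subset)
    return entropy(labels) - remainder


def gain_ratio(values, labels):
    """Information gain divided by the split information of the attribute."""
    split_info = entropy(values)
    if split_info == 0.0:  # attribute takes a single value: no useful split
        return 0.0
    return information_gain(values, labels) / split_info


# Hypothetical toy data: a chest pain category versus the diagnosis.
chest_pain = ["typical", "typical", "atypical", "atypical"]
diagnosis = ["positive", "positive", "negative", "negative"]

print(information_gain(chest_pain, diagnosis))
print(gain_ratio(chest_pain, diagnosis))
```

In this toy case the attribute separates the two classes perfectly, so it would be chosen as the root under either criterion.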
The improved C4.5 algorithm was run in Weka alongside the C4.5 and Random Tree algorithms. The performance results of the algorithms are shown in Table 2. The improved algorithm has the highest accuracy among the algorithms, 86.56%.
The decision tree generated by the improved C4.5 algorithm was converted into crisp rules. Table 3 shows some of the crisp rules generated from the decision tree.
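The conversion of a decision tree into crisp rules can be sketched as a walk over every root-to-leaf path, with each path becoming one IF-THEN rule. The tree below is hypothetical: its attributes echo the dataset, but its structure and thresholds are illustrative assumptions and are not taken from Table 3, and the nested-dictionary representation is likewise an assumption rather than Weka's output format.

```python
def tree_to_rules(node, conditions=()):
    """Return one (conditions, conclusion) pair per root-to-leaf path."""
    if not isinstance(node, dict):  # a leaf holds the class label
        return [(list(conditions), node)]
    rules = []
    for test, child in node["branches"].items():
        clause = f"{node['attribute']} {test}"
        rules.extend(tree_to_rules(child, conditions + (clause,)))
    return rules


def format_rule(conditions, conclusion):
    """Render a path as a crisp production rule."""
    return "IF " + " AND ".join(conditions) + f" THEN diagnosis = {conclusion}"


# Hypothetical two-level tree; the thresholds are illustrative only.
toy_tree = {
    "attribute": "chest_pain",
    "branches": {
        "= asymptomatic": "healthy",
        "= typical angina": {
            "attribute": "age",
            "branches": {"<= 50": "mild", "> 50": "severe"},
        },
    },
}

rules = [format_rule(c, y) for c, y in tree_to_rules(toy_tree)]
for rule in rules:
    print(rule)
```

Each leaf contributes exactly one rule, so the number of crisp rules equals the number of leaves in the tree.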
3.3 Rule selection
The Rule Selection Technique (RST) proposed by [2] was adopted to select among the crisp rules generated by the improved C4.5 algorithm. Rule selection is based on the notion of an importance measure and supports filtering of the rules; the rules were therefore converted into a decision table. Filtering is applied first to reduce the number of rules, and the importance measure is then applied to select the most important ones [2].
3.4 Fuzzification
Fuzzification is the process of fuzzifying the crisp set of rules generated by the improved C4.5 algorithm. The fuzzification is carried out using fuzzy logic. Unlike traditional logic, which admits only the values 0 and 1, fuzzy logic admits every value in the interval from 0 to 1. Fuzzy logic is called multi-valued logic, unlike conventional set logic, where an element either belongs entirely to a set or does not belong to it at all [6, 17]. In fuzzy set theory, a fuzzy set A in X is defined as a set of ordered pairs $A = \{(x, \mu_A(x)) \mid x \in X\}$,
where $\mu_A(x)$ is called the membership function of set A.
Fuzzy sets allow a succession of possible choices. For any element x of the universe X, the membership function μA(x) is equal to the degree to which x is an element of set A [27]. This value, which lies between 0 and 1, is considered the grade of membership [27]. It is also known as the membership value of the element x in set A. Fuzzy logic is an expression of ambiguity and uncertainty. The advantage of fuzzy sets is that they can overlap and so avoid the problem of sharp boundaries [15]. Therefore, the attributes of the dataset, which are the system's inputs and output, were fuzzified in order to address the inaccuracies, ambiguities, and uncertainties associated with diagnostic decision making for CAD patients [17].
The system's input parameters are age, blood pressure, glucose, cholesterol, triglycerides, HDL, LDL, creatinine, body mass index, heart rate, and chest pain. Each input is defined with three fuzzy linguistic values, except chest pain, which, like the output variable (diagnosis), has four. The output variable (diagnosis) has the fuzzy linguistic values healthy, mild, moderate, and severe, while chest pain has the values typical angina, atypical angina, non-angina, and asymptomatic. There is no ambiguity or overlap in chest pain, since a patient has only one type of chest pain at a time. The value of a fuzzy variable is defined by its membership grade, which is determined by the membership function. A trapezoidal membership function was used for all input variables, while a triangular membership function was used for the output variable. A trapezoidal membership function is denoted Trapezoidal(x; a, b, c, d). Its values at x = a, x = b, x = c and x = d are 0.0, 1.0, 1.0 and 0.0, respectively. The trapezoidal membership function is expressed in Eq. (6):

$$\mu_{\text{trapezoidal}}(x; a, b, c, d) = \max\left(\min\left(\frac{x - a}{b - a},\ 1,\ \frac{d - x}{d - c}\right),\ 0\right) \tag{6}$$
The triangular membership function is denoted Triangle(x; a, b, c). Its values at x = a, x = b and x = c are 0.0, 1.0 and 0.0, respectively. The triangular membership function is expressed in Eq. (7):

$$\mu_{\text{triangle}}(x; a, b, c) = \max\left(\min\left(\frac{x - a}{b - a},\ \frac{c - x}{c - b}\right),\ 0\right) \tag{7}$$
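The two membership functions can be sketched in Python as piecewise linear functions that take the values 0.0, 1.0, 1.0, 0.0 at a, b, c, d (trapezoid) and 0.0, 1.0, 0.0 at a, b, c (triangle). This is a minimal illustration that assumes strictly increasing break points (a < b <= c < d for the trapezoid, a < b < c for the triangle); the numeric parameters in the usage lines are hypothetical and are not the ranges used by the system.

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership grade; assumes a < b <= c < d."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)  # rising edge
    if x <= c:
        return 1.0                # plateau
    return (d - x) / (d - c)      # falling edge


def triangle(x, a, b, c):
    """Triangular membership grade; assumes a < b < c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)  # rising edge
    return (c - x) / (c - b)      # falling edge


# Hypothetical parameters, chosen only to exercise each segment.
print(trapezoid(35, 30, 40, 50, 60))
print(triangle(0.75, 0.0, 0.5, 1.0))
```

Shoulder-shaped sets at the ends of a variable's range (where a = b or c = d) would need a separate case, since the edge slopes above divide by b - a and d - c.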
The linguistic variables and membership functions of each attribute of the dataset were determined, calculated and visualized using MATLAB. Each crisp value was thus converted into a fuzzy value, and all the crisp rules generated by the improved C4.5 decision tree algorithm were transformed into the corresponding fuzzy rules. Table 4 shows a sample of the fuzzy rules, Fig. 2 shows the membership functions of the linguistic variables of age, Fig. 3 shows the membership functions of the linguistic variables of chest pain, and Fig. 4 shows the membership functions of the linguistic variables of diagnosis.
4 Fuzzy based expert system
The fuzzy based expert system for diagnosis of CAD has three major components: knowledge base, inference engine, and defuzzification (user interface).
4.1 Knowledge base
The knowledge base was developed from historical data and the experience of cardiologists. Cardiologists were consulted and involved in the stages of data collection, cleaning, interpretation and knowledge generation. They verified each rule generated by the improved data mining algorithm, and all conflicts were resolved. The system employs production rules for knowledge representation. The production rules are written in the format <IF (condition) THEN (conclusion)>. In the present fuzzy system, both condition and conclusion are fuzzy variables. These rules are diagnostic rules and are selected by the inference engine of the system. MATLAB is used to implement the system, which has 87 rules. The knowledge base rules are shown in Fig. 5.
4.2 Knowledge inference
Knowledge inference is the mechanism for inferring new knowledge from the fuzzy rules available in the system's knowledge base, so that new information and conclusions can be deduced from them. The Mamdani inference technique is used in this work to simulate expert physicians' reasoning in diagnosing CAD. The Mamdani fuzzy inference system is widely used because it provides good results with a relatively simple structure. It is used to create a control system by synthesizing a set of linguistic production rules obtained from experienced human operators [17]. Accordingly, the minimum operator is used throughout: the conjunction operator is MIN, the t-norm from the compositional rule is MIN, and the MAX operator is used to aggregate the rules. Figure 6 shows the Graphical User Interface (GUI) of the system inference with the Mamdani technique.
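The MIN and MAX operators described above can be illustrated with a small Python sketch of one Mamdani inference step: each rule's firing strength is the minimum of its antecedent membership grades, the consequent set is clipped at that strength, and the clipped sets are combined pointwise with the maximum. The fuzzified input grades, the two rules and the output term parameters are all hypothetical assumptions for illustration; they are not the system's 87 rules or its MATLAB configuration.

```python
def triangle(z, a, b, c):
    """Triangular membership grade; assumes a < b < c."""
    if z <= a or z >= c:
        return 0.0
    if z <= b:
        return (z - a) / (b - a)
    return (c - z) / (c - b)


# Hypothetical output terms on a 0-to-1 risk scale.
OUTPUT_TERMS = {"mild": (0.0, 0.25, 0.5), "severe": (0.5, 0.75, 1.0)}


def mamdani_aggregate(fuzzified, rules, zs):
    """Return the aggregated output membership grade at each point of zs."""
    aggregated = [0.0] * len(zs)
    for antecedents, out_term in rules:
        # Conjunction of the antecedents: MIN of their membership grades.
        strength = min(fuzzified[var][term] for var, term in antecedents)
        for i, z in enumerate(zs):
            # Implication: clip the consequent set at the firing strength (MIN).
            clipped = min(strength, triangle(z, *OUTPUT_TERMS[out_term]))
            # Aggregation across rules: MAX.
            aggregated[i] = max(aggregated[i], clipped)
    return aggregated


# Hypothetical membership grades of one patient's already fuzzified inputs.
fuzzified = {
    "age": {"young": 0.2, "old": 0.8},
    "cholesterol": {"low": 0.4, "high": 0.6},
}
rules = [
    ([("age", "old"), ("cholesterol", "high")], "severe"),
    ([("age", "young"), ("cholesterol", "low")], "mild"),
]
zs = [i / 100 for i in range(101)]
aggregated = mamdani_aggregate(fuzzified, rules, zs)
print(max(aggregated))
```

The aggregated list is the fuzzy output set sampled on a grid; a defuzzification step is still needed to turn it into a single crisp value.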
4.3 Defuzzification
Defuzzification involves transforming the output of the inference engine (fuzzy values) into crisp values. The centroid method, also called the center of area or center of gravity, is employed in this work for defuzzification:

$$z^{*} = \frac{\int \mu_A(z)\, z\, dz}{\int \mu_A(z)\, dz}$$

where z is the output variable and $\mu_A(z)$ is the membership function of the aggregated fuzzy set A with respect to z. The centroid method converts the fuzzy diagnosis result, which is the output of the system, into a crisp value.
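On a sampled output universe, the center of gravity can be approximated by a weighted average of the sample points, with the membership grades as weights. The sketch below assumes an evenly spaced grid and a hypothetical aggregated set; it illustrates the centroid idea and is not the MATLAB routine used by the system.

```python
def centroid(zs, memberships):
    """Discrete center of gravity: sum(z * mu) / sum(mu) over the grid."""
    area = sum(memberships)
    if area == 0.0:
        raise ValueError("aggregated fuzzy set is empty; no rule fired")
    return sum(z * mu for z, mu in zip(zs, memberships)) / area


zs = [i / 100 for i in range(101)]
# Hypothetical aggregated set: a symmetric triangle peaking at 0.5.
mus = [max(0.0, 1.0 - abs(z - 0.5) / 0.25) for z in zs]
crisp = centroid(zs, mus)
print(round(crisp, 6))
```

Because the hypothetical set is symmetric about 0.5, the crisp value lands at its peak; an asymmetric aggregated set would pull the centroid toward the side with more area.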
Figure 7 shows the GUI of Rule Viewer of the system while Fig. 8 shows the Surface Viewer of the system.
5 Performance evaluation of the system
The expert system was applied to the diagnostic data of 100 people (Healthy = 21%, Mild = 23%, Moderate = 31% and Severe = 25%) who came to the Specialist Hospital in Kano, Nigeria for a CAD checkup. Information based on one demographic risk factor and eleven clinical risk factors was collected from them and labelled by a cardiologist. The system was applied to obtain the model-predicted risk for these people. To evaluate the performance of the system, the model-predicted outputs were compared with the results given by the cardiologist. Table 5 shows the check-up results for each class of patients: healthy, mild, moderate, and severe.
The system diagnoses CAD patients based on a demographic CAD risk factor and eleven clinical risk factors. The following metrics were used to evaluate the performance of the system:
i. Accuracy: is used to evaluate the percentage of CAD patients who were correctly diagnosed by the system.
ii. Sensitivity: is used to evaluate the percentage of CAD patients who were abnormal and correctly diagnosed by the system.
iii. Specificity: is used to evaluate the percentage of CAD patients who were normal and correctly diagnosed by the system.
iv. Receiver Operating Characteristic Curve (ROC): is used to show the relationship between the specificity and sensitivity of the system.
Table 6 and Fig. 9 show the system's performance evaluation results: an accuracy, sensitivity and specificity of 94.55%, 95.35%, and 95.00%, respectively. The results indicate that the system is reliable and can diagnose both negative and positive CAD patients effectively.
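The three scalar metrics follow directly from the counts of a binary confusion matrix, where positive means abnormal (CAD) and negative means normal. The sketch below uses hypothetical counts chosen only to show the arithmetic; they are not the counts behind Table 5 or Table 6, and the function name is an assumption.

```python
def diagnostic_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,   # all correct diagnoses
        "sensitivity": tp / (tp + fn),   # abnormal cases correctly flagged
        "specificity": tn / (tn + fp),   # normal cases correctly cleared
    }


# Hypothetical counts for illustration only.
metrics = diagnostic_metrics(tp=40, tn=50, fp=5, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

For a four-class output such as healthy, mild, moderate and severe, these counts would first have to be obtained by grouping the classes into normal and abnormal, which is an additional assumption not spelled out here.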
As shown in Fig. 10, the x-axis of the ROC curve shows specificity while the y-axis shows sensitivity. The curve shows the relationship between specificity and sensitivity, and it indicates the diagnostic ability of the system as its discrimination threshold is varied. The curve shows that the system can diagnose positive cases of CAD more efficiently than negative cases.
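How a ROC curve arises from varying the discrimination threshold can be sketched by sweeping the threshold over a set of risk scores and recording one point per threshold. The sketch follows the common convention of plotting the false-positive rate (1 minus specificity) on the x-axis against sensitivity on the y-axis; the scores and labels are hypothetical and are not the system's outputs.

```python
def roc_points(scores, labels):
    """(false-positive rate, sensitivity) pairs, one per distinct threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing flagged
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


# Hypothetical crisp risk scores and cardiologist labels (1 = CAD, 0 = normal).
scores = [0.9, 0.8, 0.4, 0.2]
labels = [1, 1, 0, 0]
points = roc_points(scores, labels)
print(points)
```

A curve that rises to a sensitivity of 1.0 before the false-positive rate leaves 0.0, as in this toy case, corresponds to scores that separate the two groups perfectly.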
6 Conclusion
CAD is no longer one of the deadliest diseases only in developed nations but also in developing countries like Nigeria. Therefore, CAD is a global phenomenon. In this study, a fuzzy-based expert system for CAD diagnosis has been designed to complement health workers in diagnosing CAD. The improved C4.5 data mining algorithm is used to transfer human knowledge to the system's knowledge base instead of conventional techniques such as interviews and questionnaires. The performance of the system was evaluated, and the system has 94.55% accuracy, 95.35% sensitivity, and 95.00% specificity. This shows that the system has a high capability of detecting both healthy and unhealthy patients and that it can be relied upon.
References
Ali A, Mehdi N. A Fuzzy Expert System for Heart Disease Diagnosis. Proceedings of the International Multi Conference of Engineers and Computer Scientists. 2010;1:17–9.
Noor AS, Venkatachalam PA, Ahmad FH. Diagnosis of Coronary Artery Disease Using Artificial Intelligence Based Decision Support System. In Proceedings of the International Conference on Man-Machine Systems (ICoMMS), Batu Ferringhi, Penang, Malaysia. 2009;11–13.
American Heart Association (AHA). Heart disease and stroke statistics —at a glance. 2015.
Yahaya BZ, Muhammad LJ, Abdulganiyyu N, Ishaq FS, Atomsa Y. An Improved C4.5 Algorithm Using L’ Hospital Rule for Large Dataset. Indian J Sci Technol. 2018;11:47.
Nwaneli CU. Changing trend in coronary heart disease in Nigeria. Afr Medical J. 2010;1(1):1–4.
Dilip KP. Soft Computing: Fundamentals and Applications. India: NAROSA; 2013. p. 103–21.
Ishaq FS, Muhammad LJ, Yahaya BZ, Atomsa Y. Data Mining Driven Models for Diagnosis of Diabetes Mellitus: A Survey. Indian J Sci Technol. 2018;11:42.
Yan HM, Jiang YT, Zheng J, Peng CL, Li QH. A multilayer perceptron-based medical decision support system for heart disease diagnosis. Expert Syst Appl. 2006;30:272–81.
Reddy KS. Cardiovascular diseases in the developing countries: Dimensions, determinants, dynamics and directions for public health action. Public Health Nutrition. 2002;5:231–7.
Adel L, Raja NA, Roziati Z, Awang B. Design of a Fuzzy-based Decision Support System for Coronary Heart Disease Diagnosis. J Med Syst. 2012.
Muhammad LJ, Garba EJ, Oye ND, Wajiga GM. On the Problems of Knowledge Acquisition and Representation of Expert System for Diagnosis of Coronary Artery Disease (CAD). International Journal of u- and e- Service, Science and Technology. 2018;11(3):49–58.
Muhammad LJ, Abba Haruna A, Mohammed IA, Abubakar M, Badamasi BG, Musa Amshi J. Performance Evaluation of Classification Data Mining Algorithms on Coronary Artery Disease Dataset, 2019 9th International Conference on Computer and Knowledge Engineering (ICCKE), Mashhad, Iran. 2019;1–5.
Adel L, Raja NA, Roziati Z, Awang B. Design of a Fuzzy-based Decision Support System for Coronary Heart Disease Diagnosis, Springer. J Med Syst. 2012.
Ali MA. Fuzzy expert system for Coronary Artery Disease diagnosis in Jordan. Health Technology: Springer; 2017.
Oladipupo OO. A fuzzy association rule mining expert-driven approach to knowledge acquisition. Ph.D. Thesis, Covenant University. 2012.
Akinyokun OC, Iwasokun GB, Arekete SA, Samuel RW. Fuzzy logic-drive expert system for the diagnosis of heart failure disease. Artif Intell Res. 2015;4(1):12–20.
Debabrata P, Mandana KM, Sarbajit P, Debranjan S, Chandan C. Fuzzy expert system approach for coronary artery disease screening using clinical parameters. Elsevier, Journal of Knowledge-Based Systems. 2012;36:162–74.
Smita SS, Sushil S, Ali M MS. Fuzzy Expert Systems (FES) for Medical Diagnosis. Int J Comp Appl. 2013;63:11.
Oumaima T, Bouchaib C, Abdelhadi R, Omar B. A fuzzy medical diagnostic support system for cardiovascular diseases diagnosis using risk factors. In Proceeding of IEEE International Conference on Electronics, Control, Optimization and Computer Science (ICECOCS). 2018.
World Health Organization (WHO). Cardiovascular diseases, factsheet#317. 2015.
Muhammad LJ, Algehyne EA, Usman SS. Predictive Supervised Machine Learning Models for Diabetes Mellitus. Springer Nature Computer Science, 2020;1:240.
Muhammad LJ, Islam MM, Usman SS, et al. Predictive Data Mining Models for Novel Coronavirus (COVID-19) Infected Patients’ Recovery. Springer Nature Computer Science, 2020.
Muhammad LJ, Garba A, Abba G. Security Challenges for Building Knowledge Based Economy in Nigeria. International Journal of Security and Its Applications. 2015;9:1.
Haruna AA, Muhammad LJ, Yahaya BZ, et al. An Improved C4.5 Data Mining Driven Algorithm for the Diagnosis of Coronary Artery Disease. International Conference on Digitization (ICD), Sharjah, United Arab Emirates. 2019;48–52.
Hussain SS, et al. Performance Evaluation of Various Data Mining Algorithms on Road Traffic Accident Dataset. In: Satapathy S, Joshi A. (eds) Information and Communication Technology for Intelligent Systems. Smart Innovation, Systems and Technologies. 2019;106.
Ishaq FS, Muhammad LJ, Yahaya BZ, et al. Fuzzy Based Expert System for Diagnosis of Diabetes Mellitus. International Journal of Advanced Science and Technology. 2020;136:39–50.
Muhammad LJ, Besiru Jibrin M, Yahaya BZ, Mohammed Besiru Jibrin IA, Ahmad A, Amshi JM. "An Improved C4.5 Algorithm using Principle of Equivalent of Infinitesimal and Arithmetic Mean Best Selection Attribute for Large Dataset," 2020 10th International Conference on Computer and Knowledge Engineering (ICCKE), Mashhad, Iran. 2020;006–010. https://doi.org/10.1109/ICCKE50421.2020.9303622.
Muhammad LJ, Garba EJ, Oye ND, Wajiga GM, Garko AB. Mining Framework to Knowledge Acquisition for Expert System – A Study on Coronary Artery Disease. In Advances in ubiquitous sensing applications for healthcare, Translational Bioinformatics in Healthcare and Medicine, Academic Press. 2021.
Ethics declarations
Ethical approval
The ethical approval has been granted by Ministry of Health, Kano State – Nigeria.
Conflict of interest
The authors declare that they have no conflict of interest.
Informed consent
Informed consent was obtained from all individual participants included in the study.
Cite this article
Muhammad, L.J., Algehyne, E.A. Fuzzy based expert system for diagnosis of coronary artery disease in nigeria. Health Technol. 11, 319–329 (2021). https://doi.org/10.1007/s12553-021-00531-z
DOI: https://doi.org/10.1007/s12553-021-00531-z