1 Introduction

Advances in a nation's healthcare infrastructure play a key role in determining the health-related quality of life of its residents. Emerging trends in artificial intelligence have boosted the use of computational resources to improve healthcare infrastructure globally [1,2,3,4]. One condition that has gained considerable attention from researchers is heart attack [5,6,7,8], owing to the fact that it accounts for 33% of deaths annually. However, decision making in this domain still relies on the expertise or guesswork of the cardiovascular expert, which makes it difficult to assure the quality of the decisions [9]. Consequently, there is a need for a system to aid the process of cardiovascular risk assessment.

The growing number of artificial intelligence (AI) enthusiasts has helped in the process by generating considerable funding for AI-based research in medical decision making. Nevertheless, the most significant concern has remained the explainability of the results generated by AI-based systems. In medical diagnosis in particular, it is extremely important for the decisions to have a plausible justification.

Numerous artificial intelligence techniques can support healthcare decision support systems, such as genetic fuzzy systems, neural networks, and genetic programming. Three main aspects of every AI system need to be taken into account: reliability, which pertains to the accuracy of the results; explainability, which accounts for providing relevant justifications of the results; and understandability, which focuses on usability.

This paper concerns the effective diagnosis of heart attack based on a novel classification technique. Exploiting the advantages of the Adaptive Neuro Fuzzy Inference System (ANFIS) and the Genetic Algorithm (GA) in learning and optimization, an effective classification technique is proposed. The proposed system takes the medical data of a patient and predicts how likely the patient is to be at risk of having a heart attack. The algorithm is trained using a neural network to determine fuzzy parameters by processing the dataset fed to it. The membership functions (MFs) generated by the Fuzzy Inference System (FIS) are then optimized by the GA.

The system has an innovative interface which explains the results generated by the algorithm. The possibility of having heart attack is determined in five levels: no risk, slight risk, average risk, high risk and very high risk. Additionally, the explainable interface of the system reflects the relevance of various attributes such as heart rate, chest pain type and other factors that contribute to the predicted results. This makes the decision-making process intuitive as well as reliable. The explainable interface of the system intends to provide insight into how reasonable the predictions are. This gives the clinicians the ability to trace back a prognosis of heart attack to the associated symptoms, and helps in making the process of medical diagnosis trustworthy, safe and transparent. The system’s reliability is evaluated by evaluation functions such as sensitivity, specificity, precision, accuracy, Root Mean Squared Error (RMSE) and k-fold cross validation.

A brief narrative of the hypothesis to be evaluated is provided in Sect. 2. Section 3 reviews the existing literature in the domain. In Sect. 4, the methodology adopted to investigate the hypothesis is discussed. Section 5 focuses on the results achieved. Finally, conclusions are presented in Sect. 6.

2 Hypothesis Description

Most studies in the realm of AI-based systems have focused on reliability, and explainability is addressed infrequently [10]. Diagnosis of heart attack corresponds to a number of related symptoms which can explain the predictions made by the system. This supports the principle of explainability of the results generated by AI algorithms. In this paper, a potential methodology to predict heart attack in a patient is discussed based on the values of various symptoms such as age, sex, type of chest pain and resting heart rate. An effective justification of the predictions is attained by providing evaluation graphs that explain the predictions made by the algorithm. In fact, the paper represents an effort to mimic the diagnosis process followed by clinicians by providing explainable graphs and giving an assessment based on previous experience or existing precedent.

3 State of the Art

A plethora of work has been done in the domain of artificial intelligence for medical diagnosis. An application to predict heart attack was presented in [8]. The authors used the Gini index to measure the impurity of the training dataset and incorporated it into ANFIS to predict heart attack. Starting from a first approximation of the solutions, their proposed algorithm was employed to tune the parameters and normalize the weights. However, further investigation is needed to overcome the problem of missing features in large data sets. A convolutional neural network for multi-label classification was presented in [11]. Defining importance values for the input data, the authors tried to present explainable results for each clinical code in the ICD-9 taxonomy and verified their results through evaluation by a physician. Their results showed high precision along with high Macro-F1 scores compared to previous works. Employing ANFIS, the prediction of the renal failure time frame of Chronic Kidney Disease (CKD) was studied in [12]. A threshold value of glomerular filtration rate (GFR) was defined as the criterion to determine renal failure. The proposed system could effectively capture the vagueness of the process to predict well-grounded GFR values. In addition, it could accurately predict GFR values over an increasing forecasting period. Even so, not taking urine protein into account dampens the precision of the system.

Various studies have tried to address the explainability of AI-based algorithms. A rule-based fuzzy inference system was embedded into deep learning networks to make them explainable in [13]. The fuzzy rules were transformed into symbolic functions to initialize the neural network. The system could automatically identify and analyse the input pattern and adjust the behaviour of the predictors so that it could be adaptable. The challenges were overcoming the lack of labelled data, increasing the reliability of the output data and reducing the required time. In another study, the possible methodologies to approach explainable AI systems in the medical domain were discussed in [9]. The authors addressed two types of explainability: post-hoc and ante-hoc. Post-hoc explainability relates to explanation after the event in question, such as local interpretable model-agnostic explanations. Ante-hoc explainability, on the other hand, corresponds to explanation before the event in question, such as fuzzy inference systems. The authors suggested that integrating knowledge-based and neural techniques could provide a text-based explainable model.

The potential of ANFIS has been explored in various research applications. One such application was proposed in [14], where the authors discussed the efficiency of ANFIS and rough sets in predicting tuberculosis (TB). With four output variables related to the chances of having TB and 30 input variables related to the conditions of the patients, the authors reported close to 97% accuracy for the ANFIS model compared to 92% accuracy for the rough sets results. A new ANFIS model using the Levenberg-Marquardt algorithm for the diagnosis of heart disease was presented in [5]. The authors reported that the proposed model was more reliable and accurate compared to grid partitioning classification and to the case when ANFIS was trained with least squares estimation and the backpropagation method. However, the proposed algorithm required calculating the Jacobian matrix at each iteration, which could be a challenge for large data sets. Another application of ANFIS to the diagnosis of heart disease was presented in [7]. Using a 5-layer ANFIS and a hybrid algorithm of forward and backward passes, the authors reported an accuracy of 92.3%. Additionally, in an attempt to integrate ANFIS with optimization algorithms, GA was employed to train ANFIS for the diagnosis of heart disease in [6]. The authors used 252 out of 297 cases of the dataset for training and the remainder for testing, and reported an accuracy of about 98%.

4 Methodology

The Adaptive Neuro Fuzzy Inference System (ANFIS) is a suitable machine learning technique for classification problems. Combining the features of a fuzzy system and a neural network, the system can classify and predict unknown inputs. ANFIS exploits the advantages of human expert knowledge expressed as fuzzy linguistic rules, and the ability of the neural network to learn and adapt. Fuzzy logic provides the flexibility of defining linguistic rules, which makes it possible to work with natural language and human expert knowledge. A fuzzy inference system (FIS) consists of three main components: the fuzzy rules selected to form a rule-based fuzzy system, the membership functions which create a database, and the output, which is related to the rules and membership functions and creates an explainable conclusion [15].

The proposed ANFIS-GA algorithm takes the medical data of a patient to predict the risk of having heart attack. The dataset used in this paper is the Cleveland dataset which includes medical reports of 297 patients [16]. It represents 14 features including age, sex, chest pain type, resting blood pressure, cholesterol, fasting blood sugar, resting electrocardiographic results, maximum heart rate, exercise induced angina, depression induced by exercise relative to rest, the slope of peak exercise, number of major vessels, the heart status, and the diagnosis of heart disease as shown in Table 1. 161 out of 297 patients provided in the dataset have no heart disease and 136 patients have some level of heart disease. The diagnosis of heart attack is categorized into five different possibilities including at no risk (level 0), slight risk (level 1), average risk (level 2), high risk (level 3) and very high risk (level 4) of having heart attack. 257 instances of the dataset are used for training and 40 instances are used for testing.

Table 1. Features of the Cleveland dataset.

The block diagram of the proposed algorithm is presented in Fig. 1. First, using a Takagi-Sugeno type fuzzy inference system, the grid partitioning method is used to create initial membership functions based on the training set. Using Gaussian membership functions, if-then rules are generated. After creating an initial FIS, the algorithm is trained using ANFIS to determine fuzzy parameters by processing the dataset fed to it. The ANFIS used in the algorithm generates a single-output Sugeno FIS and integrates the least-squares and the backpropagation gradient descent methods for training the FIS membership function parameters. The system is trained for 1000 epochs, and the training process stops when the designated epoch number is reached. In the next step, which is the novel part of the technique, the membership function parameters are optimized by GA. As a result of combining these three algorithms (fuzzy logic, the neural network and the genetic algorithm), a new classification algorithm for predicting heart attack is obtained. GA is set to minimize the cost function (CF), which is defined as the root mean squared error (RMSE) between the desired (expected) and the predicted heart attack level.
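To make the inference step concrete, the following is a minimal sketch of how a single-output Sugeno FIS with Gaussian membership functions could be evaluated. It is not the implementation used in this paper: the rule structure (`centers`, `sigmas`, `coeffs`, `bias`), the function names and the two toy rules are assumptions chosen only for illustration, and the rules are listed explicitly rather than generated by grid partitioning.

```python
import math

def gaussian_mf(x, center, sigma):
    """Membership degree of x in a Gaussian fuzzy set (center, sigma)."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)

def sugeno_predict(x, rules):
    """Evaluate a single-output Sugeno FIS for one input vector x.

    Each rule is a dict (hypothetical layout) with:
      'centers', 'sigmas': Gaussian MF parameters, one pair per input
      'coeffs', 'bias':    linear consequent of the rule
    """
    firing, outputs = [], []
    for rule in rules:
        # Firing strength: product of the membership degrees of all inputs.
        strength = 1.0
        for xi, c, s in zip(x, rule["centers"], rule["sigmas"]):
            strength *= gaussian_mf(xi, c, s)
        firing.append(strength)
        # Rule output: linear function of the inputs.
        outputs.append(sum(a * xi for a, xi in zip(rule["coeffs"], x)) + rule["bias"])
    # Crisp output: firing-strength-weighted average of the rule outputs.
    return sum(w * y for w, y in zip(firing, outputs)) / sum(firing)

# Two made-up rules over two inputs, with constant consequents 0 and 4.
rules = [
    {"centers": [40.0, 120.0], "sigmas": [10.0, 20.0], "coeffs": [0.0, 0.0], "bias": 0.0},
    {"centers": [60.0, 160.0], "sigmas": [10.0, 20.0], "coeffs": [0.0, 0.0], "bias": 4.0},
]
midway = sugeno_predict([50.0, 140.0], rules)   # equally close to both rules
near_first = sugeno_predict([40.0, 120.0], rules)
print(midway, near_first)
```

In such a structure, the centers and widths of the Gaussian membership functions are the parameters that a subsequent optimizer would adjust.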

Fig. 1. The proposed ANFIS-GA flow chart.

$$ {\text{CF}} = {\text{RMSE}} = \sqrt {\frac{1}{N}\sum\nolimits_{i = 1}^{N} {\left( {DO_{i} - PO_{i} } \right)^{2} } } , $$
(1)

where DO is the desired output, PO is the predicted output achieved by the algorithm, and N is the number of instances in the dataset over which the error is averaged. The GA parameters are: crossover probability 0.9, mutation probability 0.2, initial population 100 and number of iterations 200. The predictions made by the algorithm can be categorized into four classes: YN, where a patient at risk of having a heart attack is predicted to be at no risk; YY, where the algorithm correctly predicts that a patient is at risk of having a heart attack; NN, where the algorithm correctly predicts no risk of heart attack for a patient at level 0; and NY, where the algorithm predicts a risk of heart attack for a patient who is healthy.
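The cost function of Eq. (1) and a GA with the stated parameters can be sketched as follows. This is an illustrative sketch only: the paper does not specify the selection, crossover or mutation operators, so binary tournament selection, arithmetic crossover, single-gene Gaussian mutation and elitism are assumptions here, as are the function names, the `spread` parameter and the toy linear model standing in for the FIS.

```python
import math
import random

def rmse(desired, predicted):
    """Eq. (1): root mean squared error between desired and predicted outputs."""
    n = len(desired)
    return math.sqrt(sum((d - p) ** 2 for d, p in zip(desired, predicted)) / n)

def genetic_minimize(cost, seed_solution, pop_size=100, iterations=200,
                     p_crossover=0.9, p_mutation=0.2, spread=0.5, rng=None):
    """Minimize cost(params) with a simple real-coded GA (operators are assumed)."""
    rng = rng or random.Random(0)
    # Initial population: the seed solution plus random perturbations of it.
    population = [list(seed_solution)] + [
        [g + rng.gauss(0.0, spread) for g in seed_solution]
        for _ in range(pop_size - 1)
    ]
    best = min(population, key=cost)
    for _ in range(iterations):
        children = [list(best)]  # elitism: the best individual always survives
        while len(children) < pop_size:
            # Binary tournament selection of two parents.
            p1 = min(rng.sample(population, 2), key=cost)
            p2 = min(rng.sample(population, 2), key=cost)
            child = list(p1)
            if rng.random() < p_crossover:
                # Arithmetic crossover: random convex combination of the parents.
                alpha = rng.random()
                child = [alpha * a + (1 - alpha) * b for a, b in zip(p1, p2)]
            if rng.random() < p_mutation:
                # Gaussian mutation of one randomly chosen gene.
                i = rng.randrange(len(child))
                child[i] += rng.gauss(0.0, spread)
            children.append(child)
        population = children
        best = min(population, key=cost)
    return best, cost(best)

# Toy stand-in for the FIS: predicted = a * x + b, with made-up data.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
desired = [0.0, 1.0, 2.0, 3.0, 4.0]

def toy_cost(params):
    a, b = params
    return rmse(desired, [a * x + b for x in xs])

seed = [0.5, 0.5]  # plays the role of the parameters obtained before the GA step
best_params, best_cost = genetic_minimize(toy_cost, seed)
print(toy_cost(seed), best_cost)
```

Because the seed solution is part of the initial population and the best individual is always carried over, the cost returned by this sketch can never exceed the cost of the seed solution.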

To evaluate the proposed evolutionary ANFIS-GA system, four evaluation functions, namely sensitivity, specificity, precision and accuracy, are defined to be used along with RMSE:

$$ {\text{Sensitivity}} = \frac{\text{YY}}{{{\text{YY}} + {\text{YN}}}}, $$
(2)
$$ {\text{Specificity}} = \frac{\text{NN}}{{{\text{NN}} + {\text{NY}}}}, $$
(3)
$$ {\text{Precision}} = \frac{\text{YY}}{{{\text{YY}} + {\text{NY}}}}, $$
(4)
$$ {\text{Accuracy}} = \frac{{{\text{YY}} + {\text{NN}}}}{{ {\text{YY}} + {\text{YN}} + {\text{NN}} + {\text{NY}}}}. $$
(5)
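Equations (2) to (5) can be computed directly from the four class counts. The sketch below is illustrative: the function names are hypothetical, and the 0.5 threshold used to turn a continuous model output into an "at risk" decision is an assumption, since the thresholding rule is not stated in the text. The usage example takes the testing-set counts reported in Sect. 5 (YY = 19, YN = 5, NN = 13, NY = 3).

```python
def confusion_counts(desired_levels, predicted_levels, threshold=0.5):
    """Count YY, YN, NN and NY, treating any desired level above 0 as 'at risk'."""
    counts = {"YY": 0, "YN": 0, "NN": 0, "NY": 0}
    for desired, predicted in zip(desired_levels, predicted_levels):
        actual = "Y" if desired > 0 else "N"
        # Assumption: outputs at or above the threshold count as 'at risk'.
        guess = "Y" if predicted >= threshold else "N"
        counts[actual + guess] += 1
    return counts

def evaluation_scores(yy, yn, nn, ny):
    """Eqs. (2)-(5), returned as percentages."""
    return {
        "sensitivity": 100.0 * yy / (yy + yn),
        "specificity": 100.0 * nn / (nn + ny),
        "precision": 100.0 * yy / (yy + ny),
        "accuracy": 100.0 * (yy + nn) / (yy + yn + nn + ny),
    }

# Testing-set counts as reported in Sect. 5.
scores = evaluation_scores(yy=19, yn=5, nn=13, ny=3)
print({name: round(value, 4) for name, value in scores.items()})

# Small made-up example of deriving the counts from desired and predicted levels.
example_counts = confusion_counts([0, 1, 3, 0], [0.2, 0.1, 2.7, 1.4])
print(example_counts)
```

With these counts the functions reproduce the testing-set scores reported in Sect. 5 (79.1667%, 81.25%, 86.3636% and 80%).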

To investigate the reliability of the proposed system, evaluation functions and RMSE are calculated and compared for both training set and testing set. To address the explainability issue, the algorithm is designed in such a way that it provides graphs related to the symptoms of the predicted patients. The results are generated for the patients who are successfully predicted to be at no risk, at high risk (level 3 and 4) and at a risk of having heart attack. Additionally, to study the robustness of the proposed system and to provide an explainable criterion for the obtained results, the performance of the system is investigated for the cases where one of the features is discarded from the data set. Features are removed one by one and the performance of the system is evaluated and compared with the case when all the features are considered.
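The feature-removal procedure described above can be sketched as a simple loop. This is only an outline under stated assumptions: `evaluate` is a hypothetical callback standing in for training and testing the ANFIS-GA system on a given set of columns, the helper names are illustrative, and the three rows of patient data are made up.

```python
def drop_feature(rows, feature_index):
    """Return a copy of the data with one feature column removed."""
    return [row[:feature_index] + row[feature_index + 1:] for row in rows]

def ablation_study(rows, targets, feature_names, evaluate):
    """Evaluate the model with all features, then with each feature removed in turn."""
    results = {"all features": evaluate(rows, targets)}
    for i, name in enumerate(feature_names):
        results["without " + name] = evaluate(drop_feature(rows, i), targets)
    return results

# Made-up data and a stub evaluator that only reports the number of columns used.
rows = [[63, 1, 145], [41, 0, 130], [57, 1, 120]]
targets = [1, 0, 0]
feature_names = ["age", "sex", "resting blood pressure"]

def count_columns(data, _targets):
    return len(data[0])

results = ablation_study(rows, targets, feature_names, count_columns)
print(results)
```

In a real run, `evaluate` would return the evaluation scores of the testing set, so that each reduced model can be compared with the model using all features.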

To detect the features with the highest importance in the diagnosis of heart attack, an Importance Evaluation Function (IEF) is defined based on the evaluation functions:

$$ {\text{IEF}} = 1 - \frac{x}{400} , $$
(6)

where \( x \) is the sum of the sensitivity, specificity, precision and accuracy percentages of the testing set obtained after the feature in question is removed. IEF varies between zero and one, where a value of one indicates the highest importance. If a feature with a high importance value is removed, the evaluation scores of the testing set decrease significantly. The features with the highest IEF are identified, and the performance of the proposed system is then evaluated on the training and testing data sets restricted to these features. As a result, a simpler model consisting of the identified features is proposed and its performance is evaluated. The results help to validate the explainability provided for the predictions.
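Equation (6) and the resulting feature ranking can be illustrated with a short sketch. The function names are hypothetical, and the scores assigned to `feature_a` and `feature_b` are made-up values used only to show the calculation; they are not results from this study.

```python
def importance_evaluation(sensitivity, specificity, precision, accuracy):
    """Eq. (6): IEF = 1 - x / 400, with all four scores given in percent."""
    x = sensitivity + specificity + precision + accuracy
    return 1.0 - x / 400.0

def rank_features(scores_without_feature):
    """Sort features by decreasing IEF.

    scores_without_feature maps a feature name to the testing-set scores
    (sensitivity, specificity, precision, accuracy) obtained without it.
    """
    ief = {name: importance_evaluation(*scores)
           for name, scores in scores_without_feature.items()}
    return sorted(ief.items(), key=lambda item: item[1], reverse=True)

# Hypothetical scores after removing each of two placeholder features.
ranking = rank_features({
    "feature_a": (70.0, 75.0, 80.0, 72.0),   # x = 297
    "feature_b": (80.0, 85.0, 88.0, 82.0),   # x = 335
})
print(ranking)
```

Removing `feature_a` lowers the scores more than removing `feature_b`, so it receives the larger IEF and is ranked first.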

5 Results and Discussion

To investigate the performance of the proposed ANFIS-GA algorithm in predicting heart attack, the algorithm was tested on the Cleveland dataset. As illustrated in Fig. 2, the cost function (RMSE) of the training set decreased from 0.82754 (output of ANFIS) to 0.75372 (output of GA) after 200 iterations. Consequently, the evaluation functions for the training set were obtained as follows: sensitivity 91.1504%, specificity 79.1667%, precision 77.4436% and accuracy 84.4358%. Additionally, the sensitivity, specificity, precision and accuracy for the testing set were 79.1667%, 81.25%, 86.3636% and 80%, respectively.

Fig. 2. Cost value (RMSE) as a function of iterations obtained by GA.

For the training set, NN, YY, NY and YN were 114, 103, 30 and 10, respectively. For the testing set, NN, YY, NY and YN were 13, 19, 3 and 5, respectively. The algorithm is able to illustrate the features corresponding to the predictions in the various classes. Figure 3 shows the features for a patient who is correctly predicted to be at high risk of having a heart attack (level 3). It can be seen from the figure that he is a 58-year-old man with the highest level of chest pain type (asymptomatic), resting blood pressure of 114 mm-Hg, serum cholesterol of 318 mg/dl, fasting blood sugar less than 120 mg/dl, a resting electrocardiographic result of value 1, which means he has ST-T wave abnormality, maximum heart rate achieved of 140, no exercise induced angina, ST depression of value 4.4, downsloping of the peak exercise ST segment, 3 major vessels colored by fluoroscopy and a fixed defect of the heart.

Fig. 3. The features corresponding to a patient who is successfully predicted by the proposed algorithm to be at level three risk of having heart attack.

The corresponding features of the patients who are successfully predicted to be at risk of having a heart attack are illustrated in Fig. 4 (19 out of 40 patients). It can be seen that the predicted patients are aged from 40 to 68 years, with an average of 55.68. Nearly 74% of them are older than 55 years. Three of these 19 patients are women and 16 are men. 16 patients have level 4 of chest pain type (asymptomatic), two of them have atypical angina and only one of them has typical angina. Their resting blood pressure is in the range of 110 mm-Hg to 170 mm-Hg, with an average of 136.95 mm-Hg. Their serum cholesterol is in the range of 131 mg/dl to 335 mg/dl, with an average of 226.26 mg/dl. Only four of the 19 patients have fasting blood sugar of more than 120 mg/dl. The resting electrocardiographic results show that the majority of them (11 out of 19) are normal (level 0), two patients have ST-T wave abnormality and six patients show probable or definite left ventricular hypertrophy by Estes’ criteria. The maximum heart rate is in the range of 90 to 181, with an average of 135.89. Additionally, 11 out of 19 patients have exercise induced angina. The ST depression induced by exercise relative to rest for these patients is in the range of 0 to 4.4, with an average of 2.005. For the majority of them (13 out of 19) the slope of the peak exercise ST segment is flat, four of them have upsloping and two have downsloping. Only one of the patients has three major vessels colored by fluoroscopy, five of them have two, seven of them have one, and six of them have no major vessels. Also, 12 out of 19 patients have a reversible defect related to their heart status, six of them have a fixed defect, and only one of them is normal. These graphs provide explainability for the predictions and are useful for clinicians to interpret the results.

Fig. 4. The features corresponding to the patients who are successfully predicted by the proposed algorithm to be at a risk of having heart attack.

Furthermore, the features corresponding to the patients who are successfully predicted to be at no risk of having a heart attack are provided in Fig. 5. As can be seen in this figure, all of these patients (13 patients) have fasting blood sugar of less than 120 mg/dl. The resting electrocardiographic result is normal for 11 patients, and two patients have probable or definite left ventricular hypertrophy by Estes’ criteria. None of the patients has exercise induced angina. For nine of the patients ST depression is zero, 11 patients have no major vessels colored by fluoroscopy and two patients have one vessel. 12 patients have normal heart status and one patient has a reversible defect of the heart.

Fig. 5. The features corresponding to the patients who are successfully predicted by the proposed algorithm to be at no risk of having heart attack.

The features were discarded one by one and the performance of the model was evaluated (Table 2). The top six features with the highest values of IEF were: ST depression induced by exercise relative to rest with IEF 0.2556, number of major vessels colored by fluoroscopy with IEF 0.245, maximum heart rate achieved with IEF 0.2329, and exercise induced angina, resting electrocardiographic results and age, each with IEF 0.2143. Consequently, the algorithm was tested on the smaller model of the dataset containing only these features and its performance was evaluated. The sensitivity, specificity, precision and accuracy for the new testing set achieved by the proposed algorithm were 79.1667%, 87.5%, 90.4762% and 82.5%, respectively, and the RMSE was 0.85158. 19 out of 40 patients are successfully predicted to be at risk of having a heart attack. Six of these patients have one major vessel colored by fluoroscopy, seven have two major vessels, one has three major vessels and the remaining five patients have no major vessels. Only two patients have no ST depression, and the remaining patients have ST depression in the range of 0.2 to 4.4. Additionally, 14 patients are successfully predicted to be at no risk of having a heart attack. None of these patients has exercise induced angina. One patient has one major vessel and the remaining patients have no major vessels. Nine of these 14 patients are without ST depression, and three patients have probable or definite left ventricular hypertrophy by Estes’ criteria.

Table 2. Evaluation results achieved by the proposed ANFIS-GA algorithm for the training set (Tr) and the testing set (Ts).

Furthermore, to validate the efficiency of the proposed ANFIS-GA algorithm, 9-fold cross validation was used. The dataset including 297 instances was partitioned into nine equal groups of 33 instances. In each step, a unique group was taken as the testing dataset while the remaining groups were taken as the training dataset. Therefore, the evaluation functions for the training and testing datasets were calculated. Consequently, the evaluation results are provided as the average of evaluation functions scores as presented in Table 3. As can be seen, the performance of the proposed algorithm as a result of employing 9-fold cross validation is quite satisfactory.
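The partitioning used for 9-fold cross validation can be sketched as follows. The function names are hypothetical, and shuffling the instances with a fixed seed before splitting is an assumption, since the text does not state how instances were assigned to groups.

```python
import random

def k_fold_indices(n_instances, k, seed=0):
    """Split instance indices into k groups of (nearly) equal size."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)  # assumption: shuffle before splitting
    fold_size = n_instances // k
    folds = [indices[i * fold_size:(i + 1) * fold_size] for i in range(k)]
    # Spread any leftover instances over the first folds (none for 297 and k = 9).
    for j, idx in enumerate(indices[k * fold_size:]):
        folds[j].append(idx)
    return folds

def train_test_splits(folds):
    """Yield (train, test) index lists, using each fold once as the testing set."""
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

folds = k_fold_indices(297, 9)
splits = list(train_test_splits(folds))
print([len(fold) for fold in folds])
print(len(splits[0][0]), len(splits[0][1]))
```

Each of the nine splits uses 33 instances for testing and the remaining 264 for training; the evaluation functions computed on each split would then be averaged.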

Table 3. The average of evaluation functions for training sets (Trs) and testing sets (Tss) obtained from employing 9-fold cross validation.

6 Conclusion

A novel Adaptive Neuro Fuzzy Inference System (ANFIS)-Genetic Algorithm (GA) algorithm for predicting heart attack was proposed. Using the explainability of fuzzy rules, the training capability of the neural network and the optimization power of GA, the algorithm was employed on the Cleveland dataset. The trained Fuzzy Inference System (FIS) obtained from ANFIS was fed into GA to optimize the membership function parameters. As a result, the Root Mean Squared Error (RMSE) between the expected results and the predicted ones obtained from the algorithm decreased from 0.82754 to 0.75372. The performance of the proposed algorithm was evaluated by evaluation functions such as sensitivity, specificity, precision, accuracy and RMSE. Also, 9-fold cross validation was employed to evaluate the efficiency of the algorithm. The results showed that the performance of the proposed algorithm in predicting heart attack was quite satisfactory.

To provide explainable results that are useful for clinicians and patients, the algorithm was designed in such a way that it provides matrices corresponding to the predicted patients in the various classes of heart attack risk. Additionally, an Importance Evaluation Function (IEF) was proposed to investigate the importance of various features in predicting heart attack. Features were discarded one by one and the performance of the model was evaluated based on the evaluation functions. Calculating IEF for all features, the top six features with the highest importance were determined. As a result, a new model based on the most important features was generated, and the performance of the algorithm on the new model was evaluated. The sensitivity, specificity, precision, accuracy and RMSE of the new testing set were 79.1667%, 87.5%, 90.4762%, 82.5% and 0.85158, respectively. Detecting the features with the highest importance and showing the satisfactory performance of the algorithm on the new model may be considered a good criterion for providing explainable predictions based on the importance of the features.

In future work, the performance of the proposed algorithm needs to be evaluated on other datasets. Additionally, based on the proposed IEF, it was observed that some features have key roles in predicting heart attack. Cooperation with medical experts would be beneficial to validate the predictions.