
1 Introduction

In this chapter we introduce and compare various algorithms that have been used to enhance the performance of the classification method PROAFTN. PROAFTN is a supervised learning method that learns from a training set and builds a set of prototypes to classify new objects [10, 11]. Supervised learning classification methods have been applied extensively in Ambient Assisted Living (AAL) to data generated by sensors [36]. The enhanced algorithm can be used, for instance, for activity recognition and behavior analysis in AAL from sensor data [43], and for the classification of daily living activities in a smart home using the generated sensor data [36]. Hence, the enhanced PROAFTN classifier can be integrated into active and assisted living systems, as well as into smart-home health care monitoring frameworks, like any of the classifiers used in the comparative study presented in this chapter [47].

This chapter is concerned with supervised learning methods, where the given samples or objects have known class labels (the training set), and the target is to build a model from these data to classify unlabeled instances (the testing data). We focus on classification problems in which classes are identified with discrete, or nominal, values indicating for each instance to which class it belongs, among the classes residing in the data set [21, 60]. Supervised classification problems require a classification model that identifies the behaviors and characteristics of the available objects or samples, called the training set. This model is then used to assign a predefined class to each new object [31].

A variety of research disciplines such as statistics [60], Multiple Criteria Decision Aid (MCDA) [11, 22] and artificial intelligence have addressed the classification problem [39]. The field of MCDA [10, 63] includes a wide variety of tools and methodologies developed to help a decision maker (DM) select from finite sets of alternatives according to two or more criteria [62]. In MCDA, classification problems can be distinguished from other classification problems within the machine learning framework from two perspectives [2]. The first concerns the characteristics describing the objects, which are assumed to have the form of decision criteria, providing not only a description of the objects but also some additional preferential information associated with each attribute [22, 51]. The second concerns the nature of the classification pattern, which can be ordinal, known as sorting [35], or nominal, known as multicriteria classification [10, 11, 63]. Classification-based machine learning models usually fail to tackle these issues, focusing essentially on the accuracy of the results obtained from the classification algorithms [62].

This chapter is devoted to the classification method based on the preference relational models known as outranking relational models, as described by Roy [52] and Vincke [59]. The method employs a partial comparison, on each attribute, between the objects to be classified and the prototypes of the classes. It then applies a global aggregation using the concordance and non-discordance principle [45]. It therefore avoids resorting to a conventional distance that aggregates the scores of all attributes in the same value unit. Hence, it helps to overcome some difficulties encountered when data are expressed in different units, and to avoid the search for the correct data preprocessing and normalization methods. The concordance and non-discordance principle used by PROAFTN belongs to the MCDA field developed by Roy [52, 54]. Moreover, Zopounidis and Doumpos [63] divide classification problems based on MCDA into two categories: sorting problems, for methods that utilize a preferential ordering of classes, and multicriteria classification, for nominal sorting where there is no preferential ordering of classes. In the MCDA field, the PROAFTN method is considered a nominal sorting or multicriteria classification method [10, 63]. The main characteristic of multicriteria classification is that the classification models do not result automatically from the training set alone but depend also on the judgment of an expert. In this chapter we show how techniques from machine learning and optimization can determine accurate parameters for the fuzzy classification method PROAFTN [11]. When applying the PROAFTN method, we need to learn the values of several parameters; in our proposed method these include the boundaries of the intervals that define the prototype profiles of the classes, the attributes' weights, etc. To determine the attributes' intervals, PROAFTN applies the discretization technique described by Ching et al. [20] to a set of pre-classified objects constituting a training set [13]. Even though these approaches offer good-quality solutions, they still need considerable computational time. The focus of this chapter is the application of different optimization techniques based on meta-heuristics for learning the PROAFTN method. To apply PROAFTN to very large data, many parameters must be set. If one were to use exact optimization methods to infer these parameters, the computational effort required would be an exponential function of the problem size. Therefore, it is sometimes necessary to abandon the search for the optimal solution using deterministic algorithms, and simply seek a good solution in a reasonable computational time using meta-heuristic algorithms. In this chapter, we show how an inductive learning method based on meta-heuristic techniques can lead to efficient multicriteria classification data analysis.

The major characteristics of the multicriteria classification method compared with other well known classifiers can be summarized as follows:

  • The PROAFTN method can apply two learning approaches: deductive (knowledge-based) and inductive learning. In the deductive approach, the expert establishes the required parameters for the studied problem; for example, the experts' knowledge or rules can be expressed as intervals, which can easily be implemented to build the prototypes of the classes. In the inductive approach, the parameters and the classification models are obtained and learned automatically from the training dataset.

  • PROAFTN uses outranking and preference modeling as proposed by Roy [52], and hence it can be used to gain understanding of the problem domain.

  • PROAFTN uses fuzzy sets for deciding whether an object belongs to a class or not. The fuzzy membership degree gives an idea of the object's weak and strong membership in the corresponding classes.

The overriding goal of this study is to present a generalized framework for learning the classification method PROAFTN, and then to compare the performance and efficiency of the learned method against well-known machine learning classifiers.

We shall conclude that integrating machine learning techniques and meta-heuristic optimization into the PROAFTN method leads to a significantly more robust and efficient data classification tool.

The rest of the chapter is organized as follows: Sect. 2 overviews the PROAFTN methodology and its notations. Section 3 explains the generalized learning framework for PROAFTN. In Sect. 4 the results of our experiments are reported. Finally, conclusions and future work are drawn in Sect. 5.

2 PROAFTN Method

This section describes the PROAFTN procedure, which belongs to the class of supervised learning methods for solving classification problems. Based on fuzzy relations between the objects being classified and the prototypes of the classes, it seeks to define a membership degree between the objects and the classes of the problem [11]. The PROAFTN method is based on an outranking relation, used as an alternative to the Euclidean distance, through the calculation of an indifference index between the object to be assigned and the prototypes of the classes obtained through the training phase. Hence, to assign an object to a class, PROAFTN follows the rule known as the concordance and non-discordance principle, as used by outranking relations: if the object a is judged indifferent or similar to a prototype of the class according to the majority of attributes (the concordance principle), and no attribute uses its veto against the affirmation "a is indifferent to this prototype" (the non-discordance principle), then the object a is considered indifferent to this prototype and should be assigned to the class of this prototype [11, 52].

PROAFTN has been applied to the resolution of many real-world practical problems such as acute leukemia diagnosis [14], asthma treatment [56], cervical tumor segmentation [50], Alzheimer's diagnosis [18], e-Health [15], optical fiber design [53], astrocytic and bladder tumor grading by means of a computer-aided diagnosis image analysis system [12], and image processing and classification [1]. PROAFTN has also been applied to intrusion detection and the analysis of cyber-attacks [24, 25]. Singh and Arora [55] present an interesting application of the fuzzy classification method PROAFTN to network intrusion detection; they find that PROAFTN outperforms the well-known Support Vector Machine classifier [55]. The following subsections describe the notations, the classification methodology, and the inductive approach used by PROAFTN.

2.1 PROAFTN Notations

The PROAFTN notations used in this chapter are presented in Table 1.

Table 1. Notations and parameters used by the PROAFTN method

2.2 Fuzzy Intervals

Let A represent a set of objects known as a training set. Consider a new object a to be classified, described by a set of m attributes \({\{g_1,g_2,...,g_m\}}\), and let the k classes be \({\{C^1,C^2,...,C^k\}}\). The different steps of the procedure are as follows:

For each class \(C^h\), a set \(L_h\) of prototypes is determined. For each prototype \(b^h_i\) and each attribute \(g_j\), an interval \([S^1_j(b^h_i)\), \(S^2_j(b^h_i)]\) is defined, where \(S^2_j(b^h_i)\ge S^1_j(b^h_i)\). Two thresholds \(d^1_j(b^h_i)\) and \(d^2_j(b^h_i)\) are introduced to define the fuzzy intervals: the pessimistic interval \([S^1_j(b^h_i), S^2_j(b^h_i)]\) and the optimistic interval \([S^1_j(b^h_i)-d^1_j(b^h_i), S^2_j(b^h_i)+d^2_j(b^h_i)]\). The pessimistic intervals are determined by applying discretization techniques to the training set, as described in [26, 28]. Classical data mining techniques, such as decision trees, discretize numerical domains (continuous numeric values) into intervals, and the discretized intervals are treated as ordinal "discretized" values during induction. Ramírez-Gallego et al. [29] present more details on the different approaches used for data discretization in machine learning. In our case, the discretized intervals are treated as intervals, not as discrete values. As a result, PROAFTN avoids losing information in the induction process and can use both inductive and deductive learning without transforming continuous values into discrete data. In deductive learning, the rules can also be given by interacting with the expert in the form of ranges or intervals, which can then be optimized during the learning process.

When evaluating a certain quantity or measure with a regular (crisp) interval, there are two extreme cases that we should try to avoid. It is possible to make a pessimistic evaluation, but then the interval will appear wider. It is also possible to make an optimistic evaluation, but then there is a risk that the output measure falls outside the limits of the resulting narrow interval, so that the reliability of the obtained results becomes doubtful. To overcome this problem, we introduce a fuzzy approach to the evaluation of features or criteria, as presented in Fig. 1 [16]. Fuzzy intervals permit having simultaneously both pessimistic and optimistic representations of the studied measure [23]. This is why we introduce the thresholds \(d^1\) and \(d^2\) for each attribute, defining at the same time both the pessimistic interval \([S^1_j(b^h_i), S^2_j(b^h_i)]\) and the optimistic interval \([S^1_j(b^h_i)-d^1_j(b^h_i), S^2_j(b^h_i)+d^2_j(b^h_i)]\) [13]. The support (carrier) of a fuzzy interval (from \(S^1-d^1\) to \(S^2+d^2\)) is chosen so that it guarantees not to push the considered quantity beyond the necessary limits, and the kernel (\(S^1\) to \(S^2\)) contains the most plausible values [61]. Figure 2 depicts the representation of PROAFTN's intervals. To apply PROAFTN, the pessimistic interval \([S^1_{jh}, S^2_{jh}]\) and the optimistic interval \([q^1_{jh}, q^2_{jh}]\) [13] of each attribute in each class need to be determined, where:

$$\begin{aligned} q^1_{jh} = S^1_{jh} - d^1_{jh}, \qquad q^2_{jh} = S^2_{jh} + d^2_{jh} \end{aligned}$$
(1)

applied to:

$$\begin{aligned} q^1_{jh} \le S^1_{jh} \le S^2_{jh} \le q^2_{jh} \end{aligned}$$
(2)

Hence, \(S^1_{jh}\) = \(S^1_j(b^h_i)\), \(S^2_{jh}\) = \(S^2_j(b^h_i)\), \(q^1_{jh}\) = \(q^1_j(b^h_i)\), \(q^2_{jh}\) = \(q^2_j(b^h_i)\), \(d^1_{jh}\) = \(d^1_j(b^h_i)\), and \(d^2_{jh}\) = \(d^2_j(b^h_i)\). The following subsections explain the stages required to classify the testing object a to the class \(C^h\) using PROAFTN.
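
For concreteness, these per-attribute parameters of a prototype can be grouped as in the following minimal Python sketch. The class and field names are ours, not part of the original method; the derived bounds follow the substitution \(q^1 = S^1 - d^1\) and \(q^2 = S^2 + d^2\) of Eq. (1).

```python
from dataclasses import dataclass

@dataclass
class IntervalParams:
    """Parameters of one prototype b_i^h on one attribute g_j (illustrative)."""
    s1: float  # pessimistic lower bound S^1_j(b_i^h)
    s2: float  # pessimistic upper bound S^2_j(b_i^h)
    d1: float  # left threshold d^1_j(b_i^h) >= 0
    d2: float  # right threshold d^2_j(b_i^h) >= 0

    @property
    def q1(self) -> float:
        # optimistic lower bound, Eq. (1)
        return self.s1 - self.d1

    @property
    def q2(self) -> float:
        # optimistic upper bound, Eq. (1)
        return self.s2 + self.d2
```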

Fig. 1. Fuzzy approach for features evaluation

2.3 Computing the Fuzzy Indifference Relation

The initial stage of the classification procedure is performed by calculating the fuzzy indifference relation \(I(a,b^h_i)\), also called the fuzzy resemblance measure. The fuzzy indifference relation is based on the concordance and non-discordance principle and represents the relationship (membership degree) between the object to be assigned and the prototype [10, 11]; it is formulated as:

$$\begin{aligned} I(a,b^h_i)=\left( \sum ^m_{j=1}w_{jh} C_{jh}^i(a,b^h_i) \right) \prod ^m_{j=1} \left( 1-D_{jh}^i(a,b^h_i) \right) ^{w_{jh}} \end{aligned}$$
(3)

where \(w_{jh}\) is the weight that measures the importance of a relevant attribute \(g_j\) of a specific class \(C^h\):

$$ w_{jh} \in [0, 1],\ \ \text {and } \ \ \sum _{j=1}^{m}w_{jh}=1 $$

\(C_{jh}^i(a,b^h_i)\) is the degree that measures the closeness of the object a to the prototype \(b^h_i\) according to the attribute \(g_j\).

$$\begin{aligned} C_{jh}^i(a,b_i^h) = \min \{C_{jh}^{i1}(a,b_i^h),\,C_{jh}^{i2}(a,b_i^h)\}, \end{aligned}$$
(4)

where

$$ C_{jh}^{i1}(a,b_i^h) = \frac{d^1_j(b^h_i)-\min \{S_j^1(b_i^h)-g_j(a),d^1_j(b^h_i)\}}{d^1_j(b^h_i)-\min \{S_j^1(b_i^h)-g_j(a), 0\}} $$

and

$$ C_{jh}^{i2}(a,b_i^h) = \frac{d^2_j(b^h_i)-\min \{g_j(a)-S_j^2(b_i^h),d^2_j(b^h_i)\}}{d^2_j(b^h_i)-\min \{g_j(a)-S_j^2(b_i^h), 0\}} $$

\(D_{jh}^i(a,b^h_i)\) is the discordance index that measures how far the object a is from the prototype \(b^h_i\) according to the attribute \(g_j\). Two veto thresholds, \(v_j^1(b^h_i)\) and \(v_j^2(b^h_i)\) [11], are used to define this value: beyond them, the object a is considered perfectly different from the prototype \(b^h_i\) based on the value of attribute \(g_j\). In general, the values of the veto thresholds are determined by an expert familiar with the problem. In this study the effect of the veto thresholds is not considered and only the concordance principle is used, so Eq. (3) reduces to:

$$\begin{aligned} I(a,b^h_i)=\sum ^m_{j=1}w_{jh} C_{jh}^i(a,b^h_i) \end{aligned}$$
(5)
Fig. 2. Graphical representation of the partial indifference concordance index between the object a and the prototype \(b_i^h\) represented by intervals.

To illustrate, three comparative cases arise when comparing the object a with the prototype \(b^h_i\) according to the attribute \(g_j\) (Fig. 2); a code sketch follows the list:

  • case 1 (strong indifference):

    \(C_{jh}^i(a,b_i^h) = 1\) \(\Leftrightarrow g_j(a) \in [S_{jh}^1, S_{jh}^2]\); (i.e., \(S_{jh}^1 \le g_j(a) \le S_{jh}^2\))

  • case 2 (no indifference):

    \(C_{jh}^i(a,b_i^h) = 0\) \( \Leftrightarrow g_j(a) \le q_{jh}^1\), or \(g_j(a) \ge q_{jh}^2\)

  • case 3 (weak indifference):

    The value of \(C_{jh}^i(a,b_i^h) \in (0,1)\) is calculated based on Eq. (4). (i.e., \(g_j(a)\) \(\in \) \([q_{jh}^1, S_{jh}^1]\) or \(g_j(a)\) \(\in \) \([S_{jh}^2, q_{jh}^2]\))
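
The following minimal sketch computes the partial concordance for these three cases, reusing the illustrative `IntervalParams` structure defined above. The piecewise-linear form is equivalent to the min of the two ratios in Eq. (4) for the trapezoid of Fig. 2.

```python
def partial_indifference(g_a: float, p: IntervalParams) -> float:
    """Trapezoidal partial indifference C_jh^i(a, b_i^h), per Eq. (4)."""
    # Case 1 (strong indifference): g_j(a) inside the pessimistic interval.
    if p.s1 <= g_a <= p.s2:
        return 1.0
    # Case 2 (no indifference): g_j(a) outside the optimistic interval.
    if g_a <= p.q1 or g_a >= p.q2:
        return 0.0
    # Case 3 (weak indifference): linear slope on either side of the kernel.
    if g_a < p.s1:
        return (g_a - p.q1) / p.d1   # rises from (q1, 0) to (s1, 1)
    return (p.q2 - g_a) / p.d2       # falls from (s2, 1) to (q2, 0)
```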

The partial fuzzy indifference relation is represented by a trapezoidal membership function. This type of function is well studied in [42] and [9]. Table 2 presents the performance matrix used to evaluate the prototypes of the classes on a set of attributes. The rows of the matrix represent the prototypes of the classes and the columns represent the attributes. The intersection of row i and column j corresponds to the partial indifference relation \(C_{jh}^i(a,b^h_i)\) between the prototype \(b_i^h\) and the object a to be assigned, according to the attribute \(g_j\).

Table 2. Performance matrix of prototypes of the class \(C^h\) according to their partial fuzzy indifference relation with an object a to be classified.

2.4 Evaluation of the Membership Degree

The membership degree \(\delta (a,C^h)\) between the object a and the class \(C^h\) is calculated from the indifference degree between a and its closest neighbor in the set of prototypes \(B^h\) of the class \(C^h\). To calculate the degree of membership of the object a in the class \(C^h\), PROAFTN applies the formula given by Eq. (6).

$$\begin{aligned} \delta (a,C^h)=\max \{I(a,b^h_1),I(a,b^h_2),...,I(a,b^h_{L_h})\} \end{aligned}$$
(6)

2.5 Assignment of an Object to the Class

Once the membership degrees of the testing (unlabeled) object a are calculated, the PROAFTN classifier assigns this object to the appropriate class \(C^h\) by following the decision rule given by Eq. (7).

$$\begin{aligned} a \in C^h \Leftrightarrow \delta (a,C^h) = \max \{\delta (a,C^i) \mid i \in \{1,\ldots,k\}\} \end{aligned}$$
(7)

3 Meta-heuristic Algorithms for Learning PROAFTN

The classification procedure used by PROAFTN to assign objects to the preferred classes is summarized in Algorithm 1.

[Algorithm 1]
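
A compact sketch of this classification procedure (Eqs. (5)-(7)) is given below, under the simplification without veto thresholds. Here `prototypes_per_class` maps each class label to its list of prototypes (each a list of the illustrative `IntervalParams` defined earlier) and `weights_per_class` holds the attribute weights \(w_{jh}\); both names are ours.

```python
from typing import Dict, List, Tuple

def indifference(a: List[float], proto: List[IntervalParams],
                 w: List[float]) -> float:
    """Fuzzy indifference I(a, b_i^h) of Eq. (5): weighted partial concordances."""
    return sum(wj * partial_indifference(gj, pj)
               for gj, pj, wj in zip(a, proto, w))

def classify(a: List[float],
             prototypes_per_class: Dict[str, List[List[IntervalParams]]],
             weights_per_class: Dict[str, List[float]]) -> Tuple[str, float]:
    """Assign a to the class with the highest membership degree (Eqs. (6)-(7))."""
    best_class, best_delta = None, -1.0
    for h, protos in prototypes_per_class.items():
        # Eq. (6): membership degree = indifference to the closest prototype.
        delta = max(indifference(a, b, weights_per_class[h]) for b in protos)
        if delta > best_delta:
            best_class, best_delta = h, delta
    return best_class, best_delta
```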

The rest of the chapter presents different methodologies based on machine learning and metaheuristic techniques for learning the classification method PROAFTN from data. The goal of these methodologies is to obtain, from the training data set, the PROAFTN parameters that achieve the highest classification accuracy when applying Algorithm 1. For this purpose, the different learning methodologies are summarized in the following subsections.

3.1 Learn and Improve PROAFTN Based on Machine Learning Techniques

In [7, 13], new methods were proposed to learn and improve PROAFTN based on machine learning techniques. The proposed learning methods consist of two stages: the first stage involves using a novel discretization technique to obtain the required parameters for PROAFTN, and the second stage is the development of a new inductive approach to construct PROAFTN prototypes for classification. Three unsupervised discretization methods – Equal Width Binning (EWB), Equal Frequency Binning (EFB) and k-means – were used to establish the PROAFTN parameters. Algorithm 2 explains the utilization of the discretization techniques and Chebyshev's theorem to obtain the parameters \(\{S^1,S^2,d^1,d^2\}\) for PROAFTN: first, a discretization technique is used to obtain initial intervals \(\{S^1_{jh}, S^2_{jh}\}\) for each attribute in each class; second, Chebyshev's theorem is utilized to tune the intervals generated by discretization and obtain \(\{d^1_{jh}, d^2_{jh}\}\) [16].

[Algorithm 2]
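
As an illustration only, the following sketch pairs equal-width binning with a Chebyshev-style widening to produce the four parameters for one attribute in one class. The exact binning and tuning rules of Algorithm 2 are given in [16]; the choice of the densest bin and of k = 2 here are our assumptions.

```python
import numpy as np

def induce_params(values: np.ndarray, n_bins: int = 3,
                  k: float = 2.0) -> IntervalParams:
    """Sketch of Algorithm 2: equal-width binning for [S1, S2],
    Chebyshev's theorem for the thresholds d1, d2.

    `values` holds the training values of one attribute within one class.
    """
    edges = np.linspace(values.min(), values.max(), n_bins + 1)
    counts, _ = np.histogram(values, bins=edges)
    b = counts.argmax()                 # densest bin -> pessimistic interval
    s1, s2 = edges[b], edges[b + 1]
    # Chebyshev: at least 1 - 1/k^2 of the values lie within mean +/- k*std,
    # which bounds the optimistic interval [s1 - d1, s2 + d2].
    mu, sigma = values.mean(), values.std()
    d1 = max(s1 - (mu - k * sigma), 0.0)
    d2 = max((mu + k * sigma) - s2, 0.0)
    return IntervalParams(s1, s2, d1, d2)
```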

Thereafter, an induction approach was introduced to compose the PROAFTN prototypes used for classification. To evaluate the performance of the proposed approaches, a general comparative study was carried out between decision tree (DT) algorithms (C4.5 and ID3) and PROAFTN based on the proposed learning techniques. That portion of the study concluded that PROAFTN and the DT algorithms share a very important property: they are both interpretable. In terms of classification accuracy, PROAFTN was able to outperform the DTs [16].

A superior technique for learning PROAFTN was introduced using genetic algorithms (GA). More particularly, the developed technique, called GAPRO, integrates k-means and a genetic algorithm to establish PROAFTN prototypes automatically from data in near-optimal form. The purpose of using a GA was to automate and optimize the selection of the number of clusters and the thresholds used to refine the prototypes. Based on the results generated on 12 typical classification problems, the newly proposed approach enabled PROAFTN to outperform widely used classification methods. The general description of using k-means with GA to learn the PROAFTN classifier is documented in [7, 13]. A GA is an adaptive metaheuristic search algorithm based on the concepts of natural selection and biological evolution. GA principles are inspired by Charles Darwin's theory of "survival of the fittest": the strong tend to adapt and survive while the weak tend to vanish. The GA was first introduced by John H. Holland in the 1970s and further developed in 1975 to allow computers to evolve solutions to difficult search and combinatorial problems, such as function optimization and machine learning. As reported in the literature, a GA is an intelligent exploitation of random search used to solve optimization problems. In spite of its stochastic behavior, a GA is generally quite effective for rapid global searches of large, non-linear and poorly understood spaces; it exploits historical information to direct the search into regions of better performance within the search space [32, 49].

In this work, the GA is utilized to approximate the best values for the threshold \(\beta \) and the number of clusters \(\kappa \). The threshold \(\beta \) represents the ratio of the total number of objects from the training set within each interval of each attribute in each class. As discussed earlier, to apply k-means discretization, the best \(\kappa \) value is required to obtain the intervals \([S^1_j(b^h_i)\), \(S^2_j(b^h_i)]\) and \([d^1_j(b^h_i)\), \(d^2_j(b^h_i)]\) and the threshold \(\beta \), as illustrated in Algorithm 4. In addition, the best value of \(\beta \) is also required to build the classification model that contains the best prototypes, as described in Algorithm 4. Furthermore, since each dataset may have different values for \(\kappa \) and \(\beta \), finding the best values of \(\beta \) and \(\kappa \) to compose the PROAFTN prototypes is a difficult optimization task. As a result, the GA is utilized to obtain these values. Within this framework, the value of \(\beta \) varies between 0 and 1 (i.e., \(\beta \in [0, 1]\)), and the value of \(\kappa \) ranges from 2 to 9 (\(\kappa \in \{2,\ldots, 9\}\)). The formulation of the optimization problem, which is based on maximizing classification accuracy to provide the optimal parameters (\(\kappa \) and \(\beta \)), is defined as:

$$\begin{aligned} \max \ f(\kappa ,\beta ) \qquad \text {subject to: } \kappa \in \{2,\ldots ,9\},\ \beta \in [0,1] \end{aligned}$$
(12)

where the objective (fitness) function f depends on the classification accuracy and n represents the set of training objects/samples to be assigned to the different classes. The procedure for calculating the fitness function f is described in Algorithm 3. The result of the optimization problem defined in Eq. (12) therefore varies within the interval [0, 100].

[Algorithm 3]
[Algorithm 4]
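
A sketch of the fitness evaluation of Algorithm 3 might look as follows. Here `build_prototypes` stands in for the k-means-based prototype construction of Algorithm 4 and `classify_fn` for the PROAFTN assignment rule; both are assumed callables, not shown.

```python
from typing import Callable, Sequence

def gapro_fitness(kappa: int, beta: float,
                  train_X: Sequence, train_y: Sequence,
                  build_prototypes: Callable, classify_fn: Callable) -> float:
    """Fitness f(kappa, beta) of Eq. (12): training accuracy in [0, 100]."""
    # Build the PROAFTN prototypes with kappa clusters and coverage ratio beta.
    model = build_prototypes(train_X, train_y, kappa, beta)
    # Score the model on the n training samples.
    correct = sum(1 for x, y in zip(train_X, train_y)
                  if classify_fn(x, model) == y)
    return 100.0 * correct / len(train_y)
```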

3.2 Learning PROAFTN Using Particle Swarm Optimization

A new methodology based on the particle swarm optimization (PSO) algorithm was introduced to learn PROAFTN. First, an optimization model was formulated, and thereafter PSO was used to solve it. PSO induces the classification model for PROAFTN, in the so-called PSOPRO, by inferring the best parameters from data with high classification accuracy. PSOPRO was found to be an efficient approach for data classification: its performance on different classification datasets demonstrates that it outperforms well-known classification methods.

PSO is an efficient evolutionary optimization algorithm that uses the social behavior of living organisms to explore the search space. Furthermore, PSO is easy to code and requires few control parameters [17]. The proposed approach employs PSO for training and improving the efficiency of the PROAFTN classifier. In this perspective, the optimization model is first formulated, and thereafter a PSO algorithm is used for solving it. During the learning stage, PSO uses the training samples to induce the best PROAFTN parameters in the form of prototypes. These prototypes, which represent the classification model, are then used for assigning unknown samples. The target is to obtain the set of prototypes that maximizes the classification accuracy on each dataset.

The PSO methodology and its application are described in [6]. As discussed earlier, to apply PROAFTN, the pessimistic interval \([S^1_{jh}, S^2_{jh}]\) and the optimistic interval \([q^1_{jh}, q^2_{jh}]\) for each attribute in each class need to be determined, where:

$$\begin{aligned} q^1_{jh} = S^1_{jh} - d^1_{jh}, \qquad q^2_{jh} = S^2_{jh} + d^2_{jh} \end{aligned}$$
(13)

applied to:

$$\begin{aligned} q^1_{jh} \le S^1_{jh} \le S^2_{jh} \le q^2_{jh} \end{aligned}$$
(14)

Hence, \(S^1_{jh} = S^1_j(b^h_i)\), \(S^2_{jh} = S^2_j(b^h_i)\), \(q^1_{jh} = q^1_j(b^h_i)\), \(q^2_{jh} = q^2_j(b^h_i)\), \(d^1_{jh} = d^1_j(b^h_i)\), and \(d^2_{jh} = d^2_j(b^h_i)\).

As mentioned above, to apply PROAFTN, the intervals \([S^1_{jh}, S^2_{jh}]\) and \([q^1_{jh}, q^2_{jh}]\) must satisfy the constraints in Eq. (14), and the weight \(w_{jh}\) must be obtained for each attribute \(g_{j}\) in class \(C^h\). To simplify the constraints in Eq. (14), the variable substitution based on Eq. (13) is used. As a result, the parameters \(d^1_{jh}\) and \(d^2_{jh}\) are used instead of \(q^1_{jh}\) and \(q^2_{jh}\), respectively. Therefore, the optimization problem, which is based on maximizing classification accuracy by providing the optimal parameters \(S^1_{jh}, S^2_{jh}, d^1_{jh}, d^2_{jh}\) and \(w_{jh}\), is defined as:

$$\begin{aligned} \max \ f(S^1_{jh},S^2_{jh},d^1_{jh},d^2_{jh},w_{jh}) \qquad \text {subject to: } S^1_{jh} \le S^2_{jh},\ d^1_{jh} \ge 0,\ d^2_{jh} \ge 0,\ w_{jh} \in [0,1],\ \sum _{j=1}^{m}w_{jh}=1 \end{aligned}$$
(15)

where f is the function that calculates the classification accuracy, and n represents the number of training samples used during the optimization. The procedure for calculating the fitness function \(f(S^1_{jh},S^2_{jh},d^1_{jh},d^2_{jh},w_{jh})\) is described in Table 3.

Table 3. The steps for calculating the objective function f.

To solve the optimization problem presented in Eq. (15), PSO is adopted here. The problem dimension D (i.e., the number of parameters in the optimization problem) is determined as follows: each particle \(\mathbf {x}\) is composed of the parameters \(S^1_{jh}, S^2_{jh}, d^1_{jh}, d^2_{jh}\) and \(w_{jh}\), for all \(j=1,2,...,m\) and \(h=1,2,...,k\). Therefore, each particle in the population is composed of \(D = 5 \times m \times k\) real values (i.e., \(D=\dim (\mathbf {x})\)).
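
For illustration, one canonical PSO update over such a flattened particle could look as follows; the inertia and acceleration constants are common defaults from the PSO literature, not values taken from the original study.

```python
import numpy as np

def pso_step(x: np.ndarray, v: np.ndarray, p_best: np.ndarray,
             g_best: np.ndarray, w_inertia: float = 0.72,
             c1: float = 1.49, c2: float = 1.49):
    """One velocity/position update for a particle of D = 5*m*k real values."""
    r1, r2 = np.random.rand(x.size), np.random.rand(x.size)
    v = w_inertia * v + c1 * r1 * (p_best - x) + c2 * r2 * (g_best - x)
    x = x + v
    # A feasibility repair (S1 <= S2, d1, d2 >= 0, weights in [0, 1] summing
    # to 1 per class) would follow here; it is omitted from this sketch.
    return x, v
```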

3.3 Differential Evolution for Learning PROAFTN

A new learning strategy based on the Differential Evolution (DE) algorithm, called DEPRO, was proposed for obtaining the best PROAFTN parameters. DE is an efficient metaheuristic optimization algorithm based on a simple mathematical structure that mimics a complex process of evolution. Based on results generated from a variety of public datasets, DEPRO provides excellent results, outperforming the most common classification algorithms.

In this direction, a new learning approach based on DE is proposed for learning the PROAFTN method. More particularly, DE is introduced here to solve the optimization problem in Eq. (15). The new learning technique, called DEPRO, utilizes DE to train and improve the PROAFTN classifier. In this context, DE is utilized as an inductive learning approach to infer the best PROAFTN parameters from the training samples. The generated parameters are then used to compose the prototypes, which represent the classification model used for assigning unknown samples. The target is to find the prototypes that maximize the classification accuracy on each dataset. The full description of the DE methodology and its application to learning PROAFTN is given in [4]. The general procedure of the DE algorithm is presented in Algorithm 5.

[Algorithm 5]

The procedure for calculating the fitness function \(f(S^1_{jh},S^2_{jh},d^1_{jh},d^2_{jh},w_{jh})\) is described in Table 3. The mutation and crossover steps that update the elements (genes) of the trial individual \(\mathbf v _i\) in DEPRO are performed as follows:

$$\begin{aligned} v_{ihj\tau }= {\left\{ \begin{array}{ll} x_{r_1hj\tau }+ F(x_{r_2hj\tau } - x_{r_3hj\tau }), &{} \text {if} \ (rand_{\tau } < \kappa ) \ \ \text {or} \ \ (\rho = \tau )\\ x_{ihj\tau }, &{}\text {otherwise.} \end{array}\right. } \end{aligned}$$
(16)
$$ i, r_1, r_2, r_3 \in \{1, ..., N_{pop}\}, \ \ i \ne r_1 \ne r_2 \ne r_3; \ $$
$$ h = 1, ..., k; \ \ j = 1, ..., m; \ \ \tau = 1, ..., D$$

where F is the mutation factor \(\in [0, 2]\) and \(\kappa \) is the crossover factor. This modified operation (i.e., Eq. (16)) forces the mutation and crossover process to be applied to each gene \(\tau \), selected randomly for each set of 5 parameters \(S^1_{jh}, S^2_{jh}, d^1_{jh}, d^2_{jh}\) and \(w_{jh}\) in \(\mathbf v _i\), for all \(j=1,2,...,m\) and \(h=1,2,...,k\).
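
A sketch of this rand/1 mutation with binomial crossover over a population array of shape (N_pop, D) is given below; the per-gene test mirrors the condition rand < kappa or rho = tau of Eq. (16), with CR playing the role of the crossover factor \(\kappa \).

```python
import numpy as np

def de_trial(pop: np.ndarray, i: int, F: float = 0.5,
             CR: float = 0.9) -> np.ndarray:
    """Build the trial vector v_i of Eq. (16) for individual i."""
    n_pop, dim = pop.shape
    # Three mutually distinct partners, all different from i.
    r1, r2, r3 = np.random.choice(
        [r for r in range(n_pop) if r != i], 3, replace=False)
    rho = np.random.randint(dim)        # gene forced to cross over
    trial = pop[i].copy()
    for tau in range(dim):
        if np.random.rand() < CR or tau == rho:
            trial[tau] = pop[r1, tau] + F * (pop[r2, tau] - pop[r3, tau])
    return trial
```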

3.4 A Hybrid Metaheuristic Framework for Establishing PROAFTN Parameters

As discussed earlier, there are different ways to classify the behavior of metaheuristic algorithms based on their characteristics. One major characteristic is whether the evolution strategy is based on population-based search or single-point search. Population-based methods deal in every iteration with a set of solutions rather than with a single solution, and thus have the capability to explore the search space efficiently, whereas the strength of single-point methods is that they provide a structured way to explore a promising region of the search space. Therefore, a promising area of the search space is searched more intensively by single-point methods than by population-based methods [58]. Population-based methods can be augmented with single-point methods to improve the search mechanism: while the population-based method ensures an exploration of the search space, the single-point technique helps to identify good areas within it. One of the most popular forms of hybridization is the use of single-point search methods inside population-based methods. Hybridization that manages to combine the advantages of population-based methods with the strengths of single-point methods is often very successful, which is the motivation for this work. In many applications, hybrid metaheuristics have proved quite beneficial in improving the fitness of individuals [37, 38, 57].

In this methodology, new hybrid metaheuristic approaches were introduced to obtain the best PROAFTN parameter configuration for a given problem. The two proposed hybrid approaches are: (1) Particle Swarm Optimization (PSO) with Reduced Variable Neighborhood Search (RVNS), called PSOPRO-RVNS; and (2) Differential Evolution (DE) with RVNS, called DEPRO-RVNS. Based on the results generated on both training and testing data, the performance of PROAFTN is significantly improved compared with the approaches presented in the previous sections (Sects. 3.2 and 3.3). Furthermore, the experimental study demonstrated that PSOPRO-RVNS and DEPRO-RVNS strongly outperform well-known machine learning classifiers on a variety of problems.

RVNS is a variation of the metaheuristic Variable Neighborhood Search (VNS) [33, 34]. The basic idea of the VNS algorithm is to find a solution in the search space through a systematic change of neighborhood. Basic VNS is very useful for obtaining approximate solutions to many combinatorial and global optimization problems; however, its major limitation is that it is very time consuming, because it employs a local search routine. RVNS uses a different approach: solutions are drawn randomly from the current neighborhood, and the incumbent solution is replaced whenever a better solution is found. RVNS is simple, efficient and provides good results at low computational cost [30, 34]. In RVNS, two procedures are used: shake and move. Starting from the initial solution \(\mathbf {x}\) (the position of prematurely converged individuals), the algorithm selects a random solution \(\mathbf {x}'\) from the initial solution's neighborhood. If the generated \(\mathbf {x}'\) is better than \(\mathbf {x}\), it replaces \(\mathbf {x}\) and the algorithm starts over with the same neighborhood; otherwise, the algorithm continues with the next neighborhood structure.
The pseudo-code of RVNS is given in Algorithm 6.

[Algorithm 6]

In [13] the RVNS heuristic is used to learn the PROAFTN classifier by optimizing its interval parameters, namely the pessimistic and optimistic intervals. In this light, a hybrid of metaheuristics is proposed here for training the PROAFTN method: the two hybrid approaches, PSO augmented with RVNS (called PSOPRO-RVNS) and DE augmented with RVNS (called DEPRO-RVNS), are proposed for solving this optimization problem. The two training techniques presented in Sects. 3.2 and 3.3 are integrated with the single-point search RVNS to improve the performance of PROAFTN. The details of how DE and RVNS are combined to learn the PROAFTN classifier are described in [5]; in the same context, the application of PSO with RVNS to learn PROAFTN is described in [3]. To allow RVNS to improve the solution provided by PSO or DE in each iteration, the following equations are used to update the boundaries around the previous solution \(\mathbf {x}\) containing the (\(S^1_{jh},S^2_{jh}, d^1_{jh}, d^2_{jh}\)) parameters:

$$\begin{aligned} l_{\lambda jbh}= & {} x_{\lambda jbh} - (k/k_{max})\,x_{\lambda jbh} \end{aligned}$$
(17)
$$\begin{aligned} u_{\lambda jbh}= & {} x_{\lambda jbh} + (k/k_{max})\,x_{\lambda jbh} \end{aligned}$$
(18)

where \(l_{\lambda jbh}\) and \(u_{\lambda jbh}\) are the lower and upper bounds for each element \(\lambda \in \{1, \ldots , D\}\). The factor \(k/k_{max}\) defines the boundary for each element, and \(x_{\lambda jbh}\) is the previous solution value of element \(\lambda \) provided by PSO or DE.

The use of the hybrid PSO/DE augmented with RVNS for learning PROAFTN is explained here; for more details see [5]. Using PSO, the elements of each particle position \(\mathbf {x}_i\), consisting of the parameters \(S^1_{jh}, S^2_{jh}, d^1_{jh}\) and \(d^2_{jh}\), are updated using:

$$\begin{aligned} x_{i\lambda jbh}(t + 1) = x_{i\lambda jbh} (t) + v_{i\lambda jbh} (t + 1) \end{aligned}$$
(19)

where the velocity update \(\mathbf {v}_i\) for each element based on \(\mathbf {P}^{Best}_i\) and \(\mathbf {G}^{Best}\) is formulated as:

$$\begin{aligned} {\begin{matrix} v_{i\lambda jbh}(t + 1) = \varpi (t)v_{i\lambda jbh}(t) + \\ \tau _1\rho _1 (P^{Best}_{i\lambda jbh}- x_{i\lambda jbh} (t)) + \\ \tau _2 \rho _2 (G^{Best}_{\lambda jbh}-x_{i\lambda jbh}(t)) \end{matrix}} \end{aligned}$$
(20)
$$\begin{aligned} i = 1, ..., N_{pop}; \ \ \lambda = 1, ..., D\; \end{aligned}$$
$$\begin{aligned} j = 1, ..., m; \ \ b = 1, ..., L_h; \ \ h = 1, ..., k \end{aligned}$$

where \(\varpi (t)\) is the inertia weight that controls the exploration of the search space. \(\tau _1 \) and \(\tau _2 \) are the individual and social components/weights, respectively. \(\rho _1\) and \(\rho _2\) are random numbers between 0 and 1. \(\mathbf {P}^{Best}_i(t)\) is the personal best position of the particle i, and \(\mathbf {G}^{Best}(t)\) is the neighborhood best position of particle i. Algorithm 6 demonstrates the required steps to evolve the velocity \(\mathbf {v}_i\) and particle position \(\mathbf {x}_i\) for each particle containing PROAFTN parameters. The shaking phase to randomly generate the elements of \(\mathbf {x}'\) is given by:

$$\begin{aligned} x'_{\lambda jbh} = l_{\lambda jbh} + (u_{\lambda jbh}-l_{\lambda jbh})\cdot rand[0,1] \end{aligned}$$
(21)

Accordingly, the move is applied as:

$$\begin{aligned} \text {If} \ f(x'_{\lambda jbh}) > f(x_{\lambda jbh}) \ \ \text {then} \ \ x_{\lambda jbh} \ = \ x'_{\lambda jbh} \end{aligned}$$
(22)

The steps that explain the employment of RVNS to improve PROAFTN parameters are listed in Algorithm 7.

[Algorithm 7]
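
A condensed sketch of this RVNS refinement (Eqs. (17), (18), (21) and (22)) is given below. Taking the absolute value when forming the neighborhood span keeps the lower bound below the upper bound for negative elements; that detail is our assumption, not part of the original formulation.

```python
import numpy as np

def rvns_improve(x: np.ndarray, f, k_max: int = 5) -> np.ndarray:
    """Refine the incumbent x handed over by PSO or DE via shake and move."""
    k = 1
    while k <= k_max:
        span = (k / k_max) * np.abs(x)                   # Eqs. (17)-(18)
        lo, hi = x - span, x + span
        x_new = lo + (hi - lo) * np.random.rand(x.size)  # Eq. (21): shake
        if f(x_new) > f(x):                              # Eq. (22): move
            x, k = x_new, 1   # improvement: restart from the first neighborhood
        else:
            k += 1            # otherwise try the next, wider neighborhood
    return x
```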

4 Comparative Study of PROAFTN and Well-Known Classifiers

The proposed methodologies were implemented in Java and applied to 12 popular datasets: Breast Cancer Wisconsin Original (BCancer), Transfusion Service Center (Blood), Heart Disease (Heart), Hepatitis, Haberman's Survival (HM), Iris, Liver Disorders (Liver), Mammographic Mass (MM), Pima Indians Diabetes (Pima), Statlog Australian Credit Approval (STAust), Teaching Assistant Evaluation (TA), and Wine. The datasets' descriptions and dimensionality are presented in Table 4. The datasets are in the public domain and are available from the University of California at Irvine (UCI) Machine Learning Repository [8].

Table 4. Description of datasets used in our experiments.

To summarize, a comparison of the various approaches introduced throughout this research for learning PROAFTN – GAPRO, PSOPRO, DEPRO, PSOPRO-RVNS and DEPRO-RVNS – is presented in Table 5. One can see that DEPRO-RVNS and PSOPRO-RVNS perform the best.

Table 5. The performance of all approaches for learning PROAFTN introduced in this research study, based on classification accuracy (in %). The average accuracy and average ranking are also included.

Table 7 summarizes a comparison of the developed approaches for learning the PROAFTN classifier against other classifiers. As observed, both DEPRO-RVNS and PSOPRO-RVNS strongly outperform the other classifiers. The developed approaches can therefore be classified into three groups, based on their performance:

  • Best approaches: DEPRO-RVNS and PSOPRO-RVNS.

  • Middle approaches: DEPRO and PSOPRO.

  • Weakest approach: GAPRO.

It should also be noted that DEPRO-RVNS and PSOPRO-RVNS are efficient in terms of computation speed. One advantage of DE and PSO over other global optimization methods is that they often converge faster and with more certainty. Furthermore, utilizing RVNS inside DE and PSO improved the search for good solutions in a shorter time (Table 5).

Table 6. Experimental results based on classification accuracy (in %) to measure the performance of the well-known classifiers on the same datasets
Table 7. Mean accuracy rankings. The algorithms developed in this paper are marked in bold.

Comparisons were made against the implementations provided in WEKA [27] for the neural network multilayer perceptron (NN MLP), naive Bayes (NB), decision trees (PART), C4.5 and k-nearest neighbour (kNN). We used H2O for deep learning (h2o DL) [19] and generalized linear models (h2o GLM) [44], and R's implementation of random forest (RFOREST) [41] with n = 500 trees. PROAFTN and decision trees share a very important property: both use a white-box model, and both can generate classification models that can easily be explained and interpreted. However, when evaluating any classification method there is another important factor to consider: classification accuracy. Based on the experimental study presented in Sect. 4, the PROAFTN method has proven to produce higher classification accuracy than decision trees such as C4.5 [46] and other well-known learning algorithms, including naive Bayes, Support Vector Machines (SVM), Neural Networks (NN), k-Nearest Neighbor (k-NN) and Rule Learner (see Table 6). This can be explained by the fact that PROAFTN uses fuzzy intervals. A general comparison between PROAFTN based on the learning approaches adopted in this chapter (PRO-BPLA) and other machine learning classifiers is summarized in Table 8. The observations in this table are based on the evidence of existing empirical and theoretical studies, as presented in [40]; we have also added some evidence based on the results obtained using the learning methodology introduced in this research study. In summary, Table 8 compares the properties of some well-known machine learning classifiers against those of the classification method PROAFTN.

Table 8. Summary of the properties of well-known classifiers versus PRO-BPLA (the best rating is **** and the worst is *)

In this chapter, we have presented the implementation of machine learning and metaheuristic algorithms for the parameter training of a multicriteria classification method. We have shown that learning techniques based on metaheuristics are a successful approach for optimizing the learning of the PROAFTN classification method, greatly improving its performance. As has been demonstrated, every classification algorithm has its strengths and limitations, and whether a method is strong or weak depends on the situation and on the problem. For instance, assume the problem at hand is a medical dataset and the interest is in a classification method for medical diagnostics. Suppose the executives and experts are looking for a high level of classification accuracy and, at the same time, are very keen to know more details about the classification process (e.g., why the patient is classified into this category of disease). In such circumstances, classifiers such as deep learning networks, k-NN or SVM may not be an appropriate choice, because of the limited interpretability of their classification models. Although deep learning networks have been successfully applied to some healthcare applications, particularly medical imaging, they suffer from limitations: the limited interpretability of their classification results; the requirement for a very large, balanced, labeled data set; and the frequent need for preprocessing or a change of input domain to bring all the input data to the same scale [48]. Thus, there is a need to look for other classifiers that can reason about their outputs and still generate good classification accuracy, such as DTs (C4.5, ID3), NB, or PROAFTN.

Based on the experimental and comparative study presented in Table 8, the PROAFTN method with our proposed learning approaches has good accuracy in most instances and can deal with all types of data without sensitivity to noise. PROAFTN uses pairwise comparisons; therefore, there is no need to look for a suitable data normalization technique, as is the case for other classifiers. Furthermore, PROAFTN is a transparent and interpretable classifier: it is easy to derive classification rules from the obtained prototypes. It can use both deductive and inductive learning, which allows historical data and expert judgment to be used together to compose the classification model. To sum up, there is no complete or comprehensive classification algorithm that can handle or fit all classification problems. In response to this deficiency, the major task of this work has been to review an integration of methodologies from three major fields – MCDA, machine learning, and optimization based on metaheuristics – through the aforementioned classification method PROAFTN. The target of this study was to exploit machine learning techniques and optimization approaches to improve the performance of PROAFTN, aiming for a suitable and comprehensive (interpretable) classification procedure that can be applied efficiently in many applications, including ambient assisted living environments.

5 Conclusions and Future Work

The target of this chapter has been to exploit machine learning techniques and optimization approaches to improve the performance of PROAFTN. The aim is to find a suitable and comprehensive (interpretable) classification procedure that can be applied efficiently in health applications, including ambient assisted living environments. This chapter describes the ability of metaheuristics, when embedded in the classification method PROAFTN, to classify new objects. To do this, we compared the improved PROAFTN methodology with those reported previously on the same data and with the same validation technique (10-fold cross-validation). In addition to reviewing several approaches to modeling and learning the classification method PROAFTN, this chapter also presents ideas for further research in the areas of data mining and machine learning. Below are some possible directions for future research.

  1.

    PROAFTN has several parameters to be obtained for each attribute and for each class, which provides more information for assigning objects to the closest class. However, in some cases this may limit the speed of learning, particularly when using metaheuristics, as presented in this chapter. Possible future solutions can be summarized as follows:

    • Utilizing different approaches for obtaining the weights. One possible direction is to use a feature-ranking approach based on strong algorithms that perform well at dimensionality reduction.

    • Determining interval bounds for more than one prototype before performing optimization. This would involve establishing the intervals' bounds a priori by using clustering techniques, hence speeding up the search and improving the likelihood of finding the best solutions.

  2.

    The performance of metaheuristic approaches depends on the choice of control parameters, which varies from one application to another. In this work, however, the control parameters were fixed for all applications. A better choice of control parameters for the metaheuristics-based PROAFTN algorithms will be investigated.

  3.

    To speed up the PROAFTN learning process, a possible improvement is parallel computation: different processors can handle the folds of the cross-validation process independently. Parallelism can also be applied to the composition of the prototypes of each class.

  4.

    In this chapter, inductive learning is presented to build the classification models for the PROAFTN method. PROAFTN can also apply deductive learning, which allows the introduction of given knowledge when setting PROAFTN parameters, such as intervals and/or weights, to build the prototypes of the classes.