This book is a research compendium in the area of Fuzzy Granular Computing. Two main branches are covered: a proposed fuzzy granulation algorithm, and higher-type information granule formation, with more work devoted to the latter.

3.1 Fuzzy Granular Gravitational Clustering Algorithm

A fuzzy granulation algorithm was proposed in [1, 2] which uses the concept of gravitational forces to form information granules, and was aptly named the Fuzzy Granular Gravitational Clustering Algorithm (FGGCA).

It is based on Newton’s Law of Universal Gravitation, shown in Eq. (3.1), which calculates the gravitational force between two bodies. Where \( F_{g} \) is the gravitational force, G is the gravitational constant, equal to \( 6.674 \times 10^{-11} \,\text{N}\,\text{m}^{2}\,\text{kg}^{-2} \), \( m_{1} \) and \( m_{2} \) are the masses of both bodies, and \( \left\| x_{1}, x_{2} \right\| \) is the distance between the centers of mass of the two bodies.

$$ F_{g} = G\frac{m_{1} m_{2}}{\left\| x_{1}, x_{2} \right\|^{2}} $$
(3.1)

The premise of the algorithm is that gravitational forces are simulated inside a confined and normalized space [−1, 1], where each datum represents a point body with a mass of 100 kg. Here, each attribute, or feature, of a dataset is considered a dimension in Euclidean n-space.

The algorithm starts by calculating the gravitational force between every pair of data points, as shown in Eq. (3.2). Where \( F_{ij} \) is the gravitational force between the i-th and j-th data points, and \( i \ne j \).

$$ F_{ij} = G\frac{m_{i} m_{j}}{\left\| x_{i}, x_{j} \right\|^{2}} $$
(3.2)

The sum of all gravitational forces acting on each data point is then calculated, as shown in Eq. (3.3); this yields an individual gravitational force density for each data point relative to the rest of the data points.

$$ {F_{i}^{density} = \sum\limits_{j = 1}^{n} {F_{ij} } } $$
(3.3)
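As an illustration of Eqs. (3.2) and (3.3), a minimal NumPy sketch of the force-density computation might look as follows; the function name, the vectorized pairwise distances, and the small epsilon guarding against division by zero for coincident points are added assumptions, not part of the original formulation.

```python
import numpy as np

G = 6.674e-11  # gravitational constant from Eq. (3.1)

def gravitational_densities(X, masses=None, eps=1e-12):
    """Eqs. (3.2)-(3.3): sum of pairwise gravitational forces per point.

    X      : (n, d) data, normalized to the space [-1, 1]^d
    masses : (n,) point masses; the text assigns 100 kg to each datum
    eps    : guard against division by zero (an added assumption)
    """
    n = X.shape[0]
    m = np.full(n, 100.0) if masses is None else masses
    # squared pairwise Euclidean distances ||x_i, x_j||^2
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    F = G * np.outer(m, m) / (d2 + eps)  # Eq. (3.2)
    np.fill_diagonal(F, 0.0)             # enforce i != j
    return F.sum(axis=1)                 # Eq. (3.3): F_i^density
```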

The premise is that data points with higher gravitational force density are more likely to be actual cluster centers than those with lower densities. For this, all densities must be sorted in descending order, which also rearranges all data points, from \( x \in X \) to \( x^{sorted} \in X \).

Having sorted all data points, the next step is to join nearby, gravitationally dense points in a pairwise manner. This is done iteratively for all data points while the following condition holds: IF \( { \hbox{min} }\left( {\left\| {x_{i} ,x_{j} } \right\|} \right) < radius \) THEN join the data points. Where \( i \ne j \), and radius is a user criterion that decides the maximum size of each information granule. A small radius yields more information granules, whereas a large radius yields fewer. When the condition is met, \( x_{i} \) is more gravitationally dense than \( x_{j} \) (owing to the sorted order), so \( x_{i} \) absorbs \( x_{j} \), as shown in Eqs. (3.4) and (3.5).

$$ {x_{i} \cup x_{j} } $$
(3.4)
$$ {m_{i} \cup m_{j} } $$
(3.5)

The joining of \( x_{j} \) onto \( x_{i} \) updates the position of \( x_{i} \) via Eqs. (3.6)–(3.8). Where \( \rho_{barycenter} \) is the gravitational center of mass between both data points, and \( \lambda \) is a scaling factor between \( x_{i} \) and \( x_{j} \).

$$ {\rho_{barycenter} = \left( {\frac{{m_{j} }}{{m_{i} + m_{j} }}} \right)\left\| {x_{i} ,x_{j} } \right\|} $$
(3.6)
$$ {\lambda = \frac{{\rho_{barycenter} }}{{\left\| {x_{i} ,x_{j} } \right\|}}} $$
(3.7)
$$ {x_{i} = x_{i} + \lambda \left( {x_{j} - x_{i} } \right)} $$
(3.8)

The described algorithm iterates this process for all \( x \in X \) until all data points have been joined together, or until the distance condition shown previously, with the user criterion radius, is no longer met. Each iteration of this process usually reduces the cardinality of X to about half its original size, while maintaining the original total mass in the system.
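The joining step can be sketched as a single merging pass, under the assumption that the points arrive sorted by descending density; the greedy pairing order and the function name are illustrative choices the text leaves open. Note that Eqs. (3.6) and (3.7) together reduce to \( \lambda = m_{j}/(m_{i} + m_{j}) \), which the sketch uses directly.

```python
import numpy as np

def merge_pass(X, m, radius):
    """One joining pass, Eqs. (3.4)-(3.8). X and m are assumed to be
    sorted by descending gravitational force density, so index i is
    always at least as dense as any index j > i."""
    X, m = X.copy(), m.copy()
    absorbed = np.zeros(len(X), dtype=bool)
    for i in range(len(X)):
        if absorbed[i]:
            continue
        for j in range(i + 1, len(X)):
            if absorbed[j] or np.linalg.norm(X[i] - X[j]) >= radius:
                continue
            lam = m[j] / (m[i] + m[j])         # Eqs. (3.6)-(3.7) combined
            X[i] = X[i] + lam * (X[j] - X[i])  # Eq. (3.8): move to barycenter
            m[i] += m[j]                       # Eq. (3.5): total mass conserved
            absorbed[j] = True                 # Eq. (3.4): x_i absorbs x_j
    return X[~absorbed], m[~absorbed]
```

Between passes, radius would then be rescaled per Eq. (3.9), e.g. radius = radius * delta, before recomputing the forces from Eq. (3.2).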

After this iterative process ends, the final step before iterating once again from Eq. (3.2) is to adjust the user criterion radius in order to control the number and size of the information granules; this is done via Eq. (3.9). Where \( \Delta \) is another user-set criterion that controls how fast radius changes.

$$ {radius = radius \times \Delta } $$
(3.9)

The algorithm iterates until all remaining interactions are beyond radius. Once finished, the centers of the information granules have been obtained; the following description shows how each information granule’s size is calculated. These sizes are also found with the help of gravitational forces.

Since the FGGCA is fuzzy in nature, membership functions form the rule set of the model. Gaussian membership functions were therefore chosen to represent each individual fuzzy information granule. As a result, two parameters are required: a location point \( x \) (as calculated by the previously described algorithm) and a standard deviation \( \sigma \) that delimits the size of the fuzzy information granule.

To calculate the required \( \sigma \) values, let us denote the found cluster centers as \( x^{c} \). The premise is to find which cluster center \( x^{c} \) exerts the most gravitational force upon each \( x_{i} \), with \( x^{c} \ne x_{i} \); when found, the assignment \( x_{i} \in x^{c} \) is established. This process iterates through all \( x \in X \).

The size of the fuzzy information granules is found by first obtaining the set of data points over which each \( x^{c} \) has the most influence, as shown in Eq. (3.10). Where \( F_{j} \) is the gravitational force exerted between \( x_{j}^{c} \) and \( x_{i} \), with the condition \( x_{j}^{c} \ne x_{i} \). After \( x_{i} \) has been compared against all \( x_{j}^{c} \), the cluster center exerting the most gravitational influence adds \( x_{i} \) to its set \( x_{{setOfCluster_{j} }} \).

$$ {F_{j} = G\frac{{m_{j}^{c} m_{i} }}{{\left\| {x_{j}^{c} ,x_{i} } \right\|^{ 2} }}} $$
(3.10)

When all \( x \in X \) have been transferred into \( x_{{setOfCluster_{j} }} \), the \( \sigma_{k}^{c} \) values can be calculated for each \( x^{c} \), as shown in Eq. (3.11). Where \( \sigma_{k}^{c} \) is the standard deviation of each cluster center \( x^{c} \) on the k-th input.

$$ {\sigma_{k}^{c} = std\left( {x_{{setOfCluster_{j} }} } \right)} $$
(3.11)
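A sketch of the size calculation of Eqs. (3.10) and (3.11) follows; treating each datum’s mass as the initial 100 kg and carrying the merged masses of the centers are assumptions about details not spelled out above.

```python
import numpy as np

G = 6.674e-11

def granule_sizes(X, centers, center_masses, m_datum=100.0, eps=1e-12):
    """Eqs. (3.10)-(3.11): assign each datum to the cluster center that
    exerts the strongest gravitational force on it, then take the
    per-input standard deviation of each resulting set.

    X : (n, d) data, centers : (c, d), center_masses : (c,) array
    """
    d2 = np.sum((X[:, None, :] - centers[None, :, :]) ** 2, axis=-1)
    F = G * m_datum * center_masses[None, :] / (d2 + eps)  # Eq. (3.10)
    owner = np.argmax(F, axis=1)  # strongest influence claims the datum
    # Eq. (3.11): sigma_k^c per cluster center c and input k
    return np.array([X[owner == c].std(axis=0)
                     for c in range(len(centers))])
```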

The described algorithm so far has given a technique for calculating fuzzy information granules represented by the antecedents of an FLS, yet the consequents have not been addressed. The proposed granular FLS has Takagi-Sugeno-Kang (TSK) [3–5] consequents. To adjust these first-order linear polynomials, a known method based on the Least Squares Estimator (LSE) is used to adjust all coefficient parameters [6].

The consequent parameter adjustment considers c cluster centers \( \left\{ x_{1}, x_{2}, \ldots, x_{c} \right\} \). The input space variables are represented by \( y_{i} \), and the output space variables by \( z_{i} \). Where \( z_{i} \) takes the form of a first-order linear function, as shown in Eq. (3.12), \( G_{i} \) are the input constant parameters, and \( h_{i} \) the constants for each \( z_{i} \).

$$ {z_{i}^{{}} = G_{i} y + h_{i} } $$
(3.12)

As each \( x_{i} \) defines a fuzzy rule, and training data \( x \) exists, the firing strengths \( \omega_{i} \) for each rule are calculated via Eq. (3.13), with the consideration that all membership functions are Gaussian. Where each \( x_{i} \) is used alongside the obtained \( \sigma^{c} \) and \( x^{c} \).

$$ \omega_{i} = {\text{e}}^{{ - \frac{1}{2}\left\| {\frac{{x_{i} - x^{c} }}{{\sigma^{c} }}} \right\|^{2} }} $$
(3.13)

A parameter \( \tau_{i} \) is defined in order to use the LSE, as shown in Eq. (3.14). The output can then be rewritten as Eq. (3.15). Now, \( z_{i} \) can be arranged into the form shown in Eq. (3.16), which describes a matrix of parameters to be optimized by the LSE method, taking the form AX = B.

$$ {\tau_{i} = \frac{{\omega_{i} }}{{\sum {\omega_{i} } }}} $$
(3.14)
$$ z = \sum\limits_{i = 1}^{c} \tau_{i} z_{i} = \sum\limits_{i = 1}^{c} \tau_{i} \left( G_{i} x + h_{i} \right) $$
(3.15)
$$ \left[ \begin{array}{c} z_{1}^{T} \\ \vdots \\ z_{n}^{T} \end{array} \right] = \left[ \begin{array}{ccccc} \tau_{1,1} x_{1}^{T} & \tau_{1,1} & \cdots & \tau_{c,1} x_{1}^{T} & \tau_{c,1} \\ \vdots & & & & \\ \tau_{1,n} x_{n}^{T} & \tau_{1,n} & \cdots & \tau_{c,n} x_{n}^{T} & \tau_{c,n} \end{array} \right] \left[ \begin{array}{c} G_{1}^{T} \\ h_{1}^{T} \\ \vdots \\ G_{c}^{T} \\ h_{c}^{T} \end{array} \right] $$
(3.16)

where,

$$ B = \left[ \begin{array}{c} z_{1}^{T} \\ \vdots \\ z_{n}^{T} \end{array} \right] $$
(3.17)
$$ A = \left[ \begin{array}{ccccc} \tau_{1,1} x_{1}^{T} & \tau_{1,1} & \cdots & \tau_{c,1} x_{1}^{T} & \tau_{c,1} \\ \vdots & & & & \\ \tau_{1,n} x_{n}^{T} & \tau_{1,n} & \cdots & \tau_{c,n} x_{n}^{T} & \tau_{c,n} \end{array} \right] $$
(3.18)
$$ X = \left[ \begin{array}{c} G_{1}^{T} \\ h_{1}^{T} \\ \vdots \\ G_{c}^{T} \\ h_{c}^{T} \end{array} \right] $$
(3.19)

The LSE problem thus has the form \( AX = B \). Where B, in Eq. (3.17), is the matrix of output values; A, in Eq. (3.18), is the matrix of known coefficients; and X, in Eq. (3.19), is the matrix of parameters to be estimated. The final solution is given by Eq. (3.20).

$$ {X = \left( {A^{T} A} \right)^{ - 1} A^{T} B} $$
(3.20)
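The whole consequent estimation of Eqs. (3.13)–(3.20) can be sketched as follows; np.linalg.lstsq is used in place of the explicit pseudo-inverse of Eq. (3.20) purely for numerical stability, and all names are illustrative.

```python
import numpy as np

def tsk_lse(X, Z, centers, sigmas):
    """Eqs. (3.13)-(3.20): least-squares estimation of the first-order
    TSK consequents.

    X : (n, p) training inputs     Z : (n,) training outputs
    centers, sigmas : (c, p) Gaussian antecedent parameters
    """
    n, p = X.shape
    c = centers.shape[0]
    # firing strengths, Eq. (3.13)
    w = np.exp(-0.5 * np.sum(((X[:, None, :] - centers[None, :, :])
                              / sigmas[None, :, :]) ** 2, axis=-1))
    tau = w / w.sum(axis=1, keepdims=True)   # Eq. (3.14)
    # design matrix A of Eq. (3.18): blocks [tau_i * x^T, tau_i] per rule
    A = np.hstack([np.hstack([tau[:, [i]] * X, tau[:, [i]]])
                   for i in range(c)])
    # solve AX = B; lstsq replaces the explicit pseudo-inverse of
    # Eq. (3.20) for numerical stability
    coef, *_ = np.linalg.lstsq(A, Z, rcond=None)
    return coef.reshape(c, p + 1)  # each row holds [G_i, h_i]
```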

As already shown, the FGGCA is separated into two stages, apart from the TSK consequent learning algorithm: finding the clusters, and calculating the information granule sizes. The following two procedures summarize both processes; the code used is shown in Appendix C.2.

procedure findClusters(x, radius, \( \Delta \))

procedure findSizeOfGranules(\( x \), \( {x^{c} } \))

In Appendix B.1, some figures are shown which visually describe the behavior of the FGGCA when dealing with 2D data.

3.2 Higher-Type Information Granule Formation

Most of the research in this book focuses on the formation of fuzzy higher-type information granules, and multiple approaches are presented.

3.2.1 A Hybrid Method for IT2 TSK Formation Based on the Principle of Justifiable Granularity and PSO for Spread Optimization

An initial method for the formation of IT2 TSK FIS was proposed in [7]. It uses the principle of justifiable granularity as a means of heuristically obtaining an interval that reflects the IT2 FS uncertainty; afterwards, the IT2 TSK consequent spreads are optimized via a Particle Swarm Optimization (PSO) algorithm. This method is described in more detail in the following paragraphs.

Considering that Gaussian membership functions with uncertain means will be used, as shown in Fig. 3.1, the required inputs of the proposed method are cluster centers from any clustering algorithm, which define the initial rule set of the IT2 FIS, and the subsets of data best represented by each cluster center.

Fig. 3.1 Sample IT2 Gaussian membership function with uncertain mean

The representative subset for each cluster center is obtained via Eq. (3.21). Where \( \left\| x_{c}, x_{i} \right\| \) is the Euclidean distance in n-space between a cluster center \( x_{c} \) and a datum \( x_{i} \).

$$ \left\| x_{c}, x_{i} \right\| = \sqrt{\sum\nolimits_{k = 1}^{n} \left( x_{c,k} - x_{i,k} \right)^{2}} $$
(3.21)

When all subsets are found, the information granule’s coverage, i.e. the standard deviation \( \sigma \), must be calculated. This is done through Eq. (3.22). Where \( \sigma_{j,k} \) is the standard deviation of the j-th rule and k-th input, \( x_{i} \) is each datum from the subset obtained from Eq. (3.21), \( x_{c}^{j,k} \) is the cluster center for the j-th rule and k-th input, and n is the cardinality of the subset.

$$ \sigma_{j,k} = \sqrt{\frac{\mathop \sum \nolimits_{i = 1}^{n} (x_{i} - x_{c}^{j,k} )^{2}}{n - 1}} $$
(3.22)

Up to this point a Type-1 Gaussian membership function can be created, but the end product is an IT2 Gaussian membership function with uncertain mean. The following process obtains the remaining parameter required to form the IT2 FS.

To obtain the uncertainty of the IT2 FS, the principle of justifiable granularity is used as a heuristic measure. This is done by forcing each information granule to its largest coverage by using the user criterion value of \( \alpha_{max} \) on each side of the interval, as described by Eqs. (2.6) and (2.7). When both interval bounds a and b are obtained, their difference is used to heuristically measure the uncertainty of the IT2 FS, as shown in Eq. (3.23). Where \( \tau \) is the measure of uncertainty for a specific information granule.

$$ \tau = \left| {a - b} \right| $$
(3.23)

The obtained value of \( \tau \) is used in Eqs. (3.24) and (3.25). Where the center parameter of the IT2 Gaussian membership function with uncertain mean holds the uncertainty of the IT2 FS, i.e. \( \tau \) offsets the mean of the Gaussian membership function, thus adding the uncertainty. This concludes the formation of higher-type fuzzy information granules for the antecedents of an IT2 FLS.

$$ m_{j,k}^{l} = x_{c}^{j,k} - \tau_{j,k} $$
(3.24)
$$ m_{j,k}^{r} = x_{c}^{j,k} + \tau_{j,k} $$
(3.25)
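For a single input dimension, the antecedent construction of Eqs. (3.22)–(3.25) can be sketched as follows, assuming the justifiable-granularity bounds a and b of Eqs. (2.6) and (2.7) have already been computed.

```python
import numpy as np

def it2_antecedent(center, subset, a, b):
    """Eqs. (3.22)-(3.25) for one rule and one input: returns the
    lower/upper means and the shared sigma of an IT2 Gaussian MF with
    uncertain mean. The bounds a, b are assumed to come from the
    justifiable-granularity step of Eqs. (2.6)-(2.7)."""
    n = len(subset)
    sigma = np.sqrt(np.sum((subset - center) ** 2) / (n - 1))  # Eq. (3.22)
    tau = abs(a - b)                                           # Eq. (3.23)
    return center - tau, center + tau, sigma                   # Eqs. (3.24)-(3.25)
```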

The IT2 linear TSK consequents are calculated in a two-step process. First, an LSE algorithm [8, 9] is used twice: since the Gaussian membership function with uncertain mean is parameterized by a left and a right T1 Gaussian membership function, the LSE is applied as if two T1 FLSs existed. When all TSK coefficients are obtained, the average of both sets of parameters is used, as shown in Eq. (3.26). Where \( \varphi_{l} \) and \( \varphi_{r} \) are the coefficient sets for the left- and right-side T1 FLSs, and \( \varphi \) is the set used for the proposed IT2 FLS. The code for this process is included in Appendix C.3.

$$ \varphi = \frac{{\varphi_{l} + \varphi_{r} }}{2} $$
(3.26)

The second part of the process for calculating the coefficients of the IT2 TSK consequents is to obtain the spreads of each coefficient, in the format shown in Eq. (3.27). Where c are the coefficients, x the input variables, and s the spreads.

$$ y^{i} = \sum\limits_{k = 1}^{p} {c_{k}^{i} x_{k} } + c_{0}^{i} - \sum\limits_{k = 1}^{p} {|x_{k} |s_{k}^{i} - s_{0}^{i} } $$
(3.27)
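Eq. (3.27) prints the lower bound of the consequent interval; the upper bound, implied by the spread formulation, mirrors it with a plus sign. A minimal sketch:

```python
import numpy as np

def tsk_consequent_interval(x, c, s):
    """Eq. (3.27) and its implied mirror: lower and upper consequent
    outputs of one IT2 TSK rule, with coefficients c = [c_1..c_p, c_0]
    and spreads s = [s_1..s_p, s_0]."""
    core = np.dot(c[:-1], x) + c[-1]
    spread = np.dot(s[:-1], np.abs(x)) + s[-1]
    return core - spread, core + spread  # [y_l, y_r]
```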

To adjust the spreads, a PSO algorithm [10] was chosen, where the initial population is randomly generated with values within [0, 0.09], i.e. very small spreads. As the PSO uses multiple parameters which can speed up convergence to a good result, the values used here are shown in Table 3.1, where three values, marked in bold, specify the desired optimization behavior. The individual acceleration factors are fixed; this causes the PSO to search more slowly inside the confined space, since otherwise the spreads increase significantly, which is not desirable behavior. The rest of the parameters are set to typical recommended values.

Table 3.1 PSO parameters used for optimizing the spread of the IT2 TSK linear polynomials

The objective function for the PSO is another very important element that must be addressed in order to obtain the best possible solution. The manually adjusted objective function is shown in Eq. (3.28). Where \( \theta \) is the RMSE of the output coverage, and \( \vartheta \) is the RMSE of the size of the Footprint of Uncertainty (FOU).

$$ objVal = 0.8*\theta + 0.2*\vartheta $$
(3.28)

Once the PSO concludes its 50 iterations, the spread values are expected to be quasi-optimal, with some minor error in the FOU coverage, or optimal, with zero error in the FOU coverage.
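A generic PSO sketch for the spread search is given below; the hyper-parameter values are illustrative stand-ins rather than the bolded Table 3.1 settings, and eval_rmse is an assumed callback that evaluates a candidate spread vector on the FLS and returns \( \theta \) and \( \vartheta \).

```python
import numpy as np

def pso_spreads(eval_rmse, dim, n_particles=30, iters=50,
                lo=0.0, hi=0.09, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal PSO for the spread search. eval_rmse(s) must return
    (theta, vartheta) for a candidate spread vector s; hyper-parameter
    values here are illustrative, not the Table 3.1 settings."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(lo, hi, (n_particles, dim))  # small initial spreads
    vel = np.zeros_like(pos)

    def obj(s):
        theta, vartheta = eval_rmse(s)
        return 0.8 * theta + 0.2 * vartheta        # Eq. (3.28)

    pbest = pos.copy()
    pcost = np.array([obj(s) for s in pos])
    g = pbest[pcost.argmin()].copy()
    for _ in range(iters):                         # 50 iterations in the text
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (g - pos)
        pos = np.clip(pos + vel, lo, None)         # keep spreads non-negative
        cost = np.array([obj(s) for s in pos])
        better = cost < pcost
        pbest[better], pcost[better] = pos[better], cost[better]
        g = pbest[pcost.argmin()].copy()
    return g
```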

3.2.2 Information Granule Formation via the Concept of Uncertainty-Based Information with IT2 FS Representation with TSK Consequents Optimized with Cuckoo Search

This approach, published in [11], uses the concept of uncertainty-based information as a means to measure uncertainty and employs it in the formation of IT2 FS antecedents. The consequents are IT2 TSK linear polynomials which are optimized via a Cuckoo Search algorithm [12].

The concepts of uncertainty and information are closely related: the uncertainty involved in any situation is a result of information deficiency. Therefore, uncertainty can be reduced by obtaining new relevant information, e.g. new experimental results; by measuring the difference between both sets of information (a priori and a posteriori), some of the uncertainty can be quantified.

The diagram in Fig. 3.2 represents the general idea behind uncertainty-based information. A reduction of uncertainty can be obtained via the difference between two uncertainty models of the same information: an a priori uncertainty model is obtained from a first sample of information, whereas an a posteriori uncertainty model is obtained from a second sample of information related to the same situation.

Fig. 3.2 Diagram of the behavior of uncertainty-based information, where uncertainty is reduced by the difference between two uncertain models of the same information

Taking inspiration from uncertainty-based information, higher-type information granules can be formed via the capture and measurement of uncertainty.

Through a first sample of information D1, an uncertain model can be created, and through a second sample of information D2 another, similar uncertain model can be created. These two models of uncertainty are analogous to the a priori and a posteriori models in the theory of uncertainty-based information. Whereas uncertainty-based information seeks to reduce uncertainty by measuring it, the proposed technique takes such a measurement of uncertainty and integrates it into an IT2 FS.

The proposed approach works by taking two sets of samples from an information source, D1 and D2, and obtaining two Type-1 Gaussian membership functions. As there will likely be a difference between the two samples, this difference ultimately reflects the direct measurement of uncertainty, which is then imposed onto an IT2 Gaussian membership function with an uncertain standard deviation. This behavior is shown in Fig. 3.3.

Fig. 3.3 Explanatory diagram of how the proposed approach measures and defines the uncertainty, and forms an IT2 FS with such uncertainty

One implementation of this approach uses the Subtractive Clustering algorithm [6] to obtain rule sets. The following algorithm summarizes how the fuzzy higher-type information granules are formed:

1. Obtain the first rule set from D1, comprised of centers \( m \) and standard deviations \( \sigma_{1} \). Where \( \sigma_{1} \) is obtained by finding the set of data closest to each \( m^{i} \).

2. Obtain \( \sigma_{2} \) via D2. Where \( \sigma_{2} \) is obtained by finding the set of data closest to each \( m^{i} \).

3. Form the antecedent IT2 membership functions using the found parameters \( m \), \( \sigma_{1}^{i} \) and \( \sigma_{2}^{i} \).

4. The consequents are obtained via optimization of the IT2 TSK linear polynomials with a Cuckoo Search optimization algorithm [12].

Another implementation uses the Fuzzy C-Means (FCM) clustering algorithm [13] to obtain rule sets. This implementation better reflects the concept of a priori and a posteriori uncertainty models. Appendix C.4 shows the code which executes this algorithm; a minimal sketch is also given after the following list.

1. Obtain the first rule set from D1, comprised of centers \( m_{1} \) and standard deviations \( \sigma_{1} \). Where \( \sigma_{1} \) is obtained via the partition matrix \( U_{1} \), in which the highest membership value indicates to which \( m_{1}^{i} \) each datum belongs. This yields the a priori uncertainty model.

2. Obtain the second rule set from D2, comprised of centers \( m_{2} \) and standard deviations \( \sigma_{2} \). Since the FCM randomly chooses an initial U partition matrix, the second FCM’s U partition matrix is initialized with the values of \( U_{1} \) to preserve rule-set similarity between both executions. Where \( \sigma_{2} \) is obtained via the partition matrix \( U_{2} \), in which the highest membership value indicates to which \( m_{2}^{i} \) each datum belongs. This yields the a posteriori uncertainty model.

3. Form the antecedent IT2 membership functions by taking the mean of \( m_{1}^{i} \) and \( m_{2}^{i} \), and using both \( \sigma_{1}^{i} \) and \( \sigma_{2}^{i} \).

4. The consequents are obtained via optimization of the IT2 TSK linear polynomials with a Cuckoo Search optimization algorithm.
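A minimal sketch of this FCM-based formation follows; fcm is a stand-in interface for any Fuzzy C-Means routine returning (centers, U), the samples are assumed to have equal cardinality so that \( U_{1} \) can seed the second run, and U is assumed to have shape (c, N).

```python
import numpy as np

def it2_antecedents_from_two_samples(D1, D2, c, fcm):
    """Sketch of the two-run FCM formation. fcm(X, c, U_init=None) is an
    assumed interface, not a specific library call."""
    m1, U1 = fcm(D1, c)             # a priori uncertainty model
    m2, U2 = fcm(D2, c, U_init=U1)  # a posteriori model, seeded with U1
    rules = []
    for i in range(c):
        # hard-assign each datum to its highest-membership center
        s1 = D1[np.argmax(U1, axis=0) == i]
        s2 = D2[np.argmax(U2, axis=0) == i]
        center = (m1[i] + m2[i]) / 2.0  # step 3: mean of both centers
        rules.append((center, s1.std(axis=0), s2.std(axis=0)))
    return rules
```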

3.2.3 Method for Measurement of Uncertainty Applied to the Formation of IT2 FS

In this approach, by Sanchez et al. [14], a heuristic methodology is proposed which, based on the Coefficient of Variation, can measure a degree of uncertainty in a dataset and construct higher-type information granules in the form of IT2 FSs. The consequents of the IT2 FLS use TSK linear polynomials which are optimized via a Cuckoo Search optimization algorithm.

The premise of the approach is that uncertainty can be interpreted as data dispersion in a sample of data, as shown in Fig. 3.4. In (a), most data samples are very close together, giving low dispersion, or under the premise of this approach, low uncertainty. In (b), although there is a concentration of data near the center, some data samples still lie far from it, which can be interpreted as medium uncertainty. In (c), the data is equally spaced throughout the Universe of Discourse, so the interpretation is that uncertainty is so high that another sample could potentially be obtained anywhere.

Fig. 3.4 Visual depiction of varying degrees of data dispersion: a low dispersion, b medium dispersion, and c high dispersion

The conversion from dispersion to uncertainty is done via the Coefficient of Variation \( cv \), shown in Eq. (3.29). Where \( \sigma \) is the standard deviation, and \( m \) is the mean of the set.

$$ cv = \frac{\sigma }{m} $$
(3.29)

A limitation of \( cv \) is that it is restricted to non-negative values.

When IT2 FSs are used and the FOU serves as the measure of uncertainty, the dispersion-uncertainty relation is taken as a direct proportionality, as shown in Eq. (3.30). This relation states that with low dispersion a small FOU exists, with a medium amount of dispersion a medium FOU exists, and with a high amount of dispersion a large FOU exists.

$$ c_{v} \propto FOU $$
(3.30)

The dispersion-uncertainty relation is depicted in Fig. 3.5, where different degrees of dispersion are reflected by a proportionally sized FOU.

Fig. 3.5 Varying degrees of FOU in an IT2 FS: a FOU = 0.0 (low dispersion), b FOU = 0.5 (medium dispersion), and c FOU = 1.0 (high dispersion)

To form higher-type information granules for the antecedents of the FLS, a FIS prototype must first be acquired, composed of a rule set of centers for Gaussian membership functions and the accompanying subsets of data which created each prototype. The proposed approach works on each FS independently. Each FS is composed of a center value m and a standard deviation \( \sigma \) obtained from the subset of sample data \( \delta \), and for each FS a \( cv \) is calculated using Eq. (3.29). This value is then used to search for a near-optimal FOU area in an IT2 FS. The search starts by considering the highest obtainable area, \( FOU^{\hbox{max} } = \int {\tilde{A}\left( x \right)dx} = 1 \), with the previously obtained \( \sigma \), as shown in Fig. 3.5c; this is achieved when \( \sigma_{1} = 0 \) and \( \sigma_{2} = 2\sigma \), so possible FOU values lie in the interval [0, 1]. The smallest value \( \sigma^{0} \) is defined as \( \sigma_{1} = \sigma_{2} = \sigma^{0} = \sigma \), as shown in Fig. 3.5a. The search is then performed in small discrete steps \( \lambda \) until the FOU value which equals \( cv \) is found. Each step \( \lambda \) increments/decrements \( \sigma_{1} \) and \( \sigma_{2} \), as shown in Eq. (3.31). This is done iteratively while \( \left\| {\sigma_{1} ,\sigma_{2} } \right\| \le 2\sigma \), where \( \left\| {\sigma_{1} ,\sigma_{2} } \right\| \) is the Euclidean distance between \( \sigma_{1} \) and \( \sigma_{2} \), or until \( cv = FOU_{i} \), where \( FOU_{i} \) is the FOU of the IT2 FS at the i-th iteration, as modified by \( \sigma_{1,2}^{i} \).

$$ \sigma_{1,2}^{i} = \sigma_{1,2}^{i - 1} \pm \lambda; \quad i = 1, \ldots, n $$
(3.31)

When the search has found the values of \( \sigma_{1} \) and \( \sigma_{2} \) which together form the desired FOU, the IT2 FS can be formed, i.e. a Higher Type Information Granule has been formed, as shown in Fig. 3.6.
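The search can be sketched as follows; the grid-based area computation and the normalization of the FOU area to [0, 1] (so that \( \sigma_{1} = 0 \), \( \sigma_{2} = 2\sigma \) yields 1, as stated above) are implementation assumptions.

```python
import numpy as np

def gauss(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2)

def fou_area(m, s1, s2, grid):
    """Area between the upper (s2) and lower (s1) Gaussian MFs."""
    dx = grid[1] - grid[0]
    return np.sum(gauss(grid, m, s2) - gauss(grid, m, s1)) * dx

def search_fou(m, sigma, cv, lam=0.01):
    """Eq. (3.31): widen sigma_2 and narrow sigma_1 from sigma^0 = sigma
    until the normalized FOU area reaches cv, or the sigma gap reaches
    2*sigma."""
    grid = np.linspace(m - 8 * sigma, m + 8 * sigma, 2001)
    fou_max = fou_area(m, 1e-9, 2 * sigma, grid)  # FOU^max, Fig. 3.5c
    s1, s2 = sigma, sigma                         # sigma^0, Fig. 3.5a
    while abs(s2 - s1) <= 2 * sigma:
        if fou_area(m, s1, s2, grid) / fou_max >= cv:
            break                                 # cv = FOU_i reached
        s1, s2 = max(s1 - lam, 1e-9), s2 + lam    # Eq. (3.31)
    return s1, s2
```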

Fig. 3.6 IT2 FS represented by a Gaussian membership function with uncertainty in the standard deviation, formed with three parameters: \( \sigma_{1} \), \( \sigma_{2} \) and \( m \)

As for the consequents of the FLS, a Cuckoo Search optimization algorithm is used, where IT2 TSK linear polynomials embody the consequents.

3.2.4 Formation of GT2 Gaussian Membership Functions Based on the Information Granule Numerical Evidence

This work was done by Sanchez et al. [15], where a technique for the formation of GT2 FSs is proposed, inspired by the principle of justifiable granularity, whereby the concept of numerical evidence is used in the formation of the GT2 FS.

The proposed technique can be defined by the following steps:

1. Use any clustering algorithm to obtain an initial set of cluster centers (C) and the sets of data (D) which formed each information granule.

2. Calculate the required parameters to form a Gaussian primary membership function: extract an individual \( c_{i} \), from \( c \in C \), and using the subset \( d_{i} \), from \( d \in D \), obtain the standard deviation \( \sigma_{i} \).

3. Initialize \( \sigma \) for the Gaussian secondary membership functions, with an initial value of \( \sigma = 0.1 \).

4. For each data point belonging to the subset \( d_{i} \), create a secondary membership function by revolution on the \( f_{x}\left( u \right) \) axis, following \( u \) of the primary membership function.

Two approaches exist for how the value of \( \sigma \) for the secondary membership functions changes and how it affects the GT2 FS. The first approach maintains a fixed value of \( \sigma \) for all secondary membership functions. Appendix C.5 shows code which creates this specific type of GT2 membership function. Figures 3.7 and 3.8 show a top view and an orthogonal view of a sample GT2 FS with a constant \( \sigma \) for the secondary membership functions.

Fig. 3.7 Proposed GT2 Gaussian membership function with constant \( \varvec{\sigma} \) for all secondary membership functions, top view

Fig. 3.8 Proposed GT2 Gaussian membership function with constant \( \varvec{\sigma} \) for all secondary membership functions, orthogonal view

The other proposed approach to forming GT2 Gaussian membership functions is to change \( \sigma \) for the secondary membership functions based on \( u \), where the points closest to the center grow in width, since more data is expected to fall there, whereas the sides have fewer possibilities of obtaining new data points and therefore shrink. Appendix C.6 shows code which creates this specific type of GT2 membership function. Figures 3.9 and 3.10 depict a sample Gaussian primary membership function with \( \sigma \) for the secondary membership functions dependent on \( u \); a sketch of both variants is given after the figures.

Fig. 3.9 Proposed GT2 Gaussian membership function with \( \varvec{\sigma} \) dependent on \( \varvec{u} \) for all secondary membership functions. Orthogonal view

Fig. 3.10 Proposed GT2 Gaussian membership function with \( \varvec{\sigma} \) dependent on \( \varvec{u} \) for all secondary membership functions. Orthogonal view
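A sketch of both variants of the secondary membership functions follows; the Gaussian shape of the vertical slices and the linear dependence of \( \sigma \) on \( u \) in the second variant are illustrative interpretations, since the text only states that the width grows near the center and shrinks toward the sides.

```python
import numpy as np

def secondary_sigma(u, mode="fixed", base=0.1):
    """Width of a secondary MF for the two proposed variants. The linear
    dependence on u is an illustrative choice for the second variant."""
    if mode == "fixed":
        return base              # constant sigma = 0.1 for every slice
    return base * u + 1e-6       # grows near the center (u -> 1),
                                 # shrinks toward the sides (u -> 0)

def gt2_vertical_slice(x, m, sigma_primary, u_grid, mode="fixed"):
    """Secondary membership values f_x(u) at input x: a Gaussian in u,
    centered on the primary membership of x."""
    u_center = np.exp(-0.5 * ((x - m) / sigma_primary) ** 2)
    s = secondary_sigma(u_center, mode)
    return np.exp(-0.5 * ((u_grid - u_center) / s) ** 2)
```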

In Appendix B.2, additional figures are shown for an application to the Iris dataset, for both the fixed \( \sigma \) variant and the \( \sigma \) dependent on \( u \) variant.