Fuzzy rule-based systems for recognition-intensive classification in granular computing context
Abstract
In traditional machine learning, classification is typically undertaken by discriminative learning using probabilistic approaches, i.e. learning a classifier that discriminates one class from other classes. This strategy rests on the assumption that different classes are mutually exclusive and each instance is clear-cut. However, this assumption does not always hold in real-life data classification, especially when the task is to recognize patterns of specific classes. For example, in emotion detection, multiple emotions may be identified from the same person at the same time, which indicates that different emotions may involve specific relationships rather than mutual exclusion. In this paper, we focus on classification problems that involve pattern recognition. In particular, we position the study in the context of granular computing, and propose the use of fuzzy rule-based systems for recognition-intensive classification of real-life data instances. Furthermore, we report an experimental study conducted using 7 UCI data sets on life sciences, to compare the fuzzy approach with four popular probabilistic approaches in pattern recognition tasks. The experimental results show that the fuzzy approach can not only be used as an alternative to the probabilistic approaches but is also capable of capturing more patterns that the probabilistic approaches cannot capture.
Keywords
Machine learning, Multitask learning, Real-life data classification, Fuzzy classification, Fuzzy rule-based systems, Granular computing
1 Introduction
Classification is one of the most popular tasks of machine learning and has been widely applied in areas such as sentiment analysis (Liu and Cocea 2017b; Pedrycz and Chen 2016; Jefferson et al. 2017), image processing (Liu et al. 2017a; Wang and Yu 2016), pattern recognition (Teng et al. 2007; Wu et al. 2011) and decision making (Liu and Gegov 2015; Xu and Wang 2016; Liu and You 2017).
In traditional machine learning, classification is typically conducted by training a classifier that discriminates one class from other classes towards uniquely classifying each instance, since the classification is based on the assumptions that different classes are mutually exclusive and that each instance is clear-cut and thus cannot belong to more than one class. However, these assumptions do not always hold in real-life data classification. For example, the same movie commonly belongs to several categories, and the same book can be associated with several subjects. Also, even when different classes are truly mutually exclusive, some instances can be very complex and hard to distinguish, e.g. in handwritten digit recognition, the two digits ‘4’ and ‘9’ can be highly similar to each other, due to the diversity in handwriting styles of different people.
Furthermore, as introduced in Liu et al. (2017), classification is essentially a task of predicting the value of a discrete attribute. In the context of data science, discrete attributes can be specialized into several other types, such as nominal, ordinal and string (Tan et al. 2005). Due to the difference in types of discrete attributes, the nature of classification tasks can also be varied. In particular, classification tasks can be specialized into pattern recognition, rating and decision making (Liu et al. 2017), which indicates that a classification task is not necessarily aimed at discrimination between different classes, i.e. the purpose could be simply to identify patterns of a specific class from instances, without the need to distinguish the class from other classes.
In this paper, we focus on recognition-intensive classification in the setting of granular computing. In particular, we propose to adopt fuzzy rule-based systems in the context of multitask classification, i.e. each class is viewed as an information granule, which involves a specific recognition task, in terms of the membership degree of an instance to the class. Also, the recognition task for each class is undertaken independently, i.e. the membership degree of an instance to each class is measured independently, in the context of generative classification.
The contributions of this paper include the following: (a) we point out that in recognition-intensive classification different classes generally involve some specific relationships rather than mutual exclusion, so it is not appropriate to undertake such a classification task in a discriminative way; (b) we show both theoretically and empirically that fuzzy approaches are more suitable than probabilistic ones for recognition-intensive classification, i.e. fuzzy approaches can not only be used as alternatives to probabilistic approaches in terms of classification performance, but are also capable of capturing more patterns that cannot be discovered using probabilistic approaches.
The rest of this paper is organized as follows: Sect. 2 provides related work on recognition-intensive classification in the context of traditional machine learning and the concepts of granular computing. In Sect. 3, we illustrate the procedure of fuzzy rule-based classification in the context of multitask learning. We also justify the significance and advantages of fuzzy classification of real-life data that involve recognition tasks. In Sect. 4, we report an experimental study conducted using 7 UCI data sets, and discuss the results critically and comparatively to show the advantages of fuzzy approaches for recognition-intensive classification, in comparison with probabilistic approaches. In Sect. 5, we summarize the contributions of this paper and suggest further directions towards advancing this research area in the future.
2 Related work
This section provides a review of recognition-intensive classification when traditional machine learning approaches are used. This section also presents an overview of granular computing concepts and techniques and shows how they can be used effectively and efficiently for dealing with real-life classification problems.
2.1 Review of recognition-intensive classification
As introduced in Liu et al. (2017), recognition can be either a binary or multiclass classification task. A popular example of binary classification for the purpose of recognition is gender identification (Guo 2014), which is aimed at judging whether a person is male or female. In this context, both the male and female classes are of high interest, since the two classes must be clearly distinguished to identify the gender of a person accurately. This has motivated researchers to focus on discriminative approaches to classification (Wu et al. 2011; Ali and Xavier 2014; Lin et al. 2016; Suykens and Vandewalle 1999). In other words, researchers aim to identify features that can discriminate effectively between male and female in the setting of discriminative learning.
However, there are also some examples of binary classification that only involve one class of interest, such as cyberbullying detection (Zhao et al. 2016; Reynolds et al. 2011). In the context of cyberbullying classification, the aim is essentially to recognize offensive language in online text posted via social media, i.e. to judge whether the text is sent for the purpose of bullying. In reality, the vast majority of textual instances posted via social media would normally belong to the ‘no’ class (i.e. a collected data set usually contains less than 10% cyberbullying instances), as mentioned in Reynolds et al. (2011), which indicates the case of class imbalance. Since discriminative classification has been mainly involved in traditional machine learning, some popular probabilistic approaches, such as support vector machine (SVM), naive Bayes (NB) and decision trees (DT), have been widely used for cyberbullying detection (Zhao et al. 2016).
From the perspective of granular computing-based machine learning, the ‘yes’ class, which represents the case of cyberbullying, can be viewed as the only target class, since it is the only class of interest and the prediction accuracy for the ‘no’ class would usually be very high. From this point of view, it is only necessary to extract a set of features that are highly relevant to the target class, such that a classifier is learned for recognizing the case of cyberbullying. In other words, the classifier output is the ‘no’ class by default, and the ‘yes’ class is provided as the output only when some features of cyberbullying are found in the text.
On the other hand, there are also many examples of multiclass classification for the purpose of recognition, such as emotion identification (Teng et al. 2007). Due to the popularity of probabilistic approaches in traditional machine learning, SVM and NB have been used for discriminating one emotion from the other ones (Altrabsheh et al. 2015). However, as argued in Liu et al. (2017), different emotions are not really mutually exclusive, i.e. it is normal that different emotions can be identified from the same person at the same time, so it is not really appropriate to learn classifiers towards discriminating between different classes. Instead, it would be necessary to treat identification of each emotion as an independent task. In this context, it is necessary to extract only features that are highly relevant to this specific emotion, such that a classifier is learned from these features for identifying whether a person has this specific emotion at a particular point.
2.2 Overview of granular computing
Granular computing is a paradigm of information processing. It is aimed at structural thinking from a philosophical perspective and at structural problem solving from a practical perspective (Yao 2005b).
In general, granular computing involves two operations, namely granulation and organization (Yao 2005a). The former operation is aimed at decomposing a whole into several (overlapping or non-overlapping) parts, whereas the latter operation is aimed at integrating several (overlapping or non-overlapping) parts into a whole. In the context of computer science, granulation and organization are typically used as the top-down and bottom-up approaches, respectively (Liu and Cocea 2017a). In other words, granulation means to divide a complex problem into several simpler subproblems, whereas organization indicates that several modular problems are linked together into a more systematic problem.

In the context of classification, each class can be viewed as a granule, since a class is essentially a collection of objects/instances.

In the context of rule-based systems, each rule can be viewed as a granule, since a rule consists of a collection of rule terms as its antecedent.

In the context of fuzzification of continuous attributes, each linguistic term can be viewed as a granule, since a linguistic term is essentially a fuzzy set that represents a collection of elements with different membership degrees to the fuzzy set.
In general, there are some specific relationships between granules in the same level or different levels, which leads to the need to involve the concept of granularity (Pedrycz and Chen 2015). In particular, granules, which are located at the same level of granularity, involve horizontal relationships (Liu and Cocea 2018), e.g. mutual exclusion, correlation and mutual independence.
In contrast, granules, which are located at different levels of granularity, involve hierarchical relationships (Liu and Cocea 2018; Liu et al. 2017), e.g. generalization/specialization and aggregation/decomposition. For example, in the context of classification, a class at a higher level of granularity may be specialized/decomposed into subclasses at a lower level of granularity. Also, classes at a lower level of granularity may be generalized/aggregated into a super class at a higher level of granularity (Liu and Cocea 2017a). On the other hand, different classes may also be mutually exclusive, correlated or mutually independent, when these classes are at the same level of granularity (Liu et al. 2017b).
In practice, granular computing concepts have been popularly used in various areas, such as artificial intelligence (Wilke and Portmann 2016; Pedrycz and Chen 2011; Skowron et al. 2016), computational intelligence (Dubois and Prade 2016; Yao 2005b; Kreinovich 2016; Livi and Sadeghian 2016), machine learning (Min and Xu 2016; Peters and Weber 2016; Liu and Cocea 2017c; Antonelli et al. 2016; Chen et al. 2001), decision making (Xu and Wang 2016; Liu and You 2017; Chatterjee and Kar 2017), data clustering (Chen et al. 2009; Horng et al. 2005; Chen et al. 2011) and natural language processing (Zhang et al. 2007).
Furthermore, granular computing concepts have also been popularly used in ensemble learning techniques (Liu and Cocea 2017c). In particular, ensemble learning approaches, such as Bagging, involve granulation of information through decomposing a training set into a number of overlapping samples (different versions of training data), and also involve organization through combining the individual outputs derived from different base classifiers towards finally assigning a class to an unseen instance; there has also been a very similar perspective stressed and discussed in Hu and Shi (2009).
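The granulation and organization roles described above for Bagging can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the one-feature toy training set, the 1-nearest-neighbour base classifier and all function names are hypothetical and are not taken from the cited works.

```python
import random
from collections import Counter


def bootstrap_samples(data, n_samples, rng):
    """Granulation: draw overlapping samples of the training set (with replacement)."""
    return [[rng.choice(data) for _ in data] for _ in range(n_samples)]


def nearest_neighbour(sample):
    """Hypothetical base classifier: 1-nearest neighbour on a single feature."""
    def predict(x):
        return min(sample, key=lambda pair: abs(pair[0] - x))[1]
    return predict


def bagging_predict(classifiers, x):
    """Organization: combine the individual outputs by majority voting."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]


# Toy data (feature value, class label), invented purely for illustration.
train = [(1.0, "a"), (1.5, "a"), (2.0, "a"), (8.0, "b"), (8.5, "b"), (9.0, "b")]
rng = random.Random(0)
ensemble = [nearest_neighbour(s) for s in bootstrap_samples(train, 5, rng)]
label = bagging_predict(ensemble, 1.2)
```

Each bootstrap sample plays the role of a granule of the training set, and the vote is the step that organizes the base classifiers' outputs into a single class assignment.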
3 Fuzzy multitask classification approach
In this section, we illustrate the procedure of fuzzy rule-based systems in the context of generative multitask classification. Also, we justify the theoretical significance and advantages of using fuzzy approaches for recognition-intensive classification.
3.1 Procedure of fuzzy rule-based systems
A fuzzy rule-based system is essentially based on fuzzy logic and fuzzy set theory. Fuzzy logic is an extension of deterministic logic, i.e. fuzzy truth values are continuous, ranging from 0 to 1, unlike binary truth values (0 or 1).

Rule 1: if \(x_1\) is Young and \(x_2\) is High then class = Impressive;

Rule 2: if \(x_1\) is Young and \(x_2\) is Middle then class = Impressive;

Rule 3: if \(x_1\) is Young and \(x_2\) is Low then class = Normal;

Rule 4: if \(x_1\) is Middle-aged and \(x_2\) is High then class = Impressive;

Rule 5: if \(x_1\) is Middle-aged and \(x_2\) is Middle then class = Normal;

Rule 6: if \(x_1\) is Middle-aged and \(x_2\) is Low then class = Odd;

Rule 7: if \(x_1\) is Old and \(x_2\) is High then class = Normal;

Rule 8: if \(x_1\) is Old and \(x_2\) is Middle then class = Odd;

Rule 9: if \(x_1\) is Old and \(x_2\) is Low then class = Odd;
According to Figs. 2 and 3, if \(x_1= 30\) and \(x_2= 60k\), then the following steps will be executed:

Rule 1: \(f_\text {Young}(30)= 0.67\), \(f_\text {High}(60k)=0.33\);

Rule 2: \(f_\text {Young}(30)= 0.67\), \(f_\text {Middle}(60k)=0.67\);

Rule 3: \(f_\text {Young}(30)= 0.67\), \(f_\text {Low}(60k)= 0\);

Rule 4: \(f_\text {Middle-aged}(30)= 0.33\), \(f_\text {High}(60k)=0.33\);

Rule 5: \(f_\text {Middle-aged}(30)= 0.33\), \(f_\text {Middle}(60k)=0.67\);

Rule 6: \(f_\text {Middle-aged}(30)= 0.33\), \(f_\text {Low}(60k)= 0\);

Rule 7: \(f_\text {Old}(30)= 0\), \(f_\text {High}(60k)=0.33\);

Rule 8: \(f_\text {Old}(30)= 0\), \(f_\text {Middle}(60k)=0.67\);

Rule 9: \(f_\text {Old}(30)= 0\), \(f_\text {Low}(60k)= 0\);
In the fuzzification step, the notation \(f_\text {High}(60k)=0.33\) represents that the membership degree of the numerical value ‘60k’ to the fuzzy set defined with the linguistic term ‘High’ is 0.33. The fuzzification step is aimed at mapping the value of a continuous attribute to a value of membership degree to a fuzzy set (i.e. mapping to the value of a linguistic term transformed from the continuous attribute).
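The fuzzification step can be sketched in Python as follows. Because Figs. 2 and 3 are not reproduced here, the breakpoints used below (20, 50 and 80 for age; 20k, 50k and 80k for income) are assumptions chosen only so that the computed degrees agree with the values listed above after rounding to two decimals; the function and variable names are likewise hypothetical.

```python
def falling(x, a, b):
    """Left-shoulder set: degree 1 up to a, falling linearly to 0 at b."""
    if x <= a:
        return 1.0
    if x >= b:
        return 0.0
    return (b - x) / (b - a)


def rising(x, a, b):
    """Right-shoulder set: degree 0 up to a, rising linearly to 1 at b."""
    return 1.0 - falling(x, a, b)


def triangle(x, a, b, c):
    """Triangular set: 0 at a, peak of 1 at b, back to 0 at c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Assumed membership functions for the two attributes (not the ones in Figs. 2 and 3).
AGE_SETS = {
    "Young": lambda x: falling(x, 20, 50),
    "Middle-aged": lambda x: triangle(x, 20, 50, 80),
    "Old": lambda x: rising(x, 50, 80),
}
INCOME_SETS = {
    "Low": lambda x: falling(x, 20_000, 50_000),
    "Middle": lambda x: triangle(x, 20_000, 50_000, 80_000),
    "High": lambda x: rising(x, 50_000, 80_000),
}


def fuzzify(value, fuzzy_sets):
    """Map a crisp value to its membership degree in every linguistic term."""
    return {term: round(f(value), 2) for term, f in fuzzy_sets.items()}


age_degrees = fuzzify(30, AGE_SETS)
income_degrees = fuzzify(60_000, INCOME_SETS)
```

With these assumed shapes, `age_degrees` and `income_degrees` contain the same degrees as the worked example for \(x_1 = 30\) and \(x_2 = 60k\).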

Rule 1: \(f_\text {Young}(30) \wedge f_\text {High}(60k)= Min(0.67, 0.33) = 0.33\);

Rule 2: \(f_\text {Young}(30) \wedge f_\text {Middle}(60k)= Min(0.67, 0.67) = 0.67\);

Rule 3: \(f_\text {Young}(30) \wedge f_\text {Low}(60k)= Min(0.67, 0) =0\);

Rule 4: \(f_\text {Middle-aged}(30) \wedge f_\text {High}(60k)= Min(0.33, 0.33) = 0.33\);

Rule 5: \(f_\text {Middle-aged}(30) \wedge f_\text {Middle}(60k)= Min(0.33, 0.67) = 0.33\);

Rule 6: \(f_\text {Middle-aged}(30) \wedge f_\text {Low}(60k)= Min(0.33, 0) = 0\);

Rule 7: \(f_\text {Old}(30) \wedge f_\text {High}(60k)= Min(0, 0.33) = 0\);

Rule 8: \(f_\text {Old}(30) \wedge f_\text {Middle}(60k)= Min(0, 0.67) = 0\);

Rule 9: \(f_\text {Old}(30) \wedge f_\text {Low}(60k)= Min(0, 0) = 0\);
In the application step, the conjunction of the two fuzzy membership degree values, respectively, for the two attributes ‘\(x_1\)’ and ‘\(x_2\)’ is aimed at deriving the firing strength of a fuzzy rule. For example, the antecedent of Rule 1 consists of \(x_1\) is Young and \(x_2\) is High, so the firing strength of Rule 1 is 0.33, while \(f_\text {Young}(30)= 0.67\) and \(f_\text {High}(60k)=0.33\).
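As a minimal sketch of the application step, the firing strengths of the nine rules can be computed by taking the minimum of the two membership degrees in each antecedent. The degrees are copied from the fuzzification step above; the variable names are illustrative only.

```python
# Membership degrees from the fuzzification step for x1 = 30 and x2 = 60k.
age = {"Young": 0.67, "Middle-aged": 0.33, "Old": 0.0}
income = {"High": 0.33, "Middle": 0.67, "Low": 0.0}

# Antecedents of Rules 1-9 in order: (term for x1, term for x2).
antecedents = [(a, i) for a in age for i in income]

# Firing strength: conjunction of the two degrees, with Min as the AND operator.
firing = [min(age[a], income[i]) for a, i in antecedents]

for number, ((a, i), strength) in enumerate(zip(antecedents, firing), start=1):
    print(f"Rule {number}: {a} AND {i} -> {strength}")
```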

Rule 1: \(f_{\text {Rule}1 \rightarrow \text {Impressive}}(30, 60k)= 0.33\);

Rule 2: \(f_{\text {Rule}2 \rightarrow \text {Impressive}}(30, 60k)= 0.67\);

Rule 3: \(f_{\text {Rule}3 \rightarrow \text {Normal}}(30, 60k)= 0\);

Rule 4: \(f_{\text {Rule}4 \rightarrow \text {Impressive}}(30, 60k)= 0.33\);

Rule 5: \(f_{\text {Rule}5 \rightarrow \text {Normal}}(30, 60k)= 0.33\);

Rule 6: \(f_{\text {Rule}6 \rightarrow \text {Odd}}(30, 60k)= 0\);

Rule 7: \(f_{\text {Rule}7 \rightarrow \text {Normal}}(30, 60k)= 0\);

Rule 8: \(f_{\text {Rule}8 \rightarrow \text {Odd}}(30, 60k)= 0\);

Rule 9: \(f_{\text {Rule}9 \rightarrow \text {Odd}}(30, 60k)= 0\);
In the implication step, the firing strength of a fuzzy rule derived in the application step can be used further to infer the value of membership degree of an input vector to one of the class labels ‘Impressive’, ‘Normal’ and ‘Odd’, depending on the actual consequent of the fuzzy rule. For example, \(f_{\text {Rule}1 \rightarrow \text {Impressive}}(30, 60k)= 0.33\) indicates that the consequent of Rule 1 is assigned the class label ‘Impressive’ and the input vector ‘(30, 60k)’ has the membership degree value of 0.33 to the class label ‘Impressive’. In other words, the input vector ‘(30, 60k)’ gains the membership degree value of 0.33 to the class label ‘Impressive’, through the inference using Rule 1.
In the aggregation step, the membership degree value of the input vector to the class label (‘Impressive’, ‘Normal’ or ‘Odd’), which is inferred through using a rule, is compared with the other membership degree values inferred through using the other rules, towards finding the maximum among all the membership degree values. For example, Rule 1, Rule 2 and Rule 4 are all assigned the class label ‘Impressive’ as their consequent and the input vector ‘(30, 60k)’ gains the membership degree values of 0.33, 0.67 and 0.33, respectively, to the class label ‘Impressive’, through the inference using the above three rules. As the maximum of the fuzzy membership degree values is 0.67, the input vector is judged finally to have the membership degree value of 0.67 to the class label ‘Impressive’.
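The implication and aggregation steps can be sketched together: each rule passes its firing strength to its consequent class, and the maximum per class is kept. The values are those of the worked example; the function name `aggregate` is an illustrative choice.

```python
# Consequent class and firing strength of Rules 1-9 for the input (30, 60k).
rules = [
    ("Impressive", 0.33), ("Impressive", 0.67), ("Normal", 0.0),
    ("Impressive", 0.33), ("Normal", 0.33), ("Odd", 0.0),
    ("Normal", 0.0), ("Odd", 0.0), ("Odd", 0.0),
]


def aggregate(rules):
    """Keep, for each class, the maximum degree inferred by any rule with that consequent."""
    degrees = {}
    for label, strength in rules:
        degrees[label] = max(degrees.get(label, 0.0), strength)
    return degrees


membership = aggregate(rules)
# membership == {"Impressive": 0.67, "Normal": 0.33, "Odd": 0.0}
```

In the generative multitask setting, this dictionary of per-class degrees is itself the output, since each class is assessed independently.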
In traditional machine learning, it is usually necessary to provide a crisp output as the classification outcome, so defuzzification is typically involved by choosing the class label with the maximum value of membership degree. When there is more than one class label with the maximum value of membership degree, defuzzification is achieved by randomly choosing one of these classes. For the above illustrative example, the final classification outcome is to assign the class label ‘Impressive’ to the unseen instance ‘(30, 60k, ?)’, since the value (0.67) of the membership degree to this class label is the maximum one. In contrast, generative multitask classification is aimed at measuring independently the membership degree value of an instance to each class, so it is not necessary to involve the defuzzification step.
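A minimal sketch of the defuzzification step, assuming the per-class membership degrees are held in a dictionary, is given below. The function name and the optional `rng` argument are hypothetical; only the rule itself (maximum degree, random choice among ties) comes from the description above.

```python
import random


def defuzzify(membership, rng=random):
    """Return the class with the maximum degree, breaking ties at random."""
    top = max(membership.values())
    candidates = [label for label, degree in membership.items() if degree == top]
    return rng.choice(candidates)


membership = {"Impressive": 0.67, "Normal": 0.33, "Odd": 0.0}
crisp = defuzzify(membership)  # a single maximum here, so no random choice is needed
```

Skipping this function and keeping `membership` as it is corresponds to the generative multitask setting, in which no crisp label is required.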
Besides, as mentioned above, the definition of membership functions can be based on expert knowledge or real data. In the context of data-driven definition of membership functions, it is generally not applicable to assign each fuzzy set a linguistic term. Instead, each fuzzy set is provided with an ID, e.g. ID ‘0’ represents the first fuzzy set. Each fuzzy set is represented by the actual parameters of the membership function defined for it, e.g. [20, 30, 55, 75] represents the four parameters (a, b, c, d) of a trapezoid membership function.
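For illustration, a trapezoid membership function with parameters (a, b, c, d) can be written as below, using the example parameters [20, 30, 55, 75] and the fuzzy set ID ‘0’ from the text. The sketch assumes a < b ≤ c < d and the usual reading of the parameters (degree 0 up to a, rising to 1 at b, staying at 1 until c, falling to 0 at d); the dictionary layout is a hypothetical choice.

```python
def trapezoid(x, a, b, c, d):
    """Trapezoid membership degree of x, assuming a < b <= c < d."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)   # rising edge between a and b
    if x <= c:
        return 1.0                 # plateau between b and c
    return (d - x) / (d - c)       # falling edge between c and d


# Fuzzy sets identified by ID rather than by a linguistic term.
fuzzy_sets = {0: (20, 30, 55, 75)}

degree = trapezoid(25, *fuzzy_sets[0])  # halfway up the rising edge
```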
3.2 Justification
In the context of recognition-intensive classification, the purpose is essentially to discover the presence of a target class of instances. From this point of view, fuzzy multitask classification is considered a very suitable approach. For example, in the context of human activity recognition, there are several activities that need to be identified, and each of the activities is viewed as a target class, such that a set of fuzzy rules is learned for each target class and is used to identify the degree to which the activity (corresponding to the target class) is present, in the setting of fuzzy multitask classification.
On the other hand, recognition-intensive classification can involve a large number of classes, e.g. human activity recognition can involve more than ten classes as indicated in Kalua et al. (2010). It is very likely that these classes are not mutually exclusive. For example, in activity recognition, the three classes ‘sitting’, ‘sitting down’ and ‘sitting on the ground’ are generally correlated to some extent. Also, the three classes ‘standing up from lying’, ‘standing up from sitting’ and ‘standing up from sitting on the ground’ would have some overlaps in terms of their features. From the above point of view, human activity recognition is not a black and white problem, and fuzzy approaches are capable of dealing with this kind of problem in a grey manner, i.e. by identifying independently the degree of presence of each activity.
Furthermore, as mentioned in Sect. 1, it is also possible in reality that an instance can fully belong to more than one class, since these classes are defined from different perspectives. For example, a student can be classified as an international student in terms of nationality, as a full-time student in terms of study mode, or as an undergraduate student in terms of degree level. In this context, a student can fully belong to all three classes above. Since fuzzy rule-based classification is generally done in a generative way, i.e. it treats each class equally and the membership degree of an instance to each class is measured independently, a fuzzy classifier is capable of capturing the case that an instance highly or even fully belongs to more than one class, i.e. an instance has a very high membership degree (close or even equal to 1) to more than one class.
We will show experimental results to support the above argumentation in the context of fuzzy multitask classification from granular computing perspectives.
4 Experiments, results and discussion
Table 1 Characteristics of data sets
Dataset  Attribute types  #Attributes  #Instances  #Classes  References 

Foresttype  Continuous  27  326  4  Johnson et al. (2012) 
Anurancalls1  Continuous  22  7195  4  Colonna et al. (2015) 
Anurancalls2  Continuous  22  7195  8  Colonna et al. (2015) 
Anurancalls3  Continuous  22  7195  10  Colonna et al. (2015) 
Cardiotocography1  Continuous  23  2126  3  de Campos et al. (2000) 
Cardiotocography2  Continuous  23  2126  10  de Campos et al. (2000) 
Glass  Continuous  10  214  6  Evett and Spiehler (1987) 
In terms of classification accuracy, we compare the fuzzy approach with four popular probabilistic ones for pattern recognition, namely DT (Quinlan 1993), NB (Rish 2001), KNN (Zhang 1992) and SVM with the polynomial kernel (Cristianini 2000). Also, we show the membership degree values of some representative instances (selected from the test sets) to all the given classes, to indicate that the fuzzy approach is capable of capturing more patterns than expected, i.e. an instance may also belong to other classes apart from the target class, or the set of given classes is not complete, so the instance cannot be classified and an extra class needs to be found.
Table 2 Classification accuracy
Dataset  DT (Quinlan 1993)  NB (Rish 2001)  KNN (Zhang 1992)  SVM (Cristianini 2000)  Fuzzy (Berthold 2003) 

Foresttype  0.806  0.772  0.84  0.828  0.853 
Anurancalls1  0.951  0.878  0.991  0.935  0.98 
Anurancalls2  0.937  0.854  0.988  0.933  0.97 
Anurancalls3  0.947  0.884  0.987  0.94  0.96 
Cardiotocography1  0.983  0.986  0.821  0.78  0.974 
Cardiotocography2  1  0.999  0.422  0.278  0.991 
Glass  0.663  0.485  0.655  0.553  0.686 
The results on classification accuracy are shown in Table 2 and indicate that the fuzzy approach outperforms all the probabilistic ones in two out of the seven cases (on the ‘Foresttype’ and ‘Glass’ data sets). In the other five cases, the fuzzy approach performs marginally worse than the best performing one: it ranks second on the three ‘Anurancalls’ data sets, and third on the two ‘Cardiotocography’ data sets, where it is within 0.012 of the best accuracy and still clearly outperforms KNN and SVM. The results shown in Table 2 indicate that the fuzzy approach can fairly be used as an alternative to these popular probabilistic approaches for recognition-intensive classification, without substantial loss of classification accuracy.
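The comparison can be checked directly against the figures in Table 2. The short sketch below ranks the fuzzy approach among the five approaches on each data set; the accuracies are copied from the table, while the helper name `fuzzy_rank` is illustrative.

```python
# Accuracies from Table 2, in the order DT, NB, KNN, SVM, Fuzzy.
accuracy = {
    "Foresttype": (0.806, 0.772, 0.84, 0.828, 0.853),
    "Anurancalls1": (0.951, 0.878, 0.991, 0.935, 0.98),
    "Anurancalls2": (0.937, 0.854, 0.988, 0.933, 0.97),
    "Anurancalls3": (0.947, 0.884, 0.987, 0.94, 0.96),
    "Cardiotocography1": (0.983, 0.986, 0.821, 0.78, 0.974),
    "Cardiotocography2": (1, 0.999, 0.422, 0.278, 0.991),
    "Glass": (0.663, 0.485, 0.655, 0.553, 0.686),
}


def fuzzy_rank(row):
    """Rank of the fuzzy approach (1 = best) among the five approaches."""
    *others, fuzzy = row
    return 1 + sum(score > fuzzy for score in others)


ranks = {name: fuzzy_rank(row) for name, row in accuracy.items()}
best_on = [name for name, rank in ranks.items() if rank == 1]
```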
However, the fuzzy approach is also capable of capturing more patterns that the probabilistic approaches cannot capture, as mentioned above. In particular, Tables 3, 4 and 5 are presented to show the membership degrees of each instance to different classes. For example, in Table 3, the first column represents the ID of an instance; the second column represents the class label that is assigned to each instance by experts, and the third to sixth columns represent the membership degrees of each instance to the corresponding classes (i.e. ‘d’, ‘h’, ‘o’ and ‘s’). In addition, the last column represents the prediction made by the fuzzy classifier for assigning a class to an instance in the setting of traditional machine learning, i.e. it is the output of the fuzzy classifier through defuzzification. However, as argued in Sect. 3, the defuzzification step is not needed in the setting of recognition-intensive classification from a granular computing perspective, and we include this column just to clarify what outputs would be provided by the fuzzy classifier if the defuzzification step were involved.
Table 3 Results sample on foresttype data
ID  Class  d  h  o  s  Prediction 

1  d  0.25  0  0  0  d 
2  s  0  0.63  0  1  s 
3  s  0  0  0  1  s 
4  d  1  0  0  0  d 
5  h  0  1  0  0  h 
6  o  0  0  1  0  o 
7  d  0.91  0  0  0.28  d 
8  d  0  0  0  0  ? 
9  s  0  0  0  0  ? 
10  s  0  1  0  1  s 
11  s  1  0  0  1  s 
12  s  0  0.77  0  1  s 
13  d  1  0  0.23  0  d 
14  d  1  0  0.44  0  d 
15  o  0.2  0  1  0  o 
Furthermore, the results show that an instance may not belong to any of the predefined classes, i.e. an instance has the membership degree value of 0 to all the classes. In this case, the instance is unclassified, so it is labelled with “?” as shown in Table 3, but this is very different from an unclassified instance in probabilistic classification. In probabilistic classification, an instance is left unclassified when the predicted class distribution is uniform (e.g. 50/50 for a two-class classification problem), i.e. maximum uncertainty is reached. In contrast, a membership degree of 0 to all the classes indicates that the fuzzy classifier is confident that the instance does not belong to any of the classes, i.e. no evidence is found to assign the instance a nonzero membership degree to any one of the classes.
From a mathematical perspective, the above phenomenon can be explained by the case of incomplete mapping. In particular, a classifier is essentially a function that provides a discrete output after an input is provided. A function f is defined as a mapping from set A to set B, where A is the domain of f and the range of f is a subset of B. In this context, if a classifier does not represent a complete mapping, then there would be some truly existing classes (available in set B) that are not in the range of this function f. In fact, real-life environments are generally imperfect, imprecise, incomplete and uncertain, so it is fairly possible that a set of predefined classes is not complete, and an extra class, which is not known yet, needs to be found to classify an instance.
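The two situations that a crisp label cannot express, namely no membership to any class and full membership to more than one class, can be picked out of the Table 3 sample with a few lines of Python. The rows are copied from Table 3; the variable names are illustrative.

```python
CLASSES = ("d", "h", "o", "s")

# (ID, expert label, membership degrees to d, h, o, s), copied from Table 3.
rows = [
    (1, "d", (0.25, 0, 0, 0)),
    (2, "s", (0, 0.63, 0, 1)),
    (3, "s", (0, 0, 0, 1)),
    (4, "d", (1, 0, 0, 0)),
    (5, "h", (0, 1, 0, 0)),
    (6, "o", (0, 0, 1, 0)),
    (7, "d", (0.91, 0, 0, 0.28)),
    (8, "d", (0, 0, 0, 0)),
    (9, "s", (0, 0, 0, 0)),
    (10, "s", (0, 1, 0, 1)),
    (11, "s", (1, 0, 0, 1)),
    (12, "s", (0, 0.77, 0, 1)),
    (13, "d", (1, 0, 0.23, 0)),
    (14, "d", (1, 0, 0.44, 0)),
    (15, "o", (0.2, 0, 1, 0)),
]

# Instances with degree 0 to every class: labelled "?" in Table 3.
unclassified = [i for i, _, degrees in rows if max(degrees) == 0]

# Instances that fully belong (degree 1) to more than one class.
multi_full = [i for i, _, degrees in rows if sum(m == 1 for m in degrees) > 1]

# Classes with a nonzero degree for each instance, e.g. instance 7 -> ["d", "s"].
present = {i: [c for c, m in zip(CLASSES, degrees) if m > 0] for i, _, degrees in rows}
```

Here `unclassified` contains instances 8 and 9, and `multi_full` contains instances 10 and 11, which a single crisp prediction would reduce to one label each.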
Table 4 Results sample on glass data
Class  1  2  3  4  5  6 

1  1  0  0.5  0  0  0 
2  0.70  0.79  0  0  0  0 
2  0  0.54  0  0  0  0 
2  0.75  1  0.29  0  0  0 
1  1  0  0  0  0  0 
2  0  0  0  0  0  0 
4  0  0  0  1  0  0 
2  0.07  1  0  0.77  0.53  0 
6  0  0.39  0  0.07  0  0.84 
5  0  0  0  0  0  0 
Table 5 Results sample on Anurancalls1 data
Class  B  D  H  L 

D  0  1  0  0 
L  0  0  0.63  1 
L  0  0  1  1 
H  0  0  1  1 
L  0  0  0.45  1 
D  0  0  0  0 
D  0  1  0  0.72 
L  0  0  0.20  1 
L  0  0  0  0 
L  0  0  0  0.73 
5 Conclusions
In this paper, we proposed the use of fuzzy rule-based systems for recognition-intensive classification in the setting of granular computing. In particular, we treated the recognition of each class of instances as an independent task of learning and classification, and the class is viewed as the target class. When there are several target classes of instances that need to be recognized, fuzzy multitask learning becomes very suitable to not only identify the presence of the patterns of each target class but also measure the degree to which the patterns of a target class are present. These features of fuzzy multitask learning are particularly valuable when a large number of classes is involved and the classification problem is not black and white.
The experimental results show that the classification performance of the fuzzy approach is fairly comparable to that of the probabilistic approaches (DT, NB, KNN and SVM), which indicates that the fuzzy approach can be used as an alternative to the probabilistic approaches. However, the probabilistic approaches would fall short in the aspects that are usually involved in recognition-intensive classification. In particular, the probabilistic approaches aim at learning classifiers that discriminate one class from other classes. As mentioned in Sect. 3.2, when the number of classes is very large or even massive, it would become very difficult to discriminate effectively between classes. Also, in the context of recognition-intensive classification, it is fairly possible that different classes are not mutually exclusive so there is no need to involve discrimination between classes.
In contrast, the fuzzy approach aims at training classifiers in the way of generative learning, i.e. each class is treated equally, and recognition of each class of instances is involved in an independent task of learning and classification, i.e. multitask learning. Therefore, the fuzzy approach is capable of dealing effectively with a massive number of classes and of discovering that an instance belongs not only to the target class but also to other classes. Furthermore, the fuzzy approach can also discover the case that an instance does not belong to any of the given classes and an extra class thus needs to be discovered. In fact, the above case is fairly likely to appear in real-life environments that are imperfect, imprecise, incomplete and uncertain.
In future work, we will investigate further the use of fuzzy rule-based systems for identifying the relationships between classes in the setting of granular computing, i.e. identifying the relationships between information granules where each class is viewed as a granule. In particular, following the completion of fuzzy multitask classification, all instances are assigned values of membership degree to the given classes. In this context, a secondary learning task for association (correlation) analysis can be undertaken, where each class is treated as an attribute (feature) and the membership degree value of each instance to this class is treated as a value of this feature. We will also look into ensemble classification and data stream mining problems using fuzzy rule-based systems, where the data instances are challenging, unpredictable and diverse, with newly arriving classes. In addition, it is worth investigating the use of optimization techniques (Chen and Chien 2011; Chen and Kao 2013; Tsai et al. 2008, 2012; Chen and Chang 2011; Chen et al. 2013; Chen and Chung 2006; Chen and Huang 2003) for tuning the shapes of membership functions towards obtaining better prediction performance.
Notes
Acknowledgements
The authors acknowledge support from the Social Data Science Lab at Cardiff University and the Affective and Smart Computing Research Group at Northumbria University.
References
 Ali N, Xavier L (2014) Person identification and gender classification using Gabor filters and fuzzy logic. Int J Electr Electron Data Commun 2(4):20–23
 Altrabsheh N, Cocea M, Fallahkhair S (2015) Predicting students’ emotions using machine learning techniques. Springer, Cham, pp 537–540
 Antonelli M, Ducange P, Lazzerini B, Marcelloni F (2016) Multi-objective evolutionary design of granular rule-based classifiers. Granul Comput 1(1):37–58
 Bergadano F, Cutello V (1993) Learning membership functions. In: European conference on symbolic and quantitative approaches to reasoning and uncertainty, Granada, Spain, pp 25–32
 Berthold MR (2003) Mixed fuzzy rule formation. Int J Approx Reason 32:67–84
 Berthold MR, Wiswedel B, Gabriel TR (2013) Fuzzy logic in knime: modules for approximate reasoning. Int J Comput Intell Syst 6(1):34–45
 Chatterjee K, Kar S (2017) Unified granular-number-based ahp-vikor multi-criteria decision framework. Granul Comput 2(3):199–221
 Chen SM (1996) A fuzzy reasoning approach for rule-based systems based on fuzzy logics. IEEE Trans Syst Man Cybern Part B Cybern 26(5):769–778
 Chen SM, Chang TH (2001) Finding multiple possible critical paths using fuzzy pert. IEEE Trans Syst Man Cybern Part B Cybern 31(6):930–937
 Chen SM, Chang YC (2011) Weighted fuzzy rule interpolation based on ga-based weight-learning techniques. IEEE Trans Fuzzy Syst 19(4):729–744
 Chen SM, Chen CD (2011) Handling forecasting problems based on high-order fuzzy logical relationships. Expert Syst Appl 38(4):3857–3864
 Chen SM, Chien CY (2011) Parallelized genetic ant colony systems for solving the traveling salesman problem. Expert Syst Appl 38(4):3873–3883
 Chen SM, Chung NY (2006) Forecasting enrollments of students by using fuzzy time series and genetic algorithms. Int J Inform Manag Sci 17(3):1–17
 Chen SM, Huang CM (2003) Generating weighted fuzzy rules from relational database systems for estimating null values using genetic algorithms. IEEE Trans Fuzzy Syst 11(4):495–506
 Chen SM, Kao PY (2013) Taiex forecasting based on fuzzy time series, particle swarm optimization techniques and support vector machines. Inform Sci 247:62–71
 Chen SM, Lee SH, Lee CH (2001) A new method for generating fuzzy rules from numerical data for handling classification problems. Appl Artif Intell 15(7):645–664
 Chen SM, Wang NY, Pan JS (2009) Forecasting enrollments using automatic clustering techniques and fuzzy logical relationships. Expert Syst Appl 36(8):11070–11076
 Chen SM, Lee SH, Lee CH (2011) Fuzzy forecasting based on high-order fuzzy logical relationships and automatic clustering techniques. Expert Syst Appl 38(12):15425–15437
 Chen SM, Chang YC, Pan JS (2013) Fuzzy rules interpolation for sparse fuzzy rule-based systems based on interval type-2 gaussian fuzzy sets and genetic algorithms. IEEE Trans Fuzzy Syst 21(3):412–425
 Colonna JG, Cristo M, Júnior MS, Nakamura EF (2015) An incremental technique for real-time bioacoustic signal segmentation. Expert Syst Appl 42(21):7367–7374
 Cristianini N (2000) An introduction to support vector machines and other Kernel-based learning methods. Cambridge University Press, Cambridge
 de Campos DA, Bernardes J, Garrido A, de Sá JM, Pereira-Leite L (2000) Sisporto 2.0: a program for automated analysis of cardiotocograms. J Matern Fetal Med 9(5):311–318
 Dubois D, Prade H (2016) Bridging gaps between several forms of granular computing. Granul Comput 1(2):115–126
 Evett IW, Spiehler EJ (1987) Rule induction in forensic science. Technical report, central research establishment, home office forensic science service
 Guo G (2014) Gender classification. In: Encyclopedia of biometrics. Springer, New York, pp 1–6
 Horng YJ, Chen SM, Chang YC, Lee CH (2005) A new method for fuzzy information retrieval based on fuzzy hierarchical clustering and fuzzy inference techniques. IEEE Trans Fuzzy Syst 13(2):216–228
 Hu H, Shi Z (2009) Machine learning as granular computing. In: IEEE International conference on granular computing, Nanchang, Beijing, pp 229–234
 Jefferson C, Liu H, Cocea M (2017) Fuzzy approach for sentiment analysis. In: IEEE International conference on fuzzy systems, Naples, Italy
 Johnson B, Tateishi R, Xie Z (2012) Using geographically-weighted variables for image classification. Remote Sens Lett 3(6):491–499
 Kaluža B, Mirchevska V, Dovgan E, Luštrek M, Gams M (2010) An agent-based approach to care in independent living. In: International joint conference on ambient intelligence, pp 177–186
 Kreinovich V (2016) Solving equations (and systems of equations) under uncertainty: how different practical problems lead to different mathematical and computational formulations. Granul Comput 1(3):171–179
 Lichman M (2013) UCI machine learning repository. http://archive.ics.uci.edu/ml. Accessed 22 Oct 2017
 Lin F, Wu Y, Zhuang Y, Long X, Xu W (2016) Human gender classification: a review. Int J Biom 8(3–4). https://doi.org/10.1504/IJBM.2016.082604
 Liu H, Cocea M (2017a) Fuzzy information granulation towards interpretable sentiment analysis. Granul Comput 2(4):289–302
 Liu H, Cocea M (2017b) Fuzzy rule based systems for interpretable sentiment analysis. In: International conference on advanced computational intelligence, Doha, Qatar, pp 129–136
 Liu H, Cocea M (2017c) Granular computing based approach for classification towards reduction of bias in ensemble learning. Granul Comput 2(3):131–139
 Liu H, Cocea M (2018) Granular computing based machine learning: a big data processing approach. Springer, Berlin
 Liu H, Gegov A (2015) Collaborative decision making by ensemble rule based classification systems, Springer, Switzerland, pp 245–264
 Liu H, Cocea M, Ding W (2017a) Decision tree learning based feature evaluation and selection for image classification. In: International conference on machine learning and cybernetics, Ningbo, China
 Liu H, Cocea M, Mohasseb A, Bader M (2017b) Transformation of discriminative single-task classification into generative multi-task classification in machine learning context. In: International conference on advanced computational intelligence. Doha, Qatar, pp 66–73
 Liu H, Cocea M, Ding W (2017) Multi-task learning for intelligent data processing in granular computing context. Granul Comput (In press)
 Liu P, You X (2017) Probabilistic linguistic todim approach for multiple attribute decision-making. Granul Comput 2(4):332–342
 Livi L, Sadeghian A (2016) Granular computing, computational intelligence, and the analysis of non-geometric input spaces. Granul Comput 1(1):13–20
 Mamdani E, Assilian S (1999) An experiment in linguistic synthesis with a fuzzy logic controller. Int J Hum Comput Stud 51(2):135–147
 Min F, Xu J (2016) Semi-greedy heuristics for feature selection with test cost constraints. Granul Comput 1(3):199–211
 Pedrycz W, Chen SM (2011) Granular computing and intelligent systems: design with information granules of higher order and higher type. Springer, Heidelberg
 Pedrycz W, Chen SM (2015) Information granularity, big data, and computational intelligence. Springer, Heidelberg
 Pedrycz W, Chen SM (2016) Sentiment analysis and ontology engineering: an environment of computational intelligence. Springer, Heidelberg
 Peters G, Weber R (2016) Dcc: a framework for dynamic granular clustering. Granul Comput 1(1):1–11
 Quinlan RJ (1993) C4.5: programs for machine learning. Morgan Kaufmann Publishers, San Francisco
 Reynolds K, Kontostathis A, Edwards L (2011) Using machine learning to detect cyberbullying. In: Proceedings of the 10th international conference on machine learning and applications, pp 241–244
 Rish I (2001) An empirical study of the naive bayes classifier. In: IJCAI 2001 workshop on empirical methods in artificial intelligence vol 3(22), pp 41–46
 Skowron A, Jankowski A, Dutta S (2016) Interactive granular computing. Granul Comput 1(2):95–113
 Suykens JA, Vandewalle J (1999) Least squares support vector machine classifiers. Neural Process Lett 9(3):293–300
 Tan PN, Steinbach M, Kumar V (2005) Introduction to data mining. Addison-Wesley Longman Publishing Co., Inc, Boston
 Teng Z, Ren F, Kuroiwa S (2007) Emotion recognition from text based on the rough set theory and the support vector machines. In: International conference on natural language processing and knowledge engineering, Beijing, China, pp 36–41
 Tsai PW, Pan JS, Chen SM, Liao BY, Hao SP (2008) Parallel cat swarm optimization. In: Proceedings of the 2008 international conference on machine learning and cybernetics, Kunming, China, vol 6, pp 3328–3333
 Tsai PW, Pan JS, Chen SM, Liao BY (2012) Enhanced parallel cat swarm optimization based on the taguchi method. Expert Syst Appl 39(7):6309–6319
 Wang Y, Yu H, Dong J, Stevens B, Liu H (2016) Facial expression-aware face frontalization. In: LNCS Proceedings of Asian conference on computer vision, Taibei, Taiwan, pp 375–388
 Wilke G, Portmann E (2016) Granular computing as a basis of human data interaction: a cognitive cities use case. Granul Comput 1(3):181–197
 Wu J, Smith WA, Hancock ER (2011) Gender discriminating models from facial surface normals. Pattern Recognit 44(12):2871–2886
 Xu Z, Wang H (2016) Managing multi-granularity linguistic information in qualitative group decision making: an overview. Granul Comput 1(1):21–35
 Yao J (2005a) Information granulation and granular relationships. In: IEEE international conference on granular computing. Beijing, China, pp 326–329
 Yao Y (2005b) Perspectives of granular computing. In: Proceedings of 2005 IEEE international conference on granular computing, Beijing, China, pp 85–90
 Zadeh L (1965) Fuzzy sets. Inform Control 8(3):338–353
 Zhang J (1992) Selecting typical instances in instance-based learning. In: Proceedings of the ninth international workshop on machine learning, Aberdeen, United Kingdom, pp 470–479
 Zhang X, Yin Y, Yu H (2007) An application on text classification based on granular computing. Commun IIMA 7(2):1–8
 Zhao R, Zhou A, Mao K (2016) Automatic detection of cyberbullying on social networks based on bullying features. In: Proceedings of the 17th international conference on distributed computing and networking
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.