1 Introduction

Autistic Spectrum Disorder (ASD) is characterized by deficits in social communication and by restricted, repetitive and stereotyped patterns of behavior, interests or activities. Although it is present throughout the community, the disorder does not prevent a person from leading a life comparable to that of other people. Signs of ASD can be noted in children whose speech is delayed or limited to isolated words or simple phrases, and 25% of children lose previously acquired language skills (regression) [15]. Even with advances in diagnosing very young children, the disorder is sometimes recognized only when children start school, or later, during adolescence or adulthood [23]. Because so little information is available, families fear that growing up will bring new problems and may feel high anxiety about how the child will cope when they can no longer care for them [13]. Diagnosis can therefore occur in early childhood, adolescence or adulthood, and in several cases adults live with ASD for years without specialized monitoring.

To identify and develop techniques for diagnosing autism in children, adolescents and adults, Thabtah presents a review of techniques that facilitate the collection of information and, as a consequence, a mobile application [36] was developed to facilitate and disseminate diagnostic techniques for autism. The application, which can be downloaded to mobile devices, contains questions that help identify the characteristics of autism. It has become a popular means of collecting data to build training sets for algorithms that detect this condition. The application allows parents to check whether their children show traits of the disorder. Techniques for diagnosing autism in adults have been applied with a decision tree, reaching between 95.73% and 95.85% accuracy in classifying a patient as with or without autism [6].

Many studies have sought to identify traits of autism in people through mobile devices [1, 4, 6, 12, 29, 32, 37, 39, 40]. These approaches allow intelligent systems to be created to aid in the diagnosis of people with such conditions. Such systems support specialists in reaching a correct diagnosis, reducing the risk of error and taking into account the time available to the patient in future medical follow-up. A drawback of the studies that use decision trees (knowledge bases) is that the tree must be remodeled each time a new factor is added to the evaluation of the problem. Other factors that undermine the stability of decision trees are their decision boundaries, which are linear and perpendicular to the axes, and their high sensitivity to small perturbations in the training set, which can generate very different trees.

To overcome these problems, specifically for the techniques used in diagnosing autism in adults, this work proposes the use of fuzzy neural networks (FNN) to extract fuzzy rules while maintaining a high level of accuracy, so that interpretable knowledge can be extracted from the database. The results are compared with machine learning algorithms provided by WEKA [19], a tool developed in Java. Fuzzy neural networks can be trained efficiently, extract information from the problem data and maintain highly accurate results. They have been applied effectively to problems of various kinds, such as pattern classification [8, 9, 11, 33, 35], linear regression [7], time series forecasting [10], industry issues [25], healthcare [17, 18], software effort estimation [34] and SQL injection attacks [2, 3]. A fuzzy neural network model has also been applied to the diagnosis of children with ASD [6]. The paper is organized as follows: Sect. 2 presents the concepts from the literature that support this study, highlighting aspects of autism and reviewing intelligent hybrid systems. Sect. 3 presents the main concepts of the model used in the tests. Sect. 4 presents the tests and their configurations. Finally, Sect. 5 presents the conclusions drawn from the experiments and directions for future work.

2 Literature Review

2.1 Autistic Spectrum Disorder (ASD)

Autistic Spectrum Disorder (ASD) encompasses different syndromes marked by neurodevelopmental disorders with fundamental characteristics that can manifest together or in isolation: difficulty in communication due to a lack of mastery of speech, difficulty in social interaction, and a pattern of restrictive and repetitive behavior [15]. A large proportion of individuals of all ages still require ASD treatment. With an early diagnosis, the impact on the young person's life can be managed through professional monitoring and treatment. However, it has always been difficult to obtain accurate data that would let health professionals diagnose patients efficiently [38, 41]. In adulthood, diagnosis can be considered a rare event [24]. It is nevertheless a concern, because it can be associated with other problems, such as the search for psychiatric consultations, which shows a lifetime consultation rate of 78%, most often for depression.

2.2 Fuzzy Neural Networks: Principal Concepts

Fuzzy neural networks are a group of hybrid models that combine, in a single model, the interpretability of fuzzy systems with the training and generalization capacity of artificial neural networks. These models can solve problems of linear regression, pattern classification, time series forecasting and other problems of a specific nature, while offering a strong ability to interpret the database of the problem. They identify existing patterns that make up a set of fuzzy rules capable of transforming numerical information into linguistic terms, which can be interpreted by people who do not necessarily know the main techniques of artificial intelligence. This hybrid technique provides a solid knowledge base about the evaluated database, allowing fuzzy rules of the TSK type [44] to be obtained from the model. These models generally have an input layer, also called the fuzzification layer, at least one hidden layer, and an output layer, also known as the defuzzification layer. In the fuzzification process, the data submitted to the model are sampled in the feature space and, through fuzzification techniques, transformed into fuzzy representations of the problem. These partitions are usually produced by clustering or grid algorithms [28].

3 Pruning Fuzzy Neural Network for Detection of Autistic Persons

3.1 Network Architecture

The architecture of the model follows the definitions proposed in [5]. It has three layers. The first layer is responsible for the fuzzification process, using the grid technique with equally spaced membership functions (which may be Gaussian or triangular) [22]. In the second layer, type III logical neurons aggregate the fuzzy neurons of the first layer. These logical neurons can extract fuzzy rules from the database, and they also facilitate the aggregation and identification of the most relevant rules. To filter out the least useful rules, a pruning approach is applied in the second layer. The F-score concept applied in [14] selects the most important rules, making the model simpler and its architecture more cohesive. In this model, the first two layers form a fuzzy inference system, and the third layer is a single artificial neuron capable of binary pattern classification, which is the task at hand: classifying adults with characteristics of autism. This single neuron can be seen as a singleton, and its weights are obtained through least squares estimation. The Extreme Learning Machine [21] is used in the model only to define the synaptic weights of the aggregation neural network in the third layer.

The first layer is composed of neurons whose activation functions are membership functions of fuzzy sets defined for the input variables. The number of first-layer neurons therefore grows exponentially, at high computational cost, with the number of membership functions chosen and the number of features in the database. Because the database of autistic adults has a high number of dimensions, a technique was applied that limits the first layer to 500 neurons. The number of membership functions per dimension is assigned probabilistically: a dimension does not always receive exactly the chosen number of membership functions, and M is instead a maximum (for example, if M = 3 is chosen, the first dimension may receive only two membership functions, and so on up to the limit of 500). For each input variable \(x_{ij}\), L fuzzy neurons (or membership functions) \(A_{lj}\), l = 1, ..., L, are defined, whose membership functions are the activation functions of the corresponding neurons. Thus, the outputs of the first layer are the membership degrees associated with the input values, i.e., \(a_{jl}\) = \(\mu ^A_l\) for j = 1, ..., N and l = 1, ..., L, where N is the number of inputs and L is the number of fuzzy sets for each input. These fuzzy sets are produced by ANFIS [5], which uses IF-THEN fuzzy rules to incorporate expert knowledge into a computational system from the database used in the research. It is implemented through a distributed parallel architecture so that learning paradigms can be harnessed, integrating implicit knowledge (the dataset) and explicit knowledge (common sense, preliminary knowledge or a specialist) to solve and adequately explain the problem [22].
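The grid fuzzification of a single input can be illustrated with a minimal Python sketch. It assumes Gaussian membership functions with equally spaced centers and a shared width `sigma`; the function names, the feature range and the value of `sigma` are illustrative assumptions, and the probabilistic cap of 500 first-layer neurons is not modeled.

```python
import math


def gaussian_centers(lo, hi, m):
    """Return m equally spaced centers on [lo, hi] (grid partition)."""
    if m == 1:
        return [(lo + hi) / 2.0]
    step = (hi - lo) / (m - 1)
    return [lo + k * step for k in range(m)]


def fuzzify(x, centers, sigma):
    """Membership degree of x in each Gaussian fuzzy set (one per center)."""
    return [math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2)) for c in centers]


# Hypothetical feature ranging from 0 to 10, partitioned with M = 3 fuzzy sets.
centers = gaussian_centers(0.0, 10.0, 3)
degrees = fuzzify(5.0, centers, sigma=2.5)  # assumed width, not from the paper
```

Each value in `degrees` plays the role of one first-layer output \(a_{jl}\) for the given input.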

The second layer is composed of L fuzzy orneurons. An orneuron is a fuzzy neuron that takes fuzzy inputs (here, the outputs of the first-layer neurons) together with numerical weights and a bias, and aggregates the weighted inputs using s-norm and t-norm fuzzy operators. Each neuron performs a weighted aggregation of some of the first-layer outputs. This aggregation is performed using the weights \(w_{il}\) (for i = 1, ..., N and l = 1, ..., L). For each input variable j, only one first-layer output \(a_{jl}\) is defined as an input of the l-th neuron. So that w is sparse (that is, with many weights equal to zero), each neuron of the second layer is associated with an input variable. Finally, the output layer is composed of one neuron whose activation function is ReLU [26]. The output of the model is:

$$\begin{aligned} y= \sum _{j=0}^{l} f_{ReLU}(z_jv_j) \end{aligned}$$
(1)

where \(z_0\) = 1, \(v_0\) is the bias, and \(z_j\) and \(v_j\), j = 1, ..., l, are the output of each fuzzy neuron of the second layer and its corresponding weight, respectively. Figure 1 presents an example of the FNN architecture proposed in this paper. The first layer has fuzzy neurons produced from the relation between the dimensions and the number of membership functions chosen. The second layer contains the orneurons, and the third layer contains a single ReLU neuron. The architecture is very similar to the one presented in [5]; the changes are in the second and third layers.

Fig. 1. Fuzzy neural network architecture

ReLU is one of the most widely used activation functions for introducing nonlinearity in artificial neural networks today. The ReLU function is non-linear, which means that errors can easily be backpropagated and that multiple layers of neurons can be activated by it. In this model, only the third layer of the fuzzy neural network uses a neuron with this activation function [26]. The main advantage of ReLU over other activation functions is that it does not activate all neurons at the same time. This favors the use of the second-layer neurons, because neurons that are less significant for the model's responses are not activated. The function thus carries out the defuzzification of the model coherently and precisely [26]. ReLU can be represented by:

$$\begin{aligned} f_{ReLU}(z)=max(0,z) \end{aligned}$$
(2)

The logical neurons used in the second layer of the model are orneurons. These are functional units able to perform multivariate nonlinear operations on unit hypercubes, in which the input signals are first combined individually with the weights and the results are then combined globally. The orneuron used in this paper can be expressed as [27]:

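$$\begin{aligned} z=OR (w;a)=S^n_{i=1} (w_i \ t \ a_i) \end{aligned}$$
(3)

where \(a_i\) are the first-layer outputs received by the neuron, \(w_i\) are their weights, t is a t-norm (product) and S is an s-norm (probabilistic sum). Fuzzy rules such as those in Eq. (4) can be extracted from orneurons, as in the following example: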

$$\begin{aligned} \begin{aligned} Rule_1: If \ x_{i1}\ is\ A_1^1 \ with \ certainty \ w_{11}...\\ or \ x_{i2} \ is \ A_1^2 \ with \ certainty \ w_{21} ...\\ Then\ y_1 \ is \ v_1\\ Rule_2: If\ x_{i1} \ is \ A_2^1 \ with \ certainty \ w_{12} ...\\ or \ x_{i2} \ is \ A_2^2 \ with \ certainty \ w_{22} ...\\ Then \ y_2 \ is \ v_2\\ \end{aligned} \end{aligned}$$
(4)
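The orneuron aggregation described above can be sketched in a few lines of Python. The sketch uses the product as t-norm and the probabilistic sum as s-norm, as stated in the text; the function names and the example weights and membership degrees are illustrative assumptions.

```python
def t_norm(a, b):
    """Product t-norm."""
    return a * b


def s_norm(a, b):
    """Probabilistic-sum s-norm."""
    return a + b - a * b


def or_neuron(weights, inputs):
    """Combine each input with its weight (t-norm), then aggregate (s-norm)."""
    z = 0.0  # 0 is the neutral element of the probabilistic sum
    for w, a in zip(weights, inputs):
        z = s_norm(z, t_norm(w, a))
    return z


# Two membership degrees (0.8 and 0.5) with hypothetical weights 0.5 and 1.0.
z = or_neuron([0.5, 1.0], [0.8, 0.5])
```

A weight of zero removes an input from the aggregation, which is why a sparse weight vector yields short, readable rules.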

3.2 Pruning Method - F-Score

Given the i-th feature vector (in the case of the FNN, the z-vector representing the orneurons) with \(n_{+}\) positive instances, \(n_{-}\) negative instances and N instances in total, the F-score value of the i-th feature is defined by [14]:

$$\begin{aligned} F(i) = \frac{{{{(\overline{x} _i^{(+)} - {{\overline{x}}_i})}^2} + {{(\overline{x} _i^{(-)} - {{\overline{x}}_i})}^2}}}{{\frac{1}{{{n_ +} - 1}}\sum \limits _{k = 1}^{{n_ +}} {{{(x_{k,i}^{(+)} - \overline{x} _i^{(+)})}^2} + \frac{1}{{{n_ -} - 1}}\sum \limits _{k = 1}^{{n_ -}} {{{(x_{k,i}^{(-)} - \overline{x} _i^{(-)})}^2}}}}}, \end{aligned}$$
(5)

where \(\overline{x} _i^{(+)}\), \(\overline{x} _i^{(-)}\) and \({\overline{x}}_i\) are the means of the positive, negative and whole samples, respectively, and \(x_{k,i}^{(+)}\) and \(x_{k,i}^{(-)}\) are the k-th positive and k-th negative values of the i-th feature vector. The numerator measures the separation between the positive and negative sets, and the denominator is the sum of the variances within each of the two sets. The higher the F-score, the more discriminative power the feature has. We adopted the F-score method in this study because it is simple to use and because it directly determines the relevance of the fuzzy rules present in the second layer of the model. The method is non-parametric, which makes it fast and uncomplicated to use on complex databases [14].
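The F-score of Eq. (5) can be computed for one feature (one orneuron output) as in the Python sketch below. The function names and sample values are illustrative. Keeping the `k` highest-scoring neurons is an assumed selection rule, since the text does not state how the cut-off is chosen.

```python
def f_score(pos, neg):
    """F-score of one feature given its values on positive and negative samples."""
    n_pos, n_neg = len(pos), len(neg)
    mean_pos = sum(pos) / n_pos
    mean_neg = sum(neg) / n_neg
    mean_all = (sum(pos) + sum(neg)) / (n_pos + n_neg)
    # Numerator: separation of each class mean from the overall mean.
    numerator = (mean_pos - mean_all) ** 2 + (mean_neg - mean_all) ** 2
    # Denominator: sample variance within each class.
    var_pos = sum((x - mean_pos) ** 2 for x in pos) / (n_pos - 1)
    var_neg = sum((x - mean_neg) ** 2 for x in neg) / (n_neg - 1)
    return numerator / (var_pos + var_neg)


def select_top(scores, k):
    """Indices of the k highest scores (assumed pruning rule)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


# Hypothetical outputs of one orneuron on three positive and three negative samples.
score = f_score([0.9, 0.8, 0.7], [0.2, 0.1, 0.3])
kept = select_top([0.1, score, 2.0], k=2)
```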

3.3 Training of the Model Based on Extreme Learning Machine

Once the network topology has been determined, the weight vector of the output layer is estimated. Estimating this weight vector with the Moore-Penrose pseudoinverse [16] was proposed in [21] to define the parameters of intelligent models in a single step. Extreme Learning Machine (ELM) concepts can be adapted to this FNN model as:

$$\begin{aligned} \mathbf v = \mathbf Z ^{+}{} \mathbf y \end{aligned}$$
(6)

where \({Z^{+}}\) is the Moore-Penrose pseudoinverse of Z, the matrix that holds the outputs of the second-layer orneurons. This solution is the minimum-norm least squares solution for the output weights.

With this solution, the weights of the third-layer aggregation network support the defuzzification process: the fuzzy representation is transformed into numerical outputs, which the supervised training process compares with the expected labels of the evaluated samples.
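The weight estimation in Eq. (6) can be sketched in pure Python as follows. The sketch assumes that Z has full column rank, in which case the pseudoinverse reduces to \((Z^{T}Z)^{-1}Z^{T}\) and the weights solve the normal equations. All function names and the toy data are illustrative assumptions; a numerical library with a dedicated pseudoinverse routine would be preferable in practice.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular; Z lacks full column rank")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def elm_output_weights(Z, y):
    """Least squares output weights v = (Z^T Z)^{-1} Z^T y (full-rank case)."""
    Zt = transpose(Z)
    ZtZ = matmul(Zt, Z)
    Zty = [sum(z * t for z, t in zip(col, y)) for col in Zt]
    return solve(ZtZ, Zty)


# Toy data: first column is the bias input z_0 = 1, second is one orneuron output.
Z = [[1.0, 0.0], [1.0, 0.0], [1.0, 1.0], [1.0, 1.0]]
y = [-1.0, -1.0, 1.0, 1.0]  # labels normalized to -1 and 1
v = elm_output_weights(Z, y)
```

Because the weights come from a single linear solve, no iterative tuning of the output layer is needed.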

4 Tests

This section presents the settings for the identification test for adults with autism. Aspects related to the database, configurations, evaluation metrics and tools used in the tests will be presented below.

4.1 Assumptions and Initial Test Configurations

To evaluate the model presented in this paper, 70% of the dataset was used for training and 30% for validation. Every value of M in [3, 4, 5, 6, 7] was tested in a cross-validation process, and the value that maximizes training accuracy was kept. The best value of M found this way is then used in the comparative tests with other intelligent models. The outputs of the database were normalized to −1 and 1 to support the calculations. The metrics evaluated in this paper are as follows:

$$\begin{aligned} accuracy=\frac{TP+TN}{TP+FN+TN+FP} \end{aligned}$$
(7)
$$\begin{aligned} sensitivity=\frac{TP}{TP+FN} \end{aligned}$$
(8)
$$\begin{aligned} specificity=\frac{TN}{TN+FP} \end{aligned}$$
(9)
$$\begin{aligned} AUC=\frac{1}{2}(sensitivity+specificity) \end{aligned}$$
(10)

where, \(TP =\) true positive, \(TN=\) true negative, \(FN =\) false negative and \(FP=\) false positive.
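The four metrics can be computed from the confusion-matrix counts as in the short Python sketch below. Specificity uses the standard definition TN/(TN+FP). The function name and the counts in the example are hypothetical and are not results from this paper.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, specificity and the AUC approximation of Eq. (10)."""
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    auc = 0.5 * (sensitivity + specificity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "auc": auc,
    }


# Hypothetical confusion-matrix counts, for illustration only.
metrics = classification_metrics(tp=45, tn=90, fp=10, fn=5)
```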

To evaluate the effectiveness of the FNN model, we ran tests in the WEKA [19] software with its main classification algorithms: Multilayer Perceptron (MLP) [31], J48 [42], Naive Bayes (NB) [30], Zero Rule (ZR) [43] and Random Tree (RT) [20]. These models were applied to the adult autism classification tests with the default WEKA settings, also using 70% of the samples for training and 30% for validation. All samples were resampled, and 30 replicates were run for each model.
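The repeated 70/30 evaluation can be sketched in Python as follows. This is an illustrative reconstruction, not the WEKA procedure itself: the classifier is a simple majority-class predictor in the spirit of Zero Rule, and the function names, the seed and the toy data are assumptions.

```python
import random


def split_70_30(samples, rng):
    """Shuffle a copy of the samples and split it 70% / 30%."""
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def zero_rule_fit(train):
    """Return the most frequent label in the training set."""
    labels = [label for _, label in train]
    return max(set(labels), key=labels.count)


def repeated_holdout(samples, repeats=30, seed=0):
    """Validation accuracy of the majority-class predictor over repeated splits."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(repeats):
        train, test = split_70_30(samples, rng)
        majority = zero_rule_fit(train)
        hits = sum(1 for _, label in test if label == majority)
        accuracies.append(hits / len(test))
    return accuracies


# Toy data: (feature, label) pairs with labels in {-1, 1}.
data = [(i, 1 if i < 3 else -1) for i in range(10)]
accuracies = repeated_holdout(data)
mean_accuracy = sum(accuracies) / len(accuracies)
```

Reporting the mean and standard deviation over the replicates is what allows models to be compared for statistical equivalence.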

4.2 Dataset Used in the Tests

To obtain the classifications from the models under test, we used the database collected and made available by [36]. It consists of tests taken by users aged 17 and over (adults) and is formed by personal information and the results of 10 multiple-choice questions. Because the data come from the mobile application, they include people from various countries. The data set gathers information from 701 people. Because the FNN model accepts only numerical attributes, categorical attributes were converted to numeric values when necessary. The database uses the following dimensions:

  • Result Test = minimum of 0 and maximum of 10, with a mean of 4.89 and a standard deviation of 2.49;

  • Age = minimum of 17 years and maximum of 64 years, with a mean of 29.19 and a standard deviation of 9.71 years;

  • Sex = 366 male and 49 female;

  • Ethnicity = 232 White-European, 24 Latino, 43 Black, 123 Asian, 92 Middle-Eastern, 11 Pasifika, 36 South Asian, 13 Hispanic, 6 Turkish and 121 Other;

  • Jaundice = 632 without jaundice and 69 with symptoms of the condition;

  • Autism in Family = 610 no and 91 yes;

  • Country = 67 countries, with a predominance of people in United States, New Zealand, United Arab Emirates, United Kingdom and India;

  • Used App Before = 689 no and 12 yes;

  • Relationship With The Patient = 521 Self, 50 Parent, 4 Health professional, 28 Relative, 98 Others;

  • Outcome = 512 people without ASD and 189 people who present the characteristics of the disorder.

4.3 Binary Pattern Classification Tests

Table 1 presents the pattern classification results for the adult autism dataset. It also reports the accuracy, the number of neurons remaining after pruning, and the execution time in seconds. The results highlighted in the table are the best results obtained in the test, or results that are statistically equivalent to them within the range of the standard deviation.

Table 1. Accuracies of the model in the tests performed.

In the tests, the MLP neural network and the J48 decision tree present excellent results in identifying patients with autism. The model proposed in this paper follows closely, with results statistically equivalent to those of the best models. Notably, the proposed model achieves statistically equal results while also extracting knowledge from the database used in the test.

In general, the only model that cannot be considered a suitable classifier for this type of problem is ZR. The others presented good results, but without providing any knowledge extracted from the problem.

The FNN model of this paper ran with a very good execution time, especially compared with MLP, which has a similar layered architecture. The weights found by ELM allow the importance of each rule in the evaluation of a patient to be assessed. The fuzzy rules in Algorithm 1 were derived from the model to exemplify the FNN's ability to abstract knowledge from a database.

5 Conclusion

After performing the tests and extracting the fuzzy rules, we can conclude that the fuzzy neural network model presented in this paper can act as a tool to help physicians who diagnose adults with autism. The role of the specialist remains crucial, but the more tools specialists have, the faster a diagnosis can be given and treatment started. Future work may address other fuzzy neural network architectures, extend the assessment to people of different ages, and seek confirmation of the extracted rules from a specialist in the field.