1 Introduction

Specific learning disorder (SLD) is a neurodevelopmental disorder characterized by difficulties in learning and using academic skills such as reading, writing and calculation despite adequate socio-cultural opportunity, intact vision and hearing, normal intelligence and conventional schooling [5, 15, 17]. Dyscalculia is a type of SLD with specific impairment in mathematics. It is an alternative term for a pattern of difficulties in numerical processing, learning and memorization of mathematical facts, mathematical reasoning and fluency [7]. About 5–15% of school-going children in India have SLD. There is a dearth of studies in India on SLD, and on Dyscalculia specifically.

The inability to process information can interfere with learning basic skills such as reading, writing and/or mathematics. It can also interfere with more complex skills such as time planning, attention, abstract reasoning, long- or short-term memory and organization. Learning disabilities are categorized into three types:

  a. Dyslexia: It is a type of learning disability that hinders a person’s ability to read. A dyslexic person faces problems while reading.

  b. Dysgraphia: It refers to the difficulty with writing. A dysgraphic person faces problems while thinking, writing their thoughts down, spelling, grammar and memory.

  c. Dyscalculia: It refers to the difficulty with calculations and mathematics. A person with Dyscalculia faces problems with numeric calculations and math reasoning.

The three disabilities are very different yet very closely correlated and thus, it can become difficult to separate one from the other.

There are various tests to detect these learning disabilities; one of them is the Woodcock-Johnson Tests of Achievement. It is quite effective in detecting Dyslexia and Dysgraphia, but in a few cases even the results of this test leave room for doubt. In such cases, one has to turn to Curriculum-Based Tests [CBTs], Wide Range Achievement Tests or both to identify Dyscalculia. Our goal is to use machine learning algorithms to accurately detect Dyscalculia.

2 Drawbacks of Existing System

The Woodcock-Johnson Tests of Achievement (WJ ACH) is a test designed to quantify the academic performance of both children and adults, from age 2 to 95 and grades K.0 through 18.0. The test has 22 subtests for measuring five areas of academic achievement: reading, oral language, written language, math and knowledge. The standard battery comprises seven subtests and the extended battery has 14. Additional subtests can provide supplemental scores. This study uses the math battery. The test is well suited to examining progress in the reading, writing and mathematics achievement areas [1].

Quantitative reasoning, computation skills, mental computation and math fluency are required to make mathematical calculations from addition to trigonometry. The test-taker is given a series of basic mathematical problems which include multiplication, division, decimals, fractions, basic algebra questions and so on. The Math and Calculations [Test 5], Applied Problems [Test 10], Quantitative Concepts [Test 18A] and Number Series [Test 18B] are not timed. The Math Fluency [Test 6] test is timed. The scores are evaluated on the basis of score received, grade and age. The invigilator assesses the types of mistakes made. If mistakes like 94 – 37 = 67 are made, the child may have difficulty in understanding rudimentary mathematical concepts, like carrying over and borrowing. But some children have a tendency to get the problem wrong even when they grasp the mathematical concepts they are working with. For example, after doing a couple of subtraction problems, a child may solve the third problem as a subtraction problem too even though it’s a division problem. This can point to attention issues [9].
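The borrowing mistake quoted above can be reproduced mechanically. The following is a minimal sketch, assuming the child borrows ten for the units column but forgets to reduce the tens digit. The function name `subtract_without_decrement` is hypothetical and is not part of any test battery; it only illustrates how 94 – 37 can come out as 67.

```python
def subtract_without_decrement(minuend: int, subtrahend: int) -> int:
    """Column subtraction that borrows ten when needed but never reduces
    the next digit of the minuend (a hypothetical model of the error).
    Assumes minuend >= subtrahend >= 0."""
    result, place = 0, 1
    while minuend > 0 or subtrahend > 0:
        top, bottom = minuend % 10, subtrahend % 10
        if top < bottom:
            top += 10  # borrow ten, but the next column is left unchanged
        result += (top - bottom) * place
        place *= 10
        minuend //= 10
        subtrahend //= 10
    return result


print(94 - 37)                             # correct answer: 57
print(subtract_without_decrement(94, 37))  # erroneous answer: 67
```

An examiner who sees this pattern repeated across several items has some evidence of a conceptual gap in borrowing, as opposed to a one-off slip.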

The Wechsler IQ Test [WISC III and IV] has been administered to measure the intellectual ability of the children. The test has a verbal section and a performance section, and the score is calculated and recorded. A score of 130 or above is considered excellent, and a score between 120 and 129 is considered high. A score between 110 and 119 is considered moderate, and a score between 90 and 109 is considered average. A score between 70 and 79 denotes borderline mental functionality and a score of 69 or below denotes mental retardation [2].

Wide Range Achievement Test [WRAT] is an additional screening test that is used to determine if there is a need for a more inclusive achievement test. These tests refer to skills meant to be learned by individuals through direct instruction or intervention. It quantifies skills like spelling, reading and arithmetic. It is a timed test. It has two parts. The first part requires solving problems presented verbally related to reading number symbols, counting and solving arithmetic problems. The second consists of 40 arithmetic problems with a time frame of 15 min [3].

For the diagnosis and assessment of Dyscalculia, we have psychoeducational assessments and Curriculum-Based Tests. Practical experience tells us that there are varying profiles of arithmetic skills. The test commonly used is the Woodcock-Johnson Psychoeducational Battery, which is designed to assess basic arithmetic skills. Importance is given not only to correct answers but also to processing speed, solving strategies and the qualitative assessment of performance. The above-mentioned procedure is followed at B.Y.L. Nair Ch. Hospital for the assessment of Dyscalculia and it may differ at other centers. The processing tests have not been standardized for the Indian child population, so their results cannot be relied upon entirely and are at times inconclusive even when our detailed clinical evaluation indicates an impairment in mathematical skills. Curriculum-Based Tests therefore invariably have to be used to help identify children with Dyscalculia and to supplement our findings. Being curriculum-based, they only determine whether the child meets the learning objectives of his/her grade level and do not fully reveal all the actual deficits. Also, the tests used are not able to distinguish mathematical difficulties due to attentional/phonological deficits from developmental dyscalculia, which may lead to the overdiagnosis of Dyscalculia. The whole process of assessment is hence time consuming, tedious and vexing for the child.

Because of the inherent issues in the diagnostic tools available and the complex nature of Dyscalculia itself, it was necessary to develop a tool that would assist us in correctly assessing and diagnosing Dyscalculia in Indian children.

3 Literature Survey

The maturation of number processing in the brain is a neuroplastic process. It develops into a mature, systematized and complex neural network during the development from childhood to adolescence [11, 18]. Functional imaging of the brain shows that multiple areas of the brain are involved in the acquisition and operation of numerical-arithmetic skills. The maturity of these domain-specific functions depends on the development of other areas such as attention and working memory (mental maths, multi-digit arithmetic), language, sensorimotor (finger counting) and visuospatial skills [11]. There may be a primary genetic vulnerability to impaired development of numerical functions, linguistics, visuospatial skills and executive functions, or the maturation process may be affected by epigenetically mediated environmental influences [8].

This explains the correlation of Attention Deficit Hyperactivity Disorder [ADHD] and Dyslexia with Dyscalculia. 20–60% of those with SLD have other learning difficulties or disabilities such as ADHD and dyslexia [12, 14, 18]. Shavlev et al. [16] demonstrated Attention Deficit Disorder [ADD] in 32% of the dyscalculics studied. Children with ADD were also noted to make mathematical errors secondary to impulsiveness and inattention. One empirical study noted that the treatment of ADD with stimulants improved calculating ability without any effect on rudimentary numerical skills [14].

One study also noted that 52% of the variance in calculating ability was accounted for by reading skills [10]. Similarly, deficient phonological skills in pre-school children were linked with unsatisfactory performance on calculation-related questions in primary school [6]. Thus, a disorder of linguistic development is a risk factor for poor calculating ability [13].

4 Methodology

The results of the Woodcock-Johnson Tests are used as the input and training data for the machine learning algorithms. The input data contains the results of Math and Calculations [Test 5], Math Fluency [Test 6], Applied Problems [Test 10], Quantitative Concepts [Test 18A] and Number Series [Test 18B] of the Woodcock-Johnson Tests, and also the results of the Wide Range Achievement Test [WRAT]. The Woodcock-Johnson Tests report an aggregated result, which cannot precisely detect whether the patient has Dyscalculia. Our model therefore focuses on each question rather than on the aggregated score of a particular section. This allows us to look at the trend of each question for a particular grade. The input for a particular question is 1 if the question is attempted and answered correctly, 0 if the question is not attempted and −1 if the question is attempted but answered incorrectly. Using the same format for all tests, the input data set has been collected for 650 patients. The Random Forest Classification Algorithm has been used to train the model. Two distinct models are created: one uses the above-mentioned tests from Woodcock-Johnson along with the WRAT [549 cases], and the other uses only the Woodcock-Johnson tests [650 cases]. Two models are needed because the WRAT is not conducted for all patients. While using the system, doctors will input the results of the tests and the outcome will be predicted.
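The per-question encoding described above can be sketched as follows. This is an illustrative sketch only: the function names `encode_response` and `encode_patient`, and the use of `None` to mark an unattempted question, are assumptions made for the example and are not taken from the system itself.

```python
from typing import List, Optional


def encode_response(answer: Optional[bool]) -> int:
    """Map one question outcome to the model input.

    None  -> 0   (question not attempted)
    True  -> 1   (attempted and answered correctly)
    False -> -1  (attempted but answered incorrectly)
    """
    if answer is None:
        return 0
    return 1 if answer else -1


def encode_patient(responses: List[Optional[bool]]) -> List[int]:
    """Encode every question of one patient as a feature vector."""
    return [encode_response(r) for r in responses]


# Hypothetical patient: correct, incorrect, skipped, correct.
row = encode_patient([True, False, None, True])
print(row)  # [1, -1, 0, 1]
```

Each patient then contributes one such row, with one column per question, so the classifier sees individual items and not a section total.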

The emerging trends in Machine Learning and Data Science can be used in the health sector to predict the outcomes depending on the various results of the test. We are using the dataset of already diagnosed patients and the Random Forest algorithm to analyze and find out if the patient has Dyscalculia. Our system has been trained using the data collected from tests. The importance of all attributes has been determined. Figure 1 shows the flow diagram of the complete process of dyscalculia detection.

Fig. 1. Flow diagram.

The dataset consists of various factors that are associated with Dyscalculia in the child. Patients with dyscalculia find it difficult to solve certain questions of the WJ Test. Such questions are considered an important factor in distinguishing a person without dyscalculia from a person with dyscalculia. Initially, the data of prior patients were collected from the hospital. The data was first organized in CSV files and later in a database, and was then used to train the model. The entire dataset has been split for training and testing. The attributes included in the dataset are shown in Table 1.

Table 1. Dataset description

5 Algorithm

Classification is a supervised learning approach in machine learning. A classification model draws conclusions from the observed values in the input data it is given and then uses what it has learned to classify new observations. Classification assigns examples to one of a given set of categories [4]. Our system performs classification using a supervised learning model to determine whether the patient has dyscalculia.

5.1 Decision Trees

Decision trees can be used for both regression and classification. They work well with categorical as well as numerical data. The data set is broken down into smaller subsets while the decision tree is incrementally developed. The tree consists of one root node, intermediate nodes and leaf nodes. A decision node has two or more branches. A leaf node represents a classification. Each intermediate node is the child of the node above it and the parent of the nodes below it. The topmost node is the root node, which corresponds to the best predictor. We have generated decision trees using the Random Forest Algorithm considering all the features in the dataset. Figures 3 and 4 show the decision trees for the prediction of dyscalculia.
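The node structure described above can be illustrated with a small hand-built tree. This is a toy sketch under stated assumptions: the feature names `T5_Q27` and `T6_Q10`, the thresholds and the leaf labels are all hypothetical and are not taken from the trained model or from clinical findings. Inputs use the 1 / 0 / −1 question encoding.

```python
# Hypothetical tree: a root node, one intermediate node and three leaf nodes.
tree = {
    "feature": "T5_Q27", "threshold": 0.5,          # root node
    "left": {
        "feature": "T6_Q10", "threshold": -0.5,     # intermediate node
        "left": {"label": "dyscalculia"},           # leaf node
        "right": {"label": "no dyscalculia"},       # leaf node
    },
    "right": {"label": "no dyscalculia"},           # leaf node
}


def predict(node: dict, sample: dict) -> str:
    """Walk from the root to a leaf: go left when the feature value is
    less than or equal to the node threshold, otherwise go right."""
    while "label" not in node:
        branch = "left" if sample[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["label"]


print(predict(tree, {"T5_Q27": -1, "T6_Q10": -1}))  # dyscalculia
print(predict(tree, {"T5_Q27": 1, "T6_Q10": -1}))   # no dyscalculia
```

A random forest builds many such trees on resampled data and combines their leaf labels, so no single tree of this kind should be read as the diagnostic rule.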

5.2 Random Forest

Random forests are an ensemble of decision trees; like individual decision trees, they work well for tasks such as classification and regression. The algorithm constructs multiple decision trees at training time and outputs the class that is the mode of the classes (classification) or the mean prediction (regression) of the individual trees. Random forests correct the tendency of decision trees to overfit their training data. For each decision tree, the importance of each node can be calculated using the Gini importance (gi), assuming that the tree is binary, so that each node j has exactly a left child, lnode(j), and a right child, rnode(j). With w denoting the fraction of samples that reach a node and C the impurity of that node, the importance of node j is calculated using Eq. 1.

$$\begin{aligned} gi_{j} = w_{j}C_{j} - w_{lnode(j)}C_{lnode(j)} - w_{rnode(j)}C_{rnode(j)} \end{aligned}$$
(1)

The importance of a feature is the decrease in node impurity weighted by the probability of reaching the node. The probability of reaching the node is the number of samples that reach the node divided by the total number of samples. The higher the value, the more important the feature. The importance of feature x (fi) is the sum of the Gini importances (gi) of the nodes that split on feature x, normalized by the sum over all nodes, as given in Eq. 2.

$$\begin{aligned} fi_{x} = \frac{\sum _{j:\, \text{node } j \text{ splits on feature } x} gi_{j}}{\sum _{k \in \text{all nodes}} gi_{k}} \end{aligned}$$
(2)

Our system has made use of the above equations to calculate the feature importance, which is shown in Fig. 2.
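As a worked illustration of the two equations, the sketch below computes node and feature importances for a tiny hand-made binary tree. It assumes the standard weighted impurity decrease for a node, that is, the weighted Gini impurity of the node minus the weighted impurities of its left and right children. The tree, the class counts and the feature names `A` and `B` are hypothetical and unrelated to the study data.

```python
from collections import defaultdict


def gini(counts):
    """Gini impurity of a node from its per-class sample counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


# Hypothetical tree: node id -> (class counts, split feature, left id, right id).
# Leaf nodes have no split feature and no children.
nodes = {
    0: ([50, 50], "A", 1, 2),
    1: ([45, 15], "B", 3, 4),
    2: ([5, 35], None, None, None),
    3: ([30, 0], None, None, None),
    4: ([15, 15], None, None, None),
}
n_total = sum(nodes[0][0])  # all samples reach the root


def weighted_impurity(node_id):
    """w * C: fraction of samples reaching the node times its impurity."""
    counts = nodes[node_id][0]
    return sum(counts) / n_total * gini(counts)


# Eq. 1: importance of every node that performs a split.
node_importance = {}
for j, (_, feature, left, right) in nodes.items():
    if feature is not None:
        node_importance[j] = (weighted_impurity(j)
                              - weighted_impurity(left)
                              - weighted_impurity(right))

# Eq. 2: sum node importances per feature and normalize over all nodes.
total_importance = sum(node_importance.values())
feature_importance = defaultdict(float)
for j, gi in node_importance.items():
    feature_importance[nodes[j][1]] += gi / total_importance

print({f: round(v, 3) for f, v in feature_importance.items()})
# {'A': 0.714, 'B': 0.286}
```

In a forest, the per-tree values are typically averaged over all trees, so a question that ranks high has reduced impurity across many trees and not just in one.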

6 Results

The model has been trained and tested. It has an accuracy of 99.87% when trained without the results of the WRAT and 99.94% when trained with them. It highlights the importance of individual questions for the result of the test, i.e. whether the person has Dyscalculia or not. The result of the test is the confidence percentage of the model that the child has Dyscalculia.

6.1 Determination of Efficiency

The primary goal of the project is to find a set of questions from the current tests that helps to determine whether the child has Dyscalculia. Figure 2 shows the importance of the different features.

6.2 Determination of Accuracy

The accuracy of the model is estimated by splitting the dataset into training and testing data. Although the testing data can be the same as the training data, we have split the dataset into training and testing data in a 70:30 ratio. Labels are withheld from the model when it predicts on the testing data. Patients of the same grade and similar IQ (intelligence quotient) are used in testing to compare the results. Whether the model is actually detecting Dyscalculia can be judged by whether these sets of questions alone suffice to predict Dyscalculia.
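A 70:30 evaluation of a random forest on question-level inputs can be sketched as below. This is a minimal sketch, not the authors' implementation: the use of scikit-learn and NumPy is an assumption, the data is synthetic (random 1 / 0 / −1 responses with an artificial label rule), and the resulting accuracy says nothing about the figures reported for the real dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the patient data: 200 patients, 30 questions,
# each response encoded as 1 (correct), 0 (not attempted) or -1 (incorrect).
rng = np.random.default_rng(0)
X = rng.choice([-1, 0, 1], size=(200, 30))
# Artificial label rule, for illustration only: label 1 when the first
# three questions sum to a negative value.
y = (X[:, :3].sum(axis=1) < 0).astype(int)

# 70:30 split; stratify keeps the class balance similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Test labels are used only for scoring, never for prediction.
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

# Per-patient probability of class 1, comparable to a confidence percentage.
confidence = model.predict_proba(X_test)[:, 1]
print(f"held-out accuracy: {accuracy:.3f}")
```

Scoring on held-out patients, and not on the patients used for fitting, is what makes the accuracy an estimate of performance on new cases.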

Furthermore, the model was tested on fresh data from new patients and it successfully detected Dyscalculia. Figure 3 shows a decision tree generated by the Random Forest Classification Algorithm with the Woodcock-Johnson Test results along with the WRAT results as input. Figure 4 shows a decision tree generated by the Random Forest Classification Algorithm with only the Woodcock-Johnson Test results as input.

The graph in Fig. 5 shows the analysis of Question 27 of Test 5 for grade 9. It gives the number of patients with dyscalculia who answered the question correctly, answered it incorrectly or did not attempt it. Question 27 of Test 5 was selected because it has high importance (Fig. 2).
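A per-question breakdown of this kind can be tallied with a few lines of code. The sketch below is illustrative only: the records are invented, and the tuple layout (grade, dyscalculia diagnosis, encoded response to one question) and the function name `response_counts` are assumptions made for the example.

```python
from collections import Counter

# Hypothetical records for a single question:
# (grade, has_dyscalculia, encoded response: 1, 0 or -1)
records = [
    (9, True, -1), (9, True, -1), (9, True, 0), (9, True, 1),
    (9, False, 1), (9, False, 1), (8, True, -1),
]

LABELS = {1: "right", -1: "wrong", 0: "not attempted"}


def response_counts(records, grade):
    """Count responses of patients with dyscalculia in one grade."""
    return Counter(
        LABELS[response]
        for g, has_dyscalculia, response in records
        if g == grade and has_dyscalculia
    )


counts = response_counts(records, grade=9)
print(dict(counts))  # {'wrong': 2, 'not attempted': 1, 'right': 1}
```

Plotting these three counts for one grade gives a bar chart of the same form as the analysis described above.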

Fig. 2. Feature importance.

Fig. 3. Decision tree with WRAT as input.

Fig. 4. Decision tree without WRAT as input.

Fig. 5. Analysis of Grade 9 and Test 5 Question 27.

7 Conclusion

This tool will help in correctly assessing and diagnosing Dyscalculia among children in India. It will not only reduce the time spent on detecting Dyscalculia, but it will also ensure that the results are more efficient. The current process of assessment is time consuming, tedious and vexing for the child and therefore, a machine learning approach would save time for the medical experts and patients alike, thereby offering speedy diagnosis and earlier intervention in Dyscalculia.