
1 Introduction

Gerjets et al. [1] describe optimum learning conditions as those that deliver learning at the appropriate level and pace for the learner. To tailor teaching material accordingly, it is first necessary to determine the difficulty level of the material as perceived by the learner. However, assessment measures such as counting correct versus incorrect exam responses may not be a good indicator of students' understanding. Learners often misjudge their own level of understanding, which leads to incorrect tailoring of the material, pace and so on.

It therefore becomes necessary to use measures that can correctly predict the difficulty level. Subjective and dual-task procedures can serve this purpose, and can produce less noisy data and promising results, but they are likely to interrupt, and possibly annoy, subjects during the experiments [2].

Electroencephalography (EEG) is a suitable approach for unobtrusive and continuous measurement of task difficulty [3]: it captures the brain's response to the learning material presented and therefore offers a direct measure of the task difficulty level (TDL). Furthermore, EEG is non-invasive, portable and relatively cheap compared with other measures of brain activity such as functional magnetic resonance imaging (fMRI).

Klimesch [4] proposed using the event-related desynchronisation (ERD) feature extracted from EEG as a measure of task difficulty. ERD measures the extent to which neuron populations no longer oscillate synchronously when processing a given task [5]. Band energies in specific EEG bands such as delta, alpha and beta over frontal areas of the brain have also been used to predict memory load [6,7,8]. Here, we set out to use more channels to cover more areas of the brain and to combine inter-hemispheric asymmetry ratio (ASR) features [9] as an additional measure of cognitive load. We also use subjective measurement with the NASA TLX index [10].

The band energies, ERD and ASR features are used individually and in combination with six different classifiers: Quadratic Discriminant Analysis (QDA), Support Vector Machine (SVM), Naïve Bayes (NB), k-Nearest Neighbour (KNN), neural network (NN) and random forest decision tree (TREE), to classify the programming mental task as either easy or difficult. We also employ a classifier confidence approach to further increase prediction performance. The Java programming language was used here as it is popular in Computer Science programmes throughout the world, but any programming language could have been used instead.

2 Methodology

2.1 Experimental Paradigm

Nine subjects were recruited from a pool of postgraduate students at the School of Computing, University of Kent, who had at least six months of Java experience or had taken a Java programming module as part of their postgraduate course. Of the nine subjects, seven were male and two female. Subjects' ages ranged between 20 and 37 years (mean = 26±3.74). However, data from two male subjects could not be used, as they did not complete a baseline task that was necessary to compute the ERD features (discussed later).

Ethical approval was obtained from the University of Kent Sciences Research Ethics Committee; subjects signed a voluntary consent form and were paid £15 each. The subjects were briefed on the tasks, and the experiment was designed such that the subjects would understand the given program and perform the code execution mentally. Subjects had to give the final output of the program code as their answer; this method was chosen to avoid inductive bias. All code was written in Java. Initially, a total of 20 Java programs were developed across three categories (spatial relation, visual object grouping, mathematical execution), each at two different TDL (easy, difficult). From these, six Java programs deemed easy or difficult by questionnaire respondents were selected (three for the easy and three for the difficult category).

The easy and difficult TDL were pre-determined using questionnaire responses from 15 subjects who were not involved in the EEG data collection. These volunteers (age: 28.8±4.63; 9 males and 6 females; all unaffiliated with the University of Kent) had sufficient Java experience, being either currently working in or proficient in Java, with a mean experience of 30.53±3.56 months. This good Java experience ensured a reliable 'ground truth' for assigning task difficulty levels; there was no statistical difference in age range between these volunteers and those in the EEG-based study. These subjects completed a questionnaire rating the time spent and the task difficulty level for each task. The difficulty rating ranged from 1 to 10 (where 1 is a very easy task and 10 is impossible to solve mentally). Only questionnaires with correct answers to the questions were considered. The task categories to be solved were:

  • Spatial relation tasks that tested subjects' spatial reasoning skills, such as visualising the shape of objects mentally. For example, visualising two rectangle objects mentally from their x- and y-axis coordinates, width and height, and deciding whether the two rectangles overlap.

  • Visual object grouping tasks that utilised subjects' working memory to correctly recall a swapped, mapped or sorted group of shape objects. For example, given a number of shape objects mapped to variables and grouped in an array in a different order, the subject had to map each variable name to the correct shape object and output those objects in order.

  • Mathematical execution tasks where the subject had to perform arithmetic calculations mentally. For example, the subject had to compute the mean of an array of integers.

Prior to performing the tasks, subjects were asked to relax for one minute (EEG was also collected during this time as a baseline). Table 1 shows the GUI steps in collecting the EEG data; steps 3 and 4 were repeated until all six programs had been shown (in random order). Figure 1 shows an example of the task screen.

Table 1. GUI sequence for the experiment
Fig. 1. Task screen.

This GUI not only serves as a front-end but also communicates with the EEG collection device via a COM port (emulated serial port) by sending different marker values for different user activities, such as the relax and task execution states. Table 2 gives the marker types and the values sent to the EEG device during the experiment. This information can be used to segment the EEG into the different tasks.

Table 2. Marker values sent by GUI to EEG device

The working of the GUI was demonstrated to the subjects, who were asked to perform practice tasks in order to familiarise themselves with the tool. Before the experiment started, subjects sat comfortably. They were discouraged from making physical movements (e.g. blinking where possible, excessive swallowing or any hand gestures) during the task, and were asked to focus on the presented task while solving the program code. Figures 2, 3 and 4 show examples of the tested Java codes.

Fig. 2. An example of the tested spatial relation Java code.

Fig. 3. An example of the tested visual object grouping Java code.

Fig. 4. An example of the tested mathematical Java code.

2.2 NASA TLX Survey

After solving each task, the subjects were instructed to fill in a paper-based NASA TLX rating sheet based on their perception of the task difficulty level. The NASA TLX index is a six-dimensional subjective measurement method developed by NASA to measure cognitive load [10]. The six sub-scales are mental demand, physical demand, temporal demand, performance, effort and frustration level. The workload is evaluated in two steps for each task: first, subjects rate each sub-scale on a range from 0–100 (divided into 20 equal intervals); second, sub-scale weights are created by forming the 15 possible pairs from the six dimensions, with subjects choosing from each pair the dimension contributing more to the workload.

Here, after marking the six dimension ratings, the subjects were instructed to circle, for each pair presented as described above, the dimension that contributed most to the task. The overall Weighted Workload Score (WWS) is computed from the subjects' ratings and the weights that contribute to the cognitive workload. This procedure follows the usage of the NASA TLX index form in the study by Fritz et al. [11].
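As a concrete illustration, the weighted workload computation can be sketched as follows (a minimal Python sketch; the ratings and weights shown are invented for illustration, not taken from the study):

```python
def weighted_workload(ratings, weights):
    """Overall NASA TLX Weighted Workload Score (WWS).
    ratings: sub-scale -> rating on the 0-100 scale;
    weights: sub-scale -> number of times it was circled across the 15 pairs."""
    assert sum(weights.values()) == 15, "each of the 15 pairs contributes one tally"
    return sum(ratings[d] * weights[d] for d in ratings) / 15.0

# invented example values for one task
ratings = {"mental": 80, "physical": 10, "temporal": 60,
           "performance": 40, "effort": 70, "frustration": 50}
weights = {"mental": 5, "physical": 0, "temporal": 3,
           "performance": 2, "effort": 4, "frustration": 1}
wws = weighted_workload(ratings, weights)  # 66.0 on the 0-100 scale
```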

2.3 EEG Data

The EEG data was obtained from a 14-channel Emotiv Epoc wireless EEG device (configuration as shown in Fig. 5) sampled at 128 Hz. During the experiment, the signal strength was continually checked and adjusted, using saline solution, to ensure all the electrodes had good contact with the scalp.

Fig. 5. Emotiv electrode locations.

The EEG data was segmented into one-second lengths. Elliptic IIR filters were used to filter the segmented EEG signals into the delta (1–4 Hz), theta (4–8 Hz), alpha (8–12 Hz), beta (12–30 Hz) and gamma (30–50 Hz) bands [12], and feature extraction was performed on these segments. Eighty such segments were obtained for each task, giving 480 patterns from the six tasks altogether (easy and difficult tasks from three categories).
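The segmentation and band filtering step can be sketched as below (Python with SciPy; the filter order and ripple settings are assumptions, as the paper does not report them, and the data here is synthetic):

```python
import numpy as np
from scipy.signal import ellip, filtfilt

FS = 128  # Emotiv Epoc sampling rate (Hz)
BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 12),
         "beta": (12, 30), "gamma": (30, 50)}

def bandpass(x, lo, hi, fs=FS, order=4, rp=0.5, rs=40):
    # elliptic IIR band-pass; order/ripple values are illustrative assumptions
    b, a = ellip(order, rp, rs, [lo / (fs / 2), hi / (fs / 2)], btype="bandpass")
    return filtfilt(b, a, x)  # zero-phase filtering along the last axis

# split a continuous recording into 1-s segments, then filter one band
eeg = np.random.randn(14, FS * 80)           # 14 channels, 80 s of synthetic data
segments = eeg.reshape(14, -1, FS)           # (channels, segments, samples)
alpha = bandpass(segments, *BANDS["alpha"])  # alpha-band version of every segment
```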

2.4 EEG Analysis

ERD was computed by band pass filtering the EEG signal within the specified frequency band and percentage band power change was computed between the relaxed state and task execution state using (1):

$$ ERD_{b} = \left( {BE_{r} - BEtask_{b} } \right) / BE_{r} $$
(1)

where band energy during resting was computed using

$$ BE_{r} = \sum\limits_{i = 1}^{n} {\left( {x - \overline{x} } \right)^{2} } $$
(2)

and band energy during task using

$$ BEtask_{b} = \sum\limits_{i = 1}^{n} {\left( {x - \overline{x} } \right)^{2} } $$
(3)

where x is the band-pass filtered EEG data from a channel of length n, taken from either the rest or the task execution state, and \( \overline{x} \) is the mean of that channel. Given 14 channels and 5 bands, there were 70 ERD features for each one-second EEG segment.
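Under the definitions above, the ERD computation can be sketched as follows (a minimal Python sketch; the toy signals are invented to make the arithmetic visible):

```python
import numpy as np

def band_energy(x):
    # sum of squared deviations from the mean, as in (2) and (3)
    return np.sum((x - np.mean(x)) ** 2)

def erd(rest_band, task_band):
    # relative change in band energy between rest and task, as in (1)
    be_rest = band_energy(rest_band)
    return (be_rest - band_energy(task_band)) / be_rest

# toy example: task-state energy dropping to a quarter of the rest value
rest = np.array([0.0, 2.0, 0.0, 2.0])  # energy 4 about its mean of 1
task = np.array([0.0, 1.0, 0.0, 1.0])  # energy 1 about its mean of 0.5
value = erd(rest, task)                # (4 - 1) / 4 = 0.75
```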

The ASR of each spectral band was computed using (4), as in [9]:

$$ ASR_{b} = \left( {BE_{left} - BE_{right} } \right)/\left( {BE_{left} + BE_{right} } \right) $$
(4)

where ASR is the asymmetry ratio between the left and right hemispheres, BEleft is the spectral energy from a channel in the left hemisphere (computed using (3)) and BEright is the spectral energy from the opposite channel in the right hemisphere. Since there were 14 channels (7 in each hemisphere) and 5 spectral bands, ASR gave a total of 35 features.

In addition, band energies (EN) for each channel in the five bands were computed using (3), giving 70 features. Finally, all the available features were combined, giving the all-feature (AF) set of 175 features.
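Putting (1)–(4) together, the assembly of one segment's 175-dimensional feature vector can be sketched as below (Python; the channel ordering and the left/right pairing by index are illustrative assumptions, not the actual Epoc montage):

```python
import numpy as np

def band_energy(x):
    # sum of squared deviations from the mean, as in (2) and (3)
    return np.sum((x - np.mean(x)) ** 2)

def feature_vector(rest_bands, task_bands):
    """rest_bands, task_bands: dicts mapping band name -> (14, n) band-filtered
    arrays for one segment. Channels 0-6 vs 7-13 as left/right is an assumption."""
    en, erd, asr = [], [], []
    for band in task_bands:
        for ch in range(14):
            be_r = band_energy(rest_bands[band][ch])
            be_t = band_energy(task_bands[band][ch])
            en.append(be_t)                        # EN: 14 x 5 = 70 features
            erd.append((be_r - be_t) / be_r)       # ERD: 14 x 5 = 70 features
        for l, r in zip(range(7), range(7, 14)):   # ASR: 7 x 5 = 35 features
            bl = band_energy(task_bands[band][l])
            br = band_energy(task_bands[band][r])
            asr.append((bl - br) / (bl + br))
    return np.array(en + erd + asr)                # AF: 175 features in total

# synthetic 1-s, 14-channel segments for the five bands
rng = np.random.default_rng(0)
bands = ["delta", "theta", "alpha", "beta", "gamma"]
rest = {b: rng.standard_normal((14, 128)) for b in bands}
task = {b: rng.standard_normal((14, 128)) for b in bands}
fv = feature_vector(rest, task)
```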

2.5 Classification

These features were used with six different classifiers: QDA, SVM, NB, KNN, NN and TREE. For KNN, Euclidean distance was used, whereas for QDA, the covariance matrices could vary among classes. The TREE approach used an ensemble of 100 decision trees. For NN, the two output layer node values were set to either [1 0] or [0 1], with 10 hidden units (size chosen arbitrarily), and the network was trained using Matlab's trainlm. For the rest, the default classifier parameters of Matlab's fitcsvm, fitcnb, fitcensemble, fitcdiscr, patternnet and fitcknn were used [13]. The easy and difficult TDL were predicted using randomly split 40-fold cross-validation.
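The study's classification was done in Matlab; as a rough stand-in, the randomly split cross-validation with a Euclidean nearest-neighbour classifier can be sketched in Python as follows (the split fraction and synthetic data are assumptions for illustration):

```python
import numpy as np

def knn_predict(train_X, train_y, test_X, k=1):
    # Euclidean k-nearest-neighbour majority vote (binary labels 0/1)
    d = np.linalg.norm(test_X[:, None, :] - train_X[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    return (train_y[nearest].mean(axis=1) >= 0.5).astype(int)

def random_split_cv(X, y, folds=40, test_frac=0.1, seed=0):
    # repeated random train/test splits, a simple stand-in for the
    # randomly split 40-fold cross-validation described above
    rng = np.random.default_rng(seed)
    accs = []
    for _ in range(folds):
        idx = rng.permutation(len(y))
        n_test = max(1, int(test_frac * len(y)))
        test, train = idx[:n_test], idx[n_test:]
        pred = knn_predict(X[train], y[train], X[test])
        accs.append(float((pred == y[test]).mean()))
    return float(np.mean(accs))

# synthetic two-class data with well-separated clusters
rng = np.random.default_rng(1)
X = np.vstack([rng.standard_normal((40, 5)), rng.standard_normal((40, 5)) + 5.0])
y = np.array([0] * 40 + [1] * 40)
acc = random_split_cv(X, y)
```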

Classifier Confidence.

The classifier confidence (CC) approach used here worked by examining the outputs of the classifier for the test data. From the results, NN gave the best performance for most subjects, so only the outputs of this classifier were used; likewise, the combined feature set gave the best performance for the majority of subjects, so these features were used. The two classifier outputs for each test pattern were checked, and the predicted class was treated as confident only if the two outputs differed by at least 0.1. With perfect classification, the outputs would differ by 1, since one output would have a value of 1 and the other a value of 0; hence a 10% threshold of 0.1 is a reasonable starting point, though this value will need to be tuned experimentally in future work. It should be noted that some data are discarded when the classification outputs fall below the confidence threshold. Figure 6 shows the flow of the experimental design.
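The CC step itself reduces to a comparison of the two output-node activations; a minimal sketch (the output values shown are invented):

```python
import numpy as np

def confident_predictions(outputs, threshold=0.1):
    """outputs: (n, 2) array of the two NN output-node activations per test
    pattern. Returns predicted labels and a mask of 'confident' patterns;
    patterns below the threshold are discarded from the accuracy computation."""
    pred = outputs.argmax(axis=1)
    keep = np.abs(outputs[:, 0] - outputs[:, 1]) >= threshold
    return pred, keep

outs = np.array([[0.90, 0.10],   # confident 'easy'
                 [0.52, 0.48],   # difference 0.04 < 0.1: discarded
                 [0.30, 0.70]])  # confident 'difficult'
pred, keep = confident_predictions(outs)
```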

Fig. 6. Experimental flow design.

3 Results and Discussion

Figure 7 shows the overall WWS from NASA TLX for the different task difficulty levels. A non-parametric Kruskal-Wallis test (used as normality was not assumed) showed a significant difference between TDL (p < 0.01). Comparing each sub-scale (refer to Table 3), there were significant differences (sign rank tests, p < 0.01) between TDL for mental demand, temporal demand, frustration and effort. Performance and physical demand did not show any difference. The latter is not surprising, since no physical effort is required in the tasks, though it is somewhat surprising that there was no difference in the performance measure. This clearly indicates the necessity of measures such as EEG, as subjects were unable to differentiate the levels of performance required to complete the tasks.

Fig. 7. Boxplot of overall NASA TLX index mean weighted workload for different task difficulty levels.

Table 3. NASA TLX – subscale

A Kruskal-Wallis test showed a statistically significant difference in EEG features between the easy and difficult tasks (p < 0.05). Table 4 shows the classification results for EN, ERD, ASR and the combined features for the six different classifiers for subject 1.

Table 4. Subject 1 results

Similarly, Tables 5, 6, 7, 8, 9 and 10 show the results for the rest of the subjects. To decide on the best classifier, all the features were combined and a statistical test revealed a significant difference between the classifier performances (p < 0.05). The mean rank comparison showed that the NN classifier gave the best overall performance; it was also the best for five of the seven subjects.

Table 5. Subject 2 results
Table 6. Subject 3 results
Table 7. Subject 4 results
Table 8. Subject 5 results
Table 9. Subject 6 results
Table 10. Subject 7 results

Next, using the NN classification results (as NN gave the best overall performance), a significant difference was found in classification accuracy between the different feature extraction approaches, H(3) = 26.33, p = 8.12e–6. The mean rank values (EN: 581.03, ERD: 576.88, ASR: 478.44, AF: 605.66) showed that the EN and ERD features carried more discriminatory information than ASR for separating the two mental tasks, with the combination of all features giving the best results. Using all the features also gave the best accuracy for six of the seven subjects, with ERD giving the best accuracy for the remaining subject.

Using the CC approach yielded a further improvement in classification performance. As NN gave the best performance, this classifier was used with the best-performing all-feature combination. Figure 8 shows the performance for the seven subjects; performances were higher when CC was used. The improvements were statistically significant for all subjects (sign rank test, p < 0.05) except subject 6. This is as expected, since only the more confident classification outputs are used (the experiment revealed that about 10% of patterns were dropped).

Fig. 8. Classification (%) comparing the improvement with confidence approach (blue: with confidence, red: without confidence). (Color figure online)

Table 11 shows the average response time (i.e. the time taken to complete the tasks). As expected, the difficult tasks took longer to complete than the easy tasks.

Table 11. Average completion time (secs) for different task levels.

This research was limited by significant noise arising from the experimental procedure, with some subjects verbalising, flicking pens, nodding, etc. Eye blinks occurred in the EEG data, as shown in Fig. 9 (the example is from one subject, but similar artifacts were observed for the others). While these could have been removed at the pre-processing stage (for example using independent component analysis), we chose not to, in order to simulate actual classroom settings where it will be difficult to force students to adhere to strict no-movement instructions.

Fig. 9. EEG segment with artifacts.

4 Conclusion

Both NASA TLX and task completion time showed significant differences between TDL. NASA TLX has previously been used as a non-physiological measure to discriminate cognitive load across different programming languages [14]. However, given the lack of statistical difference in the performance sub-scale of the TLX, we can infer that it is difficult for subjects to estimate the TDL themselves, showing the need for measures that assess it directly.

In this report, we have shown that it is possible to differentiate the task difficulty of Java programming code using EEG signals. Though the subject pool is small and the performance needs improvement for real-life implementation, the method shows sufficient promise to be studied further. The combination of the proposed ASR with the ERD and EN features improves classification performance, and among the tested classifiers, NN gave the best performance. The CC approach further improved the performance, giving a maximum accuracy of 87.05%. Proper feature selection and tuning of classifier parameters could further improve the accuracy.

In conclusion, the findings here will hopefully pave the way for future research studies on tailoring learning material with appropriate level of difficulty, which will be especially useful for those with independent learning plans.